Open Nitinsiwach opened 2 years ago
@Nitinsiwach really sorry, from the given information I really don't know what the issue could be. The "root" error message seems to be "py4j.protocol.Py4JNetworkError: Answer from Java side is empty". I'm really not sure how it's related to the lightgbm classification model. I do see that you said it started failing on more data. Perhaps it ran into OOM, in which case only increasing the number of nodes or the machine RAM might help.
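To make the OOM hypothesis a bit more concrete, here is a rough back-of-envelope sketch of the raw size of the three data shapes reported in this issue. It assumes every feature is stored as a dense 8-byte float64, which the issue does not state, and it ignores JVM object and serialization overhead. The helper name `dense_size_bytes` is made up for illustration.

```python
# Back-of-envelope size of a dense feature matrix.
# Assumption (not stated in the issue): every value is an 8-byte float64.
BYTES_PER_VALUE = 8


def dense_size_bytes(rows: int, cols: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Raw size of a rows x cols dense matrix, ignoring JVM and serialization overhead."""
    return rows * cols * bytes_per_value


# The three shapes reported in the issue.
shapes = [(10_000, 241), (100_000, 241), (1_000_000, 241)]

for rows, cols in shapes:
    size_gib = dense_size_bytes(rows, cols) / 2**30
    print(f"({rows}, {cols}): {size_gib:.2f} GiB")
```

Under that assumption the largest case is roughly 1.8 GiB of raw values. If the session happens to run with Spark's default `spark.driver.memory` of 1g (the session configuration is not shown in the report, so this is only a guess), that alone could plausibly explain why the failure appears only at the largest size.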
Describe the bug
LightGBMClassificationModel.fit cannot handle too much data. It fails without even having to collect anything at the driver.
I run LightGBMClassificationModel.fit on data (10000, 241): it executes perfectly.
I run LightGBMClassificationModel.fit on data (100000, 241): it executes perfectly.
I run LightGBMClassificationModel.fit on data (1000000, 241): the error shows up.
I do not ever get the following warning:
WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
To Reproduce
I can upload an entire example if necessary, but since this error shows up only when I read all the rows in my data (keeping all other configuration the same), I hope it will not be necessary. Please let me know if it is needed.
Expected behavior
The execution may become slow by switching to disk caching, but it should not fail, IMO.
Info (please complete the following information):
Python 3.9.7
pyspark 3.2.0 (pip install pyspark)
sc session:
Stacktrace
Additional context
AB#1984487