Open mollyStark opened 7 years ago
How many executors were you using? For debugging purposes, you may want to use a single executor (if you are not already using one). Also, please verify that your dataframe is correct.
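(For reference, a quick dataframe sanity check could look roughly like the minimal sketch below. It uses the Spark 2.x SparkSession API, and the parquet path is just a placeholder, not the reporter's actual file.)

```python
# Minimal PySpark sketch (Spark 2.x API) for sanity-checking the input dataframe
# before handing it to the job. The path below is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-sanity-check").getOrCreate()

df = spark.read.parquet("hdfs:///path/to/input.parquet")  # placeholder path
df.printSchema()                  # confirm the expected columns and types
print("row count:", df.count())   # confirm the dataframe is not empty
df.show(5, truncate=False)        # eyeball a few rows for obvious corruption
```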
@junshi15 I've used 1 executor by setting --num-executors to 1 and still hit this problem; the log is the same as above, with the data read in two parts.
And I checked the same dataframe by running the job on my local standalone machine, and it had no problem. The only difference between my local machine and the cluster is the Spark version: 1.5.2 locally and 2.0.0 on the cluster.
Also, I read the dataframe (the parquet file) from the command line, and the data has not been damaged.
Hi, I ran into the error "java.lang.UnsupportedOperationException: empty.reduceLeft". I found that #61 asked about this error, but I don't think it has the same cause.
In #61, the error was caused by the input dataframe being empty (the source file was not correct). But I tried using the same data source with fewer rows (just 1 row) in the dataframe, and the test process succeeded. So it seems the error has nothing to do with the dataframe's input location, but rather with the dataframe's length! How weird!
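(The 1-row check described above could look roughly like the sketch below; the path and the limit(1) call are illustrative, not the reporter's exact code.)

```python
# Hypothetical sketch of the 1-row experiment: same parquet source,
# but the dataframe trimmed to a single row before it is fed to the job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("one-row-repro").getOrCreate()

full_df = spark.read.parquet("hdfs:///path/to/input.parquet")  # placeholder path
one_row_df = full_df.limit(1)             # same source, only one row
print("rows used:", one_row_df.count())   # expect 1
# one_row_df would then replace full_df in the rest of the job
```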
The complete error message is below:
We can see that the dataframe is read in two parts. The first part is range: 0-7958646, and it seems to be processed successfully. The failing part is range: 7958646-11722988, and there is a warning: WARN storage.BlockManager Executor task launch worker-0: Putting block rdd_12_1 failed. So I'm wondering whether this empty.reduceLeft error is associated with that warning. Some more information about the dataframe: it holds about 15M of data and 100 rows.
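(One thing that might help narrow this down, given that the data is read in two parts and only the second part fails, is checking how many rows land in each partition. A minimal sketch, assuming the same placeholder parquet path as above; an empty or skewed partition is only one hypothesis to rule out, not a confirmed cause.)

```python
# Sketch for checking how the ~100 rows are split across partitions,
# since the log shows two parts and only the second one failing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-check").getOrCreate()

df = spark.read.parquet("hdfs:///path/to/input.parquet")  # placeholder path
rows_per_partition = df.rdd.glom().map(len).collect()
print("partitions:", len(rows_per_partition),
      "rows per partition:", rows_per_partition)

# If a partition turns out to be empty, repartitioning is one way to even things out:
df_even = df.repartition(2)
print("after repartition:", df_even.rdd.glom().map(len).collect())
```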
Please help me solve this problem; I've been stuck on it for weeks. Thank you!