Open taynaud opened 8 years ago
Is this issue still present in Spark 2.0.0?
I do not know; the issue appears randomly and I have not reproduced it on my cluster. I have added Spark 2.0 to CI in #71, but since the failure is random, I do not know whether that will be conclusive.
I think this parallelization is not very useful for a Spark computation.
Without threading, pipeline steps are executed sequentially. I think n_jobs makes sense: multiple DAGs will be submitted and executed in parallel, so the overall level of parallelism can be increased via n_jobs.
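As an illustration of that idea (not the project's actual code), here is a minimal sketch using Python's `concurrent.futures` to submit several independent jobs concurrently, the way `n_jobs` would let multiple DAGs run in parallel; `run_dag` is a hypothetical stand-in for submitting one Spark job:

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(params):
    # Hypothetical stand-in for submitting one Spark DAG; here we just
    # compute something trivial so the sketch is self-contained.
    return params * params

# n_jobs controls how many jobs are in flight at once.
n_jobs = 4
with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    results = list(pool.map(run_dag, range(8)))

print(results)  # squares of 0..7
```

Threads (rather than processes) fit this case because submitting a Spark job mostly blocks on the driver, so the GIL is not the bottleneck.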
Shouldn't we drop support for spark versions before 2.0.0?
According to the Apache JIRA, it is still an issue in PySpark 2.0.2.
See https://issues.apache.org/jira/browse/SPARK-12717. The parameter is still present on the converted to_scikit() object.
I think it explains the flaky test on my previous PR.