uber / petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Apache License 2.0

Fix issue: spark session created on spark executor #531

Closed WeichenXu123 closed 4 years ago

WeichenXu123 commented 4 years ago

The current code has a bug: a Spark session gets created on the Spark executor. This happens because _get_spark_session is invoked by _wait_file_available, and _wait_file_available is itself invoked on the executor side.

This PR fixes that. I removed the config petastorm.spark.converter.fileAvailabilityWaitTimeoutSecs, since it is rarely used.
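To illustrate the general pattern, here is a minimal sketch and not Petastorm's actual implementation. It assumes the wait timeout becomes a plain constant, so the waiting function can run inside an executor task without touching a Spark session. The names wait_files_available and FILE_AVAILABILITY_WAIT_TIMEOUT_SECS, the 30-second default, and the local-path check with os.path.exists are all hypothetical simplifications.

```python
import os
import time

# Hypothetical module-level constant. It stands in for a value that would
# otherwise be read from the Spark session conf, which is only safe on the driver.
FILE_AVAILABILITY_WAIT_TIMEOUT_SECS = 30


def wait_files_available(paths,
                         timeout_secs=FILE_AVAILABILITY_WAIT_TIMEOUT_SECS,
                         poll_interval_secs=0.1):
    """Block until every path exists, or raise RuntimeError on timeout.

    Uses only the standard library, so nothing here creates or looks up
    a Spark session when the function runs on an executor.
    """
    deadline = time.monotonic() + timeout_secs
    pending = set(paths)
    while True:
        # Keep only the paths that are still missing.
        pending = {p for p in pending if not os.path.exists(p)}
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise RuntimeError(
                "Timed out waiting for files: {}".format(sorted(pending)))
        time.sleep(poll_interval_secs)
```

If the timeout did need to stay configurable, one alternative would be to read it once on the driver and pass it to the executor-side function as an ordinary argument.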

codecov[bot] commented 4 years ago

Codecov Report

Merging #531 into master will decrease coverage by 0.00%. The diff coverage is 50.00%.


@@            Coverage Diff             @@
##           master     #531      +/-   ##
==========================================
- Coverage   86.53%   86.53%   -0.01%     
==========================================
  Files          85       85              
  Lines        4694     4692       -2     
  Branches      737      737              
==========================================
- Hits         4062     4060       -2     
  Misses        515      515              
  Partials      117      117              
Impacted Files | Coverage Δ
petastorm/spark/spark_dataset_converter.py | 92.63% <50.00%> (-0.06%)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 9a903d9...1f1386e.