liangz1 commented 4 years ago

There are 2 end-to-end examples in examples/spark_dataset_converter/:

tensorflow_converter_example.py: A simple MLP model is trained locally and on a spark worker.
pytorch_converter_example.py: A CNN model is trained locally and on a spark worker.

These examples are tested in examples/spark_dataset_converter/tests/test_converter_examples.py. The dataset is MNIST in libsvm format, downloaded with download_mnist_libsvm from examples.spark_dataset_converter.utils.
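
For reviewers unfamiliar with libsvm format, here is a minimal sketch of how one line of such a file could be parsed into a label and a dense feature vector. The helper name parse_libsvm_line and the default of 784 features (28x28 MNIST pixels) are illustrative assumptions, not code from this PR.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str, num_features: int = 784) -> Tuple[int, List[float]]:
    """Parse one libsvm line of the form '<label> <index>:<value> ...'.

    Hypothetical helper for illustration only. libsvm feature indices are
    1-based, and features that are not listed are implicitly zero.
    """
    parts = line.split()
    label = int(float(parts[0]))
    features = [0.0] * num_features
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)  # shift to 0-based
    return label, features


# Tiny made-up line with 4 features instead of 784.
label, features = parse_libsvm_line("5 1:0.5 3:1.0", num_features=4)
print(label, features)
```

Only the non-zero pixels are stored in the file, which is why the sketch starts from an all-zero vector.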

There are also a few fixes:

Removed check_parent_url. We will directly read from the spark conf.
I used tensorflow==1.15.0 for the examples in order to use keras.losses.SparseCategoricalCrossentropy.
Refactored the import `from tensorflow.python.framework.errors_impl import OutOfRangeError`, since it is not available in tensorflow==1.15.0.
In _default_delete_dir_handler, don't try to delete the file if the file does not exist.
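
The last fix can be sketched roughly as follows. This is only an illustration on a local path using the standard library; the function name delete_dir_if_exists is hypothetical, and the real _default_delete_dir_handler takes a dataset URL and may go through a different filesystem API.

```python
import os
import shutil
import tempfile


def delete_dir_if_exists(path: str) -> bool:
    """Delete a local directory tree, skipping the delete if it is already gone.

    Hypothetical stand-in for the guard described above. Returns True if
    something was deleted, False if there was nothing to delete.
    """
    if not os.path.exists(path):
        return False
    shutil.rmtree(path)
    return True


cache_dir = tempfile.mkdtemp()
first = delete_dir_if_exists(cache_dir)   # directory exists, so it is removed
second = delete_dir_if_exists(cache_dir)  # already gone, so this is a no-op
print(first, second)
```

Making the handler tolerate a missing path means calling delete twice on the same location does not raise.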

codecov[bot] commented 4 years ago

Codecov Report

Merging #530 into master will increase coverage by 0.21%. The diff coverage is 90.96%.

@@            Coverage Diff             @@
##           master     #530      +/-   ##
==========================================
+ Coverage   86.32%   86.53%   +0.21%     
==========================================
  Files          81       85       +4     
  Lines        4525     4694     +169     
  Branches      731      737       +6     
==========================================
+ Hits         3906     4062     +156     
- Misses        505      515      +10     
- Partials      114      117       +3

Impacted Files	Coverage Δ
setup.py	`0.00% <ø> (ø)`
petastorm/spark/spark_dataset_converter.py	`92.69% <57.14%> (-0.03%)`	:arrow_down:
..._dataset_converter/tensorflow_converter_example.py	`85.10% <85.10%> (ø)`
...ark_dataset_converter/pytorch_converter_example.py	`94.00% <94.00%> (ø)`
..._dataset_converter/tests/test_converter_example.py	`100.00% <100.00%> (ø)`
examples/spark_dataset_converter/utils.py	`100.00% <100.00%> (ø)`
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a768179...0588339.

WeichenXu123 commented 4 years ago

@liangz1 I fixed the errors. Now you need to re-enable pylint on the example code and fix the pylint errors it reports.

uber / petastorm

Add spark dataset converter mnist example scripts #530
