uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks

Update Spark Converter API section in README.rst #518

Closed by liangz1 4 years ago

liangz1 commented 4 years ago

The "Spark Converter API" section now contains minimal examples of Spark -> TensorFlow and Spark -> PyTorch conversion. I also edited the "PySpark and SQL" section so that it connects better with the preceding section.
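
For readers without the README at hand, the two conversions mentioned above can be sketched roughly as follows. This is an illustrative sketch, not the text added by this PR: the function name `read_first_batches`, the cache directory URL, and the batch size are assumptions chosen for the example, and the DataFrame is assumed to contain only column types the converter supports. It relies on `make_spark_converter`, `make_tf_dataset`, and `make_torch_dataloader` from `petastorm.spark`.

```python
def read_first_batches(spark, df,
                       cache_dir_url="file:///tmp/petastorm_cache",  # assumed location
                       batch_size=32):                               # assumed batch size
    """Cache a Spark DataFrame as Parquet once, then read one batch with each framework."""
    # Imported here so that defining the function does not require petastorm.
    from petastorm.spark import SparkDatasetConverter, make_spark_converter

    # The converter materializes the DataFrame under this parent cache directory.
    spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF, cache_dir_url)
    converter = make_spark_converter(df)

    # Spark -> TensorFlow: the context manager yields a tf.data.Dataset.
    with converter.make_tf_dataset(batch_size=batch_size) as dataset:
        tf_batch = next(iter(dataset))

    # Spark -> PyTorch: the context manager yields a Petastorm data loader.
    with converter.make_torch_dataloader(batch_size=batch_size) as dataloader:
        torch_batch = next(iter(dataloader))

    # Remove the cached Parquet files once they are no longer needed.
    converter.delete()
    return tf_batch, torch_batch
```

Calling it requires an active `SparkSession` plus TensorFlow and PyTorch installed alongside Petastorm; in real training code the loops would sit inside the `with` blocks instead of fetching a single batch.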

codecov[bot] commented 4 years ago

Codecov Report

Merging #518 into master will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #518   +/-   ##
=======================================
  Coverage   86.24%   86.24%           
=======================================
  Files          81       81           
  Lines        4471     4471           
  Branches      718      718           
=======================================
  Hits         3856     3856           
  Misses        503      503           
  Partials      112      112           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c4cc57a...3f624b6.

liangz1 commented 4 years ago

@selitvin Please feel free to review this PR, thanks!

mengxr commented 4 years ago

@selitvin We are waiting for the Spark 3.0 RC1 release to add an example that automatically converts MLlib vectors into dense arrays. We will do that as a follow-up task.