uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks

Add ngrams support to make_petastorm_dataset function. #533

Closed. selitvin closed this pull request 4 years ago.

selitvin commented 4 years ago

Refactored common code so that make_petastorm_dataset reuses the tf_tensors code that handles ngrams.
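For context, a Petastorm ngram groups consecutive rows (ordered by a timestamp field) into a window keyed by integer offsets, where each offset can request its own set of fields, and a window is rejected when consecutive timestamps differ by more than `delta_threshold`. The pure-Python sketch below illustrates only that windowing idea. `form_ngrams` is a hypothetical helper, not Petastorm's implementation or the code in this PR; it assumes rows are plain dicts already sorted by timestamp and that the offset keys are consecutive integers. The `timestamp_overlap=False` branch is a simplified reading in which emitted windows never share rows.

```python
from typing import Any, Dict, List


def form_ngrams(
    rows: List[Dict[str, Any]],
    fields: Dict[int, List[str]],
    timestamp_field: str,
    delta_threshold: float,
    timestamp_overlap: bool = True,
) -> List[Dict[int, Dict[str, Any]]]:
    """Hypothetical sketch: slide a window over timestamp-sorted rows.

    `fields` maps each integer offset (e.g. -1, 0, 1) to the field names
    to keep for the row at that position in the window.
    """
    if not fields:
        raise ValueError('fields must not be empty')
    offsets = sorted(fields)
    # Assumption: offsets form a consecutive integer range.
    if offsets != list(range(offsets[0], offsets[-1] + 1)):
        raise ValueError('ngram offsets must be consecutive integers')
    length = len(offsets)

    ngrams = []
    i = 0
    while i + length <= len(rows):
        window = rows[i:i + length]
        stamps = [row[timestamp_field] for row in window]
        # Every consecutive pair must be within delta_threshold.
        if all(b - a <= delta_threshold for a, b in zip(stamps, stamps[1:])):
            ngrams.append({
                off: {name: row[name] for name in fields[off]}
                for off, row in zip(offsets, window)
            })
            # Without overlap, jump past the rows just used.
            i += 1 if timestamp_overlap else length
        else:
            i += 1
    return ngrams


rows = [{'ts': t, 'x': t * 10} for t in [0, 1, 2, 10, 11]]
fields = {-1: ['ts'], 0: ['ts', 'x']}
for ngram in form_ngrams(rows, fields, 'ts', delta_threshold=2):
    print(ngram)
```

With `delta_threshold=2`, the gap between timestamps 2 and 10 breaks the sequence, so only the windows ending at timestamps 1, 2 and 11 are produced.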

codecov[bot] commented 4 years ago

Codecov Report

Merging #533 into master will increase coverage by 0.22%. The diff coverage is 73.68%.


@@            Coverage Diff             @@
##           master     #533      +/-   ##
==========================================
+ Coverage   85.92%   86.14%   +0.22%     
==========================================
  Files          87       87              
  Lines        4922     4928       +6     
  Branches      780      786       +6     
==========================================
+ Hits         4229     4245      +16     
+ Misses        569      556      -13     
- Partials      124      127       +3     
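The totals in the diff are internally consistent: each Lines figure equals Hits + Misses + Partials, and Hits / Lines reproduces the reported coverage, with partially covered lines counted as not covered. A quick check, assuming Codecov's hits / (hits + partials + misses) ratio; `coverage_pct` is an illustrative helper, not part of any Codecov API:

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent, counting partial lines as uncovered."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


base = coverage_pct(4229, 569, 124)  # master: 4922 lines
head = coverage_pct(4245, 556, 127)  # #533: 4928 lines
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# prints "85.92% -> 86.14% (+0.22%)"
```

Note that the "+0.22%" is a difference of percentages, i.e. 0.22 percentage points.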
Impacted Files           Coverage Δ
petastorm/tf_utils.py    88.40% <73.68%> (+8.10%) ↑


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update dd87ee6...c321157.