uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0

Address several autograph failed issues for TF2 #542

Closed: WeichenXu123 closed this 4 years ago.

WeichenXu123 commented 4 years ago

Currently, TF2 AutoGraph fails to convert some functions. See https://github.com/tensorflow/tensorflow/issues/35765, https://github.com/tensorflow/tensorflow/issues/30149, and https://github.com/tensorflow/autograph/issues/3.

If AutoGraph fails, those functions are run eagerly and TF cannot optimize them, so we should fix these failures.

Manual test:

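from pyspark.sql import SparkSession
from petastorm.spark import make_spark_converter

# On Databricks `spark` already exists; elsewhere, create or reuse a session.
spark = SparkSession.builder.getOrCreate()

# Set a cache directory on DBFS FUSE for intermediate data.
spark.conf.set("petastorm.spark.converter.parentCacheDirUrl", "file:///dbfs/ml/tmp/petastorm/QA/bugs/")

df1 = spark.range(100)
converter1 = make_spark_converter(df1)

# Read the dataset for one epoch and print each batch's `id` column.
with converter1.make_tf_dataset(num_epochs=1) as dataset:
    for batch in dataset:
        print(batch.id)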

* Before
AutoGraph warns that it cannot convert some of the functions; the warnings include:
This error may be avoided by creating the lambda in a standalone statement.



* After
The warnings listed above disappear.
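A minimal sketch of the refactor that warning suggests, in plain Python so it runs without TensorFlow. `Row`, `scale_row`, and `shift_row` are hypothetical names, not petastorm code. The "before" line assumes the failure mode described in later TensorFlow versions, where AutoGraph locates a lambda's source by the line it starts on and cannot pick between several same-signature lambdas on that line; the "after" version gives each transformation its own statement.

```python
from collections import namedtuple

# Hypothetical row type standing in for a petastorm namedtuple row.
Row = namedtuple("Row", ["id", "value"])

# Before (assumed risky for AutoGraph): two lambdas with the same signature
# start on one source line, so lookup by line number is ambiguous.
scale, shift = (lambda r: r._replace(value=r.value * 2)), (lambda r: r._replace(value=r.value + 1))


# After: each transformation is a standalone statement with its own name.
def scale_row(r):
    """Double the `value` field of a row."""
    return r._replace(value=r.value * 2)


def shift_row(r):
    """Add one to the `value` field of a row."""
    return r._replace(value=r.value + 1)


rows = [Row(0, 1.0), Row(1, 2.0)]
before = [shift(scale(r)) for r in rows]
after = [shift_row(scale_row(r)) for r in rows]
print(after)  # [Row(id=0, value=3.0), Row(id=1, value=5.0)]
```

The refactor leaves behavior unchanged; only the layout of the source, which AutoGraph has to parse, differs.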
selitvin commented 4 years ago

How did you find these failures? Just by running in your external environment? If so, is it hard to add a test to make sure we don't break autograph going forward?
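One possible answer to the test question, sketched as a heuristic of our own rather than anything from this PR or an AutoGraph API: a static check that flags source lines on which more than one lambda starts. It assumes the regression to guard against is lambdas that AutoGraph cannot isolate by line number, needs no TensorFlow, and will not catch every AutoGraph failure. A stronger (but TF-dependent) test could call `tf.autograph.to_graph` on the functions petastorm maps over the dataset and fail on conversion errors.

```python
import ast


def lines_with_multiple_lambdas(source):
    """Return the sorted line numbers on which two or more lambdas start.

    Heuristic (assumption): AutoGraph finds a lambda's AST by the line it
    starts on, so several lambdas on one line may be ambiguous to it.
    """
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Lambda):
            counts[node.lineno] = counts.get(node.lineno, 0) + 1
    return sorted(line for line, n in counts.items() if n > 1)


RISKY = "f, g = (lambda x: x + 1), (lambda x: x * 2)\n"
SAFE = "def f(x):\n    return x + 1\n\n\ng = lambda x: x * 2\n"

print(lines_with_multiple_lambdas(RISKY))  # [1]
print(lines_with_multiple_lambdas(SAFE))   # []
```

In a pytest test, the same function could be run over the text of each `.py` file under `petastorm/`, asserting that every result is empty.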

codecov[bot] commented 4 years ago

Codecov Report

Merging #542 into master will increase coverage by 0.00%. The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #542   +/-   ##
=======================================
  Coverage   86.15%   86.16%           
=======================================
  Files          87       87           
  Lines        4932     4935    +3     
  Branches      787      786    -1     
=======================================
+ Hits         4249     4252    +3     
  Misses        556      556           
  Partials      127      127           
Impacted Files            Coverage Δ
petastorm/tf_utils.py     88.65% <100.00%> (+0.24%) ↑
petastorm/unischema.py    95.79% <100.00%> (ø)
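The totals in the coverage diff are consistent with coverage being computed as hits divided by all tracked lines (hits + misses + partials), with partial lines counted as not covered. Treating that as Codecov's definition is an assumption here; the numbers are taken from the table above.

```python
def coverage_pct(hits, misses, partials):
    """Coverage in percent: hits over all tracked lines (assumed definition)."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


master = coverage_pct(4249, 556, 127)  # 4932 tracked lines
pr_542 = coverage_pct(4252, 556, 127)  # 4935 tracked lines: +3, all hits

print(f"{master:.2f}% -> {pr_542:.2f}%")  # 86.15% -> 86.16%
```

The three added lines are all covered, so the overall percentage moves by less than 0.01 percentage points.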


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f5d6ea1...5b53213.