Open ToufiPF opened 1 year ago
Leaving this here for anyone having the same issue.
The workaround I found was to replace the NaNs with a magic number (e.g., -infinity) in the data using pandas. Then, at loading time, I use Dataset.map to map the -infinity back to NaN.
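For anyone who wants to try it, here is a minimal sketch of that workaround, not the exact code I run. The helper names (encode_nans, decode_nans) and the toy DataFrame are made up for illustration, and the dataset is built with tf.data.Dataset.from_tensor_slices rather than from a parquet file so the snippet runs without tensorflow-io. It assumes every column is float and that -infinity never occurs in the real data.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Magic number standing in for NaN; assumed never to appear in the real data.
SENTINEL = -np.inf


def encode_nans(df: pd.DataFrame) -> pd.DataFrame:
    """Replace NaNs with the sentinel (do this before writing the parquet file)."""
    return df.fillna(SENTINEL)


def decode_nans(row):
    """Map the sentinel back to NaN for every (float) feature in a row dict."""
    return {
        name: tf.where(
            tf.equal(value, SENTINEL),
            tf.constant(np.nan, dtype=value.dtype),
            value,
        )
        for name, value in row.items()
    }


# Toy data: column "b" is entirely NaN, as in the failing case.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, np.nan]}, dtype="float32")
encoded = encode_nans(df)

# In the real pipeline the encoded frame is written to parquet and read back;
# here the dataset is built in memory to keep the sketch self-contained.
columns = {name: encoded[name].to_numpy() for name in encoded.columns}
dataset = tf.data.Dataset.from_tensor_slices(columns).map(decode_nans)

restored = list(dataset.as_numpy_iterator())
```

The same decode_nans function should be usable in the .map call on the dataset returned by the parquet reader, as long as each element is a dict of float tensors.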
Hello, when reading a parquet file with specified dtypes and a column full of NaNs, tfio.IODataset.from_parquet crashes. I'm guessing this is related to how data types are deduced or how NaNs are interpreted (but note that I'm providing the TensorSpecs manually).
Reproducible example
Stacktraces
Stacktraces obtained by running with python -X dev crash.py. I removed the output about CUDA, which I doubt is related. The stacktrace differs depending on whether tf.data.experimental.enable_debug_mode is called.
Stacktrace w/ tf.data.experimental.enable_debug_mode()
Stacktrace w/o tf.data.experimental.enable_debug_mode()
Environment
requirements.txt
This is similar to #1667, except there shouldn't be any type mixup in this scenario.