WeichenXu123 closed this pull request 4 years ago.
Note:
Based on this PR, for an array field the reader generates a TensorFlow dataset field with shape (?, ?).
If we run a Keras model without setting the shape on that field, we may still get an error like:
raise ValueError('The last dimension of the inputs to `Dense` '
ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.
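To make it work with Keras `model.fit`, we need to manually set the shape of the TensorFlow dataset field, for example:
import tensorflow as tf

def set_shape(x):
    # Declare the static shape of the batched features: (batch_size, 784)
    x.features.set_shape((None, 784))
    return x
tf_dataset = tf_dataset.map(set_shape)  # map() returns a new dataset
or
tf_dataset = tf_dataset.map(lambda x: (tf.reshape(x.features, shape=...), x.label))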
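A minimal, self-contained sketch of the problem and the `set_shape` workaround, assuming TensorFlow 2.4+ (for `tf.data.Dataset.from_generator` with `output_signature`). The `Row` namedtuple and the `rows()` generator are hypothetical stand-ins for what the petastorm reader yields; they are not petastorm APIs. Every row really has 784 features, but the declared per-row shape is `(None,)`, so after batching the static shape is `(None, None)`.

```python
import collections

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a reader row: an array field and a scalar label.
Row = collections.namedtuple("Row", ["features", "label"])


def rows():
    # Each row actually holds 784 floats, but TensorFlow only knows "1-D, unknown length".
    for i in range(8):
        yield Row(features=np.zeros(784, dtype=np.float32), label=np.int64(i % 2))


dataset = tf.data.Dataset.from_generator(
    rows,
    output_signature=Row(
        features=tf.TensorSpec(shape=(None,), dtype=tf.float32),
        label=tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
).batch(4)

# Both dimensions are unknown, so a Dense layer cannot infer its input size.
print(dataset.element_spec.features.shape)  # (None, None)


def set_shape(x):
    # Refine the static shape to (batch_size, 784); values are not checked at runtime.
    x.features.set_shape((None, 784))
    return x


# tf.data passes a namedtuple element to the map function as a single argument.
fixed = dataset.map(set_shape)
print(fixed.element_spec.features.shape)  # (None, 784)
```

`set_shape` only updates the static shape that Keras inspects. If you also want a runtime check, `tf.ensure_shape(x.features, (None, 784))` is an alternative that raises when a batch does not match.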
Merging #517 into master will increase coverage by 0.00%. The diff coverage is 100.00%.
Coverage Diff | master | #517 | +/-
---|---|---|---
Coverage | 86.18% | 86.18% |
Files | 81 | 81 |
Lines | 4465 | 4467 | +2
Branches | 717 | 717 |
Hits | 3848 | 3850 | +2
Misses | 505 | 505 |
Partials | 112 | 112 |
Impacted Files | Coverage Δ | |
---|---|---|
petastorm/unischema.py | 94.76% <100.00%> (+0.05%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a99b9b...1adf556.
@selitvin The test from the PR description has been added to the unit tests.
@selitvin Spark 2.x has a compatibility issue with pyarrow>=0.15, so I skip the test when pyarrow>=0.15 is installed. Don't worry, though: we will soon upgrade to Spark 3.0 here, and the vector support (https://github.com/uber/petastorm/pull/521) also requires Spark 3.0.
I set the inferred shape for array-type fields to (None,) instead of (). This addresses the issues with the TensorFlow dataset. Test code:
Before: raises an error like:
After: the code works.