Closed WeichenXu123 closed 4 years ago
Merging #503 into master will increase coverage by 0.10%. The diff coverage is 96.00%.
@@ Coverage Diff @@
## master #503 +/- ##
==========================================
+ Coverage 85.98% 86.09% +0.10%
==========================================
Files 81 81
Lines 4311 4358 +47
Branches 674 694 +20
==========================================
+ Hits 3707 3752 +45
- Misses 499 500 +1
- Partials 105 106 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| petastorm/reader.py | 90.99% <90.90%> (+0.17%) | :arrow_up: |
| petastorm/arrow_reader_worker.py | 92.00% <100.00%> (ø) | |
| petastorm/etl/dataset_metadata.py | 88.88% <100.00%> (ø) | |
| petastorm/fs_utils.py | 91.01% <100.00%> (+2.27%) | :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3acecf...5ac9288. Read the comment docs.
@selitvin The `make_reader` test failed. `make_reader` needs more changes to support reading a URL list (when `ParquetDataset` accepts a file list, it cannot read the `metadata_path` file...). But supporting only `make_batch_reader` is OK, because the Spark DL converter is implemented via `make_batch_reader`.
Having support for a dataset URL list only in `make_batch_reader` is OK with me if `make_reader` adds too much work.
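Under that agreement, here is a minimal sketch of reading from an explicit file list. It assumes this PR's change (that `make_batch_reader` accepts a list of file URLs in place of a single dataset directory URL); the helper name and the file URLs are illustrative, not from the PR:

```python
def read_batches(parquet_file_urls):
    """Yield batches from an explicit list of parquet file URLs.

    Assumes this PR's change: make_batch_reader accepts a list of
    file URLs instead of a single dataset directory URL.
    """
    # Imported lazily so the sketch loads even where petastorm is absent.
    from petastorm import make_batch_reader

    with make_batch_reader(parquet_file_urls) as reader:
        for batch in reader:
            yield batch  # each batch carries column arrays


# Hypothetical file list, e.g. built by listing the S3 prefix yourself
# instead of relying on the dataset directory URL:
file_urls = [
    's3://my-bucket/dataset/part-00000.parquet',
    's3://my-bucket/dataset/part-00001.parquet',
]
```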
@selitvin Ready.
Currently, `make_batch_reader` and `make_reader` only accept a directory as the dataset URL, but sometimes we need to specify a list of parquet files as input. The reasons are:
- We may want to use petastorm on AWS S3, but S3 only provides eventual consistency for directory-listing operations (it is unreliable), so for S3 we need special logic to build the parquet file list ourselves and pass it to `make_batch_reader`.
- Old versions of pyarrow do not allow files starting with an underscore in the parquet directory (except the `_metadata` file). But some parquet implementations write commit-protocol files such as `_started` and `_SUCCESS`; supporting a dataset URL list in the readers addresses this issue.
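To illustrate the underscore-file point, here is a small self-contained sketch (the helper name and the listing are hypothetical, not petastorm API) of filtering a raw directory listing down to the data files before handing them to a reader:

```python
import posixpath


def select_parquet_files(listing):
    """Keep only parquet data files from a raw directory listing,
    dropping underscore-prefixed commit-protocol/metadata files
    (e.g. _SUCCESS, _started_<id>) that old pyarrow versions reject."""
    selected = []
    for path in listing:
        base = posixpath.basename(path)
        if base.startswith('_'):
            continue  # commit-protocol or metadata file, not data
        if base.endswith('.parquet'):
            selected.append(path)
    return selected


listing = [
    's3://bucket/ds/_SUCCESS',
    's3://bucket/ds/_started_1234',
    's3://bucket/ds/_metadata',
    's3://bucket/ds/part-00000.parquet',
    's3://bucket/ds/part-00001.parquet',
]
print(select_parquet_files(listing))
# -> ['s3://bucket/ds/part-00000.parquet', 's3://bucket/ds/part-00001.parquet']
```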
Test
UT added.
End to end test code: