Closed liangz1 closed 4 years ago
Merging #537 into master will increase coverage by 0.01%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #537 +/- ##
==========================================
+ Coverage 86.52% 86.53% +0.01%
==========================================
Files 85 85
Lines 4713 4717 +4
Branches 743 743
==========================================
+ Hits 4078 4082 +4
Misses 516 516
Partials 119 119
Impacted Files | Coverage Δ | |
---|---|---|
petastorm/fs_utils.py | 91.75% <100.00%> (+0.74%) | :arrow_up: |
petastorm/reader.py | 90.73% <100.00%> (-0.27%) | :arrow_down: |
petastorm/spark/spark_dataset_converter.py | 92.50% <100.00%> (+0.05%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7b8475...6ffdb5e. Read the comment docs.
Nit: `normalize_dataset_url` is applied to a parent dir, which makes the code confusing, so rename `normalize_dataset_url` to `normalize_dir_url` and move it into the `petastorm.fs_utils` module.
Bug: If we execute the following code:
The last line will hit cache:
The median size (1333465) of these parquet files (file:/url1/file_abc.parquet) is too small. Increase file sizes by repartition or coalesce spark dataframe, which will help improve performance.
indicating that the new parent dir conf is not respected (the .../url1 dir is still being hit).
Fix: Respect the conf change by adding an equality test against the parent cache dir.
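A minimal sketch of the idea behind the fix, not the actual petastorm code: a cache lookup that only reuses an entry when both the dataframe plan and the parent cache dir match. `CachedEntry`, `ConverterCache`, `plan_key`, the trailing-slash handling, and the `file:///tmp/...` paths are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CachedEntry:
    """Hypothetical record of one materialized dataset (not petastorm's real class)."""
    plan_key: str          # stands in for the dataframe's query plan
    parent_cache_dir: str  # parent cache dir configured when the entry was created
    data_url: str          # where the parquet files were written


class ConverterCache:
    """Hypothetical cache that reuses materialized data for the same dataframe."""

    def __init__(self) -> None:
        self._entries: List[CachedEntry] = []
        self._counter = 0

    def get_or_create(self, plan_key: str, parent_cache_dir: str) -> str:
        # Assumption: strip a trailing slash so equivalent dirs compare equal.
        parent = parent_cache_dir.rstrip("/")
        for entry in self._entries:
            # The equality test on the parent dir is the point of the fix:
            # without it, a conf change would still return the old data url.
            if entry.plan_key == plan_key and entry.parent_cache_dir == parent:
                return entry.data_url
        # Cache miss: pretend to materialize the dataframe under the new parent dir.
        self._counter += 1
        data_url = f"{parent}/dataset-{self._counter}"
        self._entries.append(CachedEntry(plan_key, parent, data_url))
        return data_url


cache = ConverterCache()
first = cache.get_or_create("df-plan", "file:///tmp/cache1")
again = cache.get_or_create("df-plan", "file:///tmp/cache1/")  # same dir: cache hit
moved = cache.get_or_create("df-plan", "file:///tmp/cache2")   # conf changed: cache miss
```

In this sketch, `first` and `again` refer to the same location, while `moved` lands under the second parent dir, which is the behavior the bug report expected after changing the conf.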