Closed: abditag2 closed this pull request 3 years ago
Merging #555 (8f00aaf) into master (7377bb7) will increase coverage by 0.03%. The diff coverage is 89.28%.
@@            Coverage Diff             @@
##           master     #555      +/-   ##
==========================================
+ Coverage   85.32%   85.35%   +0.03%
==========================================
  Files          85       85
  Lines        4933     4978      +45
  Branches      783      790       +7
==========================================
+ Hits         4209     4249      +40
- Misses        584      589       +5
  Partials      140      140
Impacted Files | Coverage Δ | |
---|---|---|
petastorm/reader_impl/pytorch_shuffling_buffer.py | 92.80% <61.53%> (-3.63%) | :arrow_down: |
petastorm/reader.py | 89.47% <87.50%> (+0.15%) | :arrow_up: |
petastorm/pytorch.py | 92.22% <100.00%> (+1.49%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Can you explain what exactly is being cached? From a quick glance I couldn't figure out the difference between the two dataloaders.
@fps7806 Caching works by keeping the values loaded into the ShufflingBuffer in memory (or on the GPU) instead of removing them once they have been served. With the in-memory cache enabled, each value is read from disk or the network into memory only once. Each worker caches only its own shard of the data, not the entire dataset.
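To make the idea concrete, here is a minimal sketch of the caching behavior described above. It is not the code in this PR and does not use Petastorm's API: `InMemoryCachingBuffer`, `make_shard_reader`, and the round-robin sharding are hypothetical names and assumptions chosen only for illustration.

```python
import random


class InMemoryCachingBuffer:
    """Toy buffer (hypothetical, not Petastorm's ShufflingBuffer).

    Rows are read once and then kept, so later epochs are served from memory.
    """

    def __init__(self, read_shard, seed=None):
        self._read_shard = read_shard  # callable doing the expensive disk/network read
        self._rows = None              # filled on the first epoch, never evicted
        self._rng = random.Random(seed)
        self.reads = 0                 # how many times the expensive read ran

    def epoch(self):
        """Yield every cached row once, in a fresh random order."""
        if self._rows is None:
            self._rows = list(self._read_shard())
            self.reads += 1
        order = list(range(len(self._rows)))
        self._rng.shuffle(order)
        for i in order:
            yield self._rows[i]


def make_shard_reader(dataset, worker_id, num_workers):
    """Return a reader for one worker's shard (round-robin split is an assumption)."""
    def read():
        return dataset[worker_id::num_workers]
    return read


if __name__ == "__main__":
    dataset = list(range(10))
    buffers = [
        InMemoryCachingBuffer(make_shard_reader(dataset, w, 2), seed=w)
        for w in range(2)
    ]
    for _ in range(3):  # three epochs
        for buf in buffers:
            list(buf.epoch())
    # Each worker read its own shard exactly once despite three epochs.
    print([buf.reads for buf in buffers])
```

In this sketch the trade-off is explicit: memory use grows to the size of the worker's shard, in exchange for skipping the read on every epoch after the first.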
This PR adds an in-memory cache to the Petastorm loader. This enables very fast data reads and removes the IO/network bottleneck.