Closed trossi closed 6 months ago
Attention: 1 lines in your changes are missing coverage. Please review.
Comparison is base (d4c1d83) 92.02% compared to head (3a4631e) 91.98%. Report is 11 commits behind head on develop.
Files | Patch % | Lines |
---|---|---|
rdata/parser/_parser.py | 96.15% | 1 Missing :warning: |
Here is approximate timing data for reference:
Array size (MiB) | Time to read before this PR (s) | Time to read with this PR (s) |
---|---|---|
16 | 1.2 | 0.3 |
32 | 2.2 | 0.3 |
64 | 4.2 | 0.3 |
128 | 8.0 | 0.4 |
256 | 18.5 | 0.5 |
512 | 39.4 | 0.8 |
1024 | 77.3 | 1.5 |
This data was created with the following (in bash):
for i in {1..7}; do n=$(( 2 ** $i )); Rscript -e "saveRDS(runif(n=$n*1024**2), file='array_$i.rds', compress=FALSE)"; done
for i in {1..7}; do echo $i; time -p python -c "from rdata.parser import parse_file; parse_file('array_$i.rds')"; done
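For readers mapping the loop index to the "Array size (MiB)" column, the arithmetic can be sketched as below. This is a small illustration, not part of the PR: it assumes `runif` yields 8-byte doubles and ignores the few bytes of RDS header overhead, and `array_size_mib` is a hypothetical helper name.

```python
BYTES_PER_DOUBLE = 8   # assumption: R numeric vectors store 8-byte doubles
MIB = 1024 ** 2


def array_size_mib(i: int) -> int:
    """Approximate payload size (MiB) of array_<i>.rds from the loop above."""
    # Mirrors the shell arithmetic: n = 2 ** i, then runif(n = n * 1024**2).
    n_elements = (2 ** i) * 1024 ** 2
    return n_elements * BYTES_PER_DOUBLE // MIB


sizes = [array_size_mib(i) for i in range(1, 8)]
print(sizes)  # [16, 32, 64, 128, 256, 512, 1024]
```

The seven values line up with the seven rows of the timing table.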
Sorry for the delay in accepting this. I was on vacation and had a "forced" digital detox. Approving and merging now.
No problem, thank you for merging! I hope you had a relaxing vacation!
I'll open a PR for the ASCII reader next.
> Here is approximate timing data for reference:
Just to let you know: I added your example (but limited to 5 iterations) as an asv benchmark to the package, to check for future performance regressions. I also added a new testing module (currently undocumented) to retrieve and execute R snippets from strings, so that each test can have its associated R snippet for creating the data, instead of a big script for all.
This PR adds a faster reader for files in XDR format. Full arrays are read directly with numpy instead of element by element. As a positive side effect, the deprecated xdrlib module is no longer needed.
Related to #31. I have cherry-picked and rebased the commits related to the XDR reader improvements into this PR. There are also some structural changes that are open for discussion; for example, the XDR reader is moved to `rdata/io/xdr.py` to simplify the separation between different readers, like the (upcoming) `rdata/io/ascii.py`. @vnmabus Could you review?
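To illustrate the idea behind the speed-up described in the PR, here is a minimal sketch contrasting element-by-element decoding with reading a whole block through numpy. It is not the code from `rdata/io/xdr.py`: the function names are hypothetical, and the only format assumption is that XDR stores doubles as big-endian IEEE 754 values.

```python
import struct

import numpy as np


def read_doubles_elementwise(buffer: bytes, count: int) -> np.ndarray:
    """Decode big-endian float64 values one at a time (the slow pattern)."""
    values = [struct.unpack_from(">d", buffer, 8 * i)[0] for i in range(count)]
    return np.array(values, dtype=np.float64)


def read_doubles_bulk(buffer: bytes, count: int) -> np.ndarray:
    """Decode the whole block with a single numpy call (the fast pattern)."""
    # ">f8" means big-endian 8-byte float, matching the XDR double layout.
    data = np.frombuffer(buffer, dtype=">f8", count=count)
    # Convert to native byte order so callers get an ordinary float64 array.
    return data.astype(np.float64)


# Build a small big-endian payload and decode it both ways.
expected = np.array([0.0, 1.5, -2.25, 1e10])
payload = expected.astype(">f8").tobytes()

elementwise = read_doubles_elementwise(payload, len(expected))
bulk = read_doubles_bulk(payload, len(expected))
```

Both functions return the same values; the bulk version avoids a Python-level loop over every element, which is plausibly where most of the time in the "before" column was spent.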