miranov25 / RootInteractive


Compression tutorial #285

Closed pl0xz0rz closed 1 year ago

pl0xz0rz commented 1 year ago

This PR:

miranov25 commented 1 year ago
- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
------------------------------------------------------------------------------------------------ JSON report ------------------------------------------------------------------------------------------------
report saved to: test6.json
================================================================================= 37 passed, 7 warnings in 60.81s (0:01:00) =================================================================================

real    1m2.242s
user    12m48.704s
sys     0m44.004s
miranov25 commented 1 year ago

The achievable compression factor depends on the layout of the data. Large reduction factors are possible, partly because of the low entropy of the data (e.g. Gaussian-distributed values) and partly because of the repeated values in flattened arrays.

For example, in the real use case of the dEdx simulation (clusters per track), the common track properties compress very well, and the charge properties also have low entropy. Typically a factor of O(10) is reached, depending on the repetition and the entropy of the input data.
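To make the two effects concrete, here is a minimal sketch using only the Python standard library. It is not the RootInteractive pipeline: it stores values as float32 and uses plain `zlib`, and the names `track_values`, `flattened`, `per_cluster` and `quantized` are hypothetical stand-ins for a per-track property repeated for every cluster, an independent per-cluster property, and the same property rounded to a coarse grid.

```python
import random
import struct
import zlib


def compressed_fraction(values):
    """Return compressed size / raw size for floats stored as float32."""
    raw = struct.pack(f"{len(values)}f", *values)
    return len(zlib.compress(raw)) / len(raw)


random.seed(0)
n_tracks, n_clusters = 2000, 50  # illustrative sizes, not taken from the PR

# Per-track property repeated for each cluster of the track (flattened layout).
track_values = [random.gauss(0.0, 1.0) for _ in range(n_tracks)]
flattened = [v for v in track_values for _ in range(n_clusters)]

# Per-cluster property: every entry is drawn independently.
per_cluster = [random.gauss(0.0, 1.0) for _ in range(n_tracks * n_clusters)]

# Same per-cluster values rounded to a grid of 1/16: fewer distinct values,
# hence lower entropy.
quantized = [round(v * 16) / 16 for v in per_cluster]

frac_flat = compressed_fraction(flattened)
frac_raw = compressed_fraction(per_cluster)
frac_quant = compressed_fraction(quantized)

print(f"flattened (repeated) : {frac_flat:.3f}")
print(f"per-cluster (raw)    : {frac_raw:.3f}")
print(f"per-cluster (rounded): {frac_quant:.3f}")
```

In this toy setup the repeated column should compress far better than the independent one, and rounding the independent column lowers its compressed size as well. The exact fractions depend on the sizes and the codec chosen here.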

compressCDSPipe
Compress 1 dNprimdx .* [('relative', 16), 'code', 'zip']
Compression factor 1502730 33602297 0.04472104987346549 1 dNprimdx
Compress 2 qVector .* [('relative', 16), 'code', 'zip']
Compress Factor 3637456 27522312 0.13216389669588804 2 qVector
Compress 3 region .* [('relative', 16), 'code', 'zip']
Compression factor 579220 27522277 0.02104549707133607 3 region
Compress 4 qMean .* [('relative', 16), 'code', 'zip']
Compress factor 1485573 33602294 0.04421046372607775 4 qMean
Compress 5 nTotVector .* [('relative', 16), 'code', 'zip']
Compress Factor 3336317 27522315 0.1212222518345568 5 nTotVector
Compress 6 nPrimMean .* [('relative', 16), 'code', 'zip']
Compress Factor 1502830 33602298 0.04472402452951283 6 nPrimMean
Compress 7 qStd .* [('relative', 16), 'code', 'zip']
Compress Factor 1474259 33602293 0.04387376182928945 7 qStd
Compress 8 nTotStd .* [('relative', 16), 'code', 'zip']
Compress factor 1488320 33602296 0.044292211460788274 8 nTotStd
Compress 9 nTotMean .* [('relative', 16), 'code', 'zip']
Compress Factor 1499257 33602297 0.04461769384396549 9 nTotMean
Compress 10 TransGEM .* [('relative', 16), 'code', 'zip']
Compress Factor 1461148 33602297 0.04348357494727221 10 TransGEM
Compress 11 nPrimStd .* [('relative', 16), 'code', 'zip']
Compress Factor 1487282 33602297 0.044261319397301914 11 nPrimStd
Compress 12 padLength .* [('relative', 16), 'code', 'zip']
Compress Factor 595667 27522314 0.021643056612172945 12 padLength
Compress 13 nPrimVector .* [('relative', 16), 'code', 'zip']
Compress Factor 2779472 27522316 0.10098975682133728 13 nPrimVector
Compress 14 lognPrimStd .* [('relative', 16), 'code', 'zip']
Compress Factor 1485884 33602300 0.0442197111507248 14 lognPrimStd
Compress 15 SatOn .* [('relative', 16), 'code', 'zip']
Compress Factor 150691 22962324 0.006562532607762176 15 SatOn
Compress 16 nSecSatur .* [('relative', 16), 'code', 'zip']
Compress Factor 1725854 33602298 0.05136118964244648 16 nSecSatur
Compress 17 logqStd .* [('relative', 16), 'code', 'zip']
Compress Factor 1470552 33602296 0.04376343806982713 17 logqStd
Compress 18 lognTotStd .* [('relative', 16), 'code', 'zip']
Compress Factor 1484058 33602299 0.04416537094679147 18 lognTotStd
Compress 19 lognSecSatur .* [('relative', 16), 'code', 'zip']
Compress factor 1474271 33602301 0.043874108502271914 19 lognSecSatur
Compress 20 region.factor() .* [('relative', 16), 'code', 'zip']
Compress factor 553452 6080146 0.09102610364948473 20 region.factor()
Compress 21 SatOn.factor() .* [('relative', 16), 'code', 'zip']
Compress factor 197180 6080146 0.03243014230250392 21 SatOn.factor()
Compress _all 31371473 609564013 0.051465428291285954 21
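The per-column figures above can be turned into reduction factors with a few lines of Python. This is a sketch under one assumption: the three numbers after the label are read as compressed size, original size, and their ratio. That reading is inferred from the values themselves (the third number equals the first divided by the second), and the log does not state the units. Three lines are copied from the log; because the label spelling varies, the pattern is case-insensitive.

```python
import re

# Three lines copied verbatim from the log above.
LOG = """\
Compress Factor 3637456 27522312 0.13216389669588804 2 qVector
Compress Factor 150691 22962324 0.006562532607762176 15 SatOn
Compress _all 31371473 609564013 0.051465428291285954 21
"""

# Assumed column order: compressed size, original size, ratio, index, name.
PATTERN = re.compile(
    r"^Compress\w*\s+(?:factor|_all)\s+(\d+)\s+(\d+)\s+([\d.eE+-]+)\s+(\d+)\s*(.*)$",
    re.IGNORECASE,
)


def parse(log):
    """Return (name, compressed, original, reduction_factor) per summary line."""
    rows = []
    for line in log.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            compressed, original = int(m.group(1)), int(m.group(2))
            name = m.group(5) or "_all"  # the total line carries no column name
            rows.append((name, compressed, original, original / compressed))
    return rows


rows = parse(LOG)
for name, compressed, original, factor in rows:
    print(f"{name:10s} {compressed:>10d} / {original:>10d}  reduction x{factor:.1f}")
```

Read this way, the `_all` ratio of about 0.051 corresponds to an overall reduction of about 19, in line with the O(10) quoted above.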
miranov25 commented 1 year ago

Merging, and committing a modified version later.