Closed danielhrisca closed 6 years ago
I tried with my version, and memory is limited to 700 MB, or 500 MB when using the convertAfterRead=False argument, again with Python 3.5.3 and numpy 1.11.2. I am on the numpy discussion list, and there currently seem to be several memory issues with Python 3.6 that should be fixed in numpy 1.13.1. Which numpy version are you using?
I got the high memory usage when saving the mf4 file to disk. Memory usage for file opening was around 700MB like you said. (Using Python 3.6.1 x64, Windows 7 x64, numpy 1.13, mdfreader 0.2.5).
This could be normal. Data stored in an mdf4 file can be compact and use much less memory because it uses specific data types such as uint8, which are then converted to float, for instance (based on the CCBlock), and so take much more memory when written back to an mdf4 file. This conversion is avoided with the argument convertAfterRead=False during reading, but writing does not use the original data type, only the converted type. However, there could be a pointer issue in the writing function that inflates the file. I will try to reproduce your issue while writing.
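To make the size difference concrete, here is a minimal sketch with plain numpy. The channel, the scale factor and the offset are made up for illustration and are not taken from mdfreader or from the test file.

```python
import numpy as np

# Hypothetical raw channel: one million samples stored as unsigned 8-bit integers.
raw = (np.arange(1_000_000) % 256).astype(np.uint8)

# Hypothetical linear conversion to physical values (factor and offset are assumptions).
physical = raw.astype(np.float64) * 0.5 + 10.0

print(raw.dtype, raw.nbytes)            # 1 byte per sample
print(physical.dtype, physical.nbytes)  # 8 bytes per sample
```

If the converted array is what gets written, the data block for this channel is eight times larger than the raw one.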
I tried on my dev platform (Debian) and I barely consumed 0.2 GB during writing.
I use this benchmark for evaluation. You can double-check on your machine:
https://github.com/danielhrisca/asammdf/tree/master/benchmarks
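For a quick local check without the full benchmark, something like the following can give a rough time and peak-memory figure. This is only a sketch using the standard library, not the method of the linked benchmark; the helper name `measure` is hypothetical, and tracemalloc only sees allocations made through Python's allocators, so the number will differ from the process RAM shown by the operating system.

```python
import time
import tracemalloc

def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed time in ms, peak traced memory in MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed_ms, peak / (1024 * 1024)

# Stand-in workload: allocate a 10 MB buffer and return its length.
result, elapsed_ms, peak_mb = measure(lambda: len(bytearray(10_000_000)))
print(f"{elapsed_ms:.1f} ms, {peak_mb:.1f} MB peak")
```

The same helper could wrap the open and write calls being compared, for example `measure(lambda: some_file.write())`.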
Hi,
using the two test files (mdf version 3 and 4) I have:
Save file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 26894 | 2002 |
mdfreader 0.2.6 mdfv4 | 25403 | 2715 |
Hi Daniel, So far on Linux:
The file is ok. I don't know what you have on your Dev PC but if you install mdfreader from pypi or GitHub the results are as I have shown (tested on Linux, Windows, python 2.7 and python 3.6). PS: you have the proper test file
Hi, Just tried on Win10 64 bit anaconda 4.3.1 (python 3.6.0) and winPython 3.6.1 (virtual machine in same linux machine)
RuntimeWarning: invalid value encountered in multiply return vect * P2 + P1
My command is roughly the same as in your benchmark (without the timer):
yop=mdfreader.mdf('error.mdf')
yop.write()
I do not get it
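For reference, one way numpy can emit this particular warning is an infinite value multiplied by zero inside a linear conversion. The sketch below only reproduces the message; whether this is what happens with error.mdf is an assumption, and the function and values are made up.

```python
import warnings
import numpy as np

def linear_conversion(vect, P1, P2):
    # Same expression as in the warning above.
    return vect * P2 + P1

raw = np.array([1.0, np.inf, 3.0])  # hypothetical channel containing an infinite sample

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = linear_conversion(raw, 0.0, 0.0)  # inf * 0.0 is undefined and yields NaN

for w in caught:
    print(w.category.__name__, w.message)
print(out)
```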
Hi Aymeric, mdf3 reading is indeed about 3.5 s. mdf4 reading is slow. Both mdf3 and mdf4 writes are slow and consume a lot of RAM in my tests.
Ok, got confused by the issues, I will check the RAM consumption during writing.
The use of pack() seems to be the mistake. I will have to investigate an alternative such as .tobytes() from numpy.
Hi Daniel, I found an alternative to pack() using numpy records fromarrays() and tobytes(). It gives a big speed-up and much lower memory consumption. However, I will have to test it in more detail. mdf3 could still be sped up further; that is next.
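As an illustration of the idea, the sketch below builds the same record bytes once with struct.pack() per record and once with numpy's fromarrays() and tobytes(). The two channels and their layout are assumptions made up for the example; this is not the actual mdfreader code.

```python
import struct
import numpy as np

# Two hypothetical channels of one data group: a float64 time base and a uint16 signal.
t = np.array([0.0, 0.1, 0.2], dtype="<f8")
speed = np.array([10, 20, 30], dtype="<u2")

# Per-record packing: one Python-level call and one small bytes object per record.
packed = b"".join(
    struct.pack("<dH", float(ti), int(si)) for ti, si in zip(t, speed)
)

# Vectorized alternative: build a record array, then dump its buffer in one call.
records = np.rec.fromarrays([t, speed], names="t,speed")
vectorized = records.tobytes()

print(records.dtype.itemsize, len(vectorized))
print(packed == vectorized)
```

The record array keeps each record's fields adjacent in memory, so tobytes() produces the interleaved layout directly without a Python loop over the samples.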
Hello Aymeric,
why is there such a high RAM usage for mdf version 4 with noDataLoading=True ?
Benchmark environment
Notations used in the results
Files used for benchmark: 183 groups, 36424 channels
Open file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 3698 | 542 |
mdfreader 0.2.6 compression mdfv3 | 5041 | 262 |
mdfreader 0.2.6 noDataLoading mdfv3 | 1933 | 193 |
mdfreader 0.2.6 mdfv4 | 42596 | 1315 |
mdfreader 0.2.6 compression mdfv4 | 46789 | 1027 |
mdfreader 0.2.6 noDataLoading mdfv4 | 5001 | 948 |
Hi Daniel, I found a sloppily coded part handling text channels and their encoding. I improved the code a bit, and reading time should be drastically reduced for mdf4. However, RAM usage with noDataLoading is indeed still too high. Work in progress, as you may have noticed.
Hi Daniel, By the way, did you try to benchmark with a 'big' file (> 1 GB) and a much smaller number of channels (< 1000 channels)?
Open file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 264 | 567 |
mdfreader 0.2.6 compression mdfv3 | 838 | 531 |
mdfreader 0.2.6 compression bcolz 6 mdfv3 | 1625 | 543 |
mdfreader 0.2.6 noDataLoading mdfv3 | 4 | 92 |
mdfreader 0.2.6 mdfv4 | 273 | 586 |
mdfreader 0.2.6 compression mdfv4 | 844 | 610 |
mdfreader 0.2.6 compression bcolz 6 mdfv4 | 1635 | 613 |
mdfreader 0.2.6 noDataLoading mdfv4 | 7 | 94 |
Save file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 2117 | 1011 |
mdfreader 0.2.6 compression mdfv3 | 2188 | 938 |
mdfreader 0.2.6 compression bcolz 6 mdfv3 | 2663 | 937 |
mdfreader 0.2.6 mdfv4 | 1967 | 1011 |
mdfreader 0.2.6 compression mdfv4 | 2115 | 939 |
mdfreader 0.2.6 compression bcolz 6 mdfv4 | 2381 | 937 |
Get all channels | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 0 | 566 |
mdfreader 0.2.6 compression mdfv3 | 331 | 531 |
mdfreader 0.2.6 compression bcolz 6 mdfv3 | 524 | 543 |
mdfreader 0.2.6 mdfv4 | 0 | 586 |
mdfreader 0.2.6 nodata mdfv4 | 272 | 551 |
mdfreader 0.2.6 compression mdfv4 | 328 | 610 |
mdfreader 0.2.6 compression bcolz 6 mdfv4 | 520 | 613 |
results with commit https://github.com/ratal/mdfreader/commit/36dfe4aae917eb9d232e639bf603f30dfec5d7fa
Open file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 3744 | 542 |
mdfreader 0.2.6 compression mdfv3 | 5163 | 263 |
mdfreader 0.2.6 compression bcolz 6 mdfv3 | 5288 | 1035 |
mdfreader 0.2.6 noDataLoading mdfv3 | 2047 | 193 |
mdfreader 0.2.6 mdfv4 | 7337 | 1315 |
mdfreader 0.2.6 compression mdfv4 | 8517 | 1027 |
mdfreader 0.2.6 compression bcolz 6 mdfv4 | 9082 | 1750 |
mdfreader 0.2.6 noDataLoading mdfv4 | 5348 | 948 |
Save file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 7273 | 574 |
mdfreader 0.2.6 noDataLoading mdfv3 | 9414 | 574 |
mdfreader 0.2.6 compression mdfv3 | 7629 | 536 |
mdfreader 0.2.6 compression bcolz 6 mdfv3 | 7231 | 1035 |
mdfreader 0.2.6 mdfv4 | 4293 | 1336 |
mdfreader 0.2.6 noDataLoading mdfv4 | 6205 | 1336 |
mdfreader 0.2.6 compression mdfv4 | 4911 | 1292 |
mdfreader 0.2.6 compression bcolz 6 mdfv4 | 4776 | 1767 |
Get all channels (36424 calls) | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.6 mdfv3 | 93 | 542 |
mdfreader 0.2.6 nodata mdfv3 | 118503 | 414 |
mdfreader 0.2.6 compression mdfv3 | 718 | 266 |
mdfreader 0.2.6 compression bcolz 6 mdfv3 | 345 | 1036 |
mdfreader 0.2.6 mdfv4 | 96 | 1314 |
mdfreader 0.2.6 nodata mdfv4 | 172578 | 1185 |
mdfreader 0.2.6 compression mdfv4 | 731 | 1035 |
mdfreader 0.2.6 compression bcolz 6 mdfv4 | 455 | 1758 |
In the last commit I reduced memory use in general to almost the size of the original file data. However, bcolz seems disappointing, maybe because of too much overhead for each channel; blosc is much better for this use case.
Hello Aymeric,
good work, the memory usage has been improved a lot since 0.2.5.
Regarding bcolz, it is indeed not suitable by default; it would probably work better with a transposition of the data block records. For my part, I've already dropped all compression options since they performed worse than not loading the raw record data.
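To show what such a transposition means, here is a rough sketch that serializes the same samples record by record and channel by channel, then compresses both. It uses zlib from the standard library rather than bcolz or blosc, and the record layout and values are made up, so the sizes it prints say nothing about the benchmark files.

```python
import zlib
import numpy as np

n = 10_000
# Hypothetical record layout: time, a 16-bit counter and an 8-bit flag per record.
records = np.zeros(n, dtype=[("t", "<f8"), ("counter", "<u2"), ("flag", "u1")])
records["t"] = np.arange(n) * 0.01
records["counter"] = np.arange(n) % 1000
records["flag"] = np.arange(n) % 2

# Record-oriented bytes: the fields of each record are interleaved.
row_bytes = records.tobytes()

# Transposed (channel-oriented) bytes: all samples of one channel are contiguous.
col_bytes = b"".join(records[name].tobytes() for name in records.dtype.names)

row_compressed = zlib.compress(row_bytes)
col_compressed = zlib.compress(col_bytes)
print(len(row_bytes), len(row_compressed), len(col_compressed))

# The first n * 8 bytes of the transposed stream are the time channel.
t_restored = np.frombuffer(zlib.decompress(col_compressed)[: n * 8], dtype="<f8")
```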
Open file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.7 mdfv3 | 4319 | 458 |
mdfreader 0.2.7 compress mdfv3 | 5997 | 195 |
mdfreader 0.2.7 compress bcolz 6 mdfv3 | 6117 | 947 |
mdfreader 0.2.7 noDataLoading mdfv3 | 1711 | 187 |
mdfreader 0.2.7 mdfv4 | 5705 | 467 |
mdfreader 0.2.7 compress mdfv4 | 7174 | 183 |
mdfreader 0.2.7 compress bcolz 6 mdfv4 | 7331 | 907 |
mdfreader 0.2.7 noDataLoading mdfv4 | 4172 | 261 |
Save file | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.7 mdfv3 | 8704 | 481 |
mdfreader 0.2.7 compress mdfv3 | 8672 | 451 |
mdfreader 0.2.7 compress bcolz 6 mdfv3 | 8398 | 949 |
mdfreader 0.2.7 mdfv4 | 6669 | 489 |
mdfreader 0.2.7 compress mdfv4 | 8216 | 446 |
mdfreader 0.2.7 compress bcolz 6 mdfv4 | 6642 | 922 |
Get all channels (36424 calls) | Time [ms] | RAM [MB] |
---|---|---|
mdfreader 0.2.7 mdfv3 | 68 | 458 |
mdfreader 0.2.7 compress mdfv3 | 645 | 196 |
mdfreader 0.2.7 compress bcolz 6 mdfv3 | 272 | 949 |
mdfreader 0.2.7 mdfv4 | 67 | 467 |
mdfreader 0.2.7 compress mdfv4 | 670 | 189 |
mdfreader 0.2.7 compress bcolz 6 mdfv4 | 295 | 914 |
I guess it's up to you if you want to close this issue.
OK, thanks. I will review the compression status later.
With the test file the memory usage goes to 2.8GB. I think that there is a memory leak worth investigating.