Closed danielhrisca closed 6 years ago
Hi, I know; I have been seeing this memory consumption issue for some time. It mostly comes from mdfinfo4 and its commentblock(). I am storing the xml_tree object, which takes a huge amount of memory, along with the raw text. Removing it releases a lot of memory. I am still refining the improvements and expect to commit them in the coming days.
This issue is about each getChannelData call taking about 0.5 s when noDataLoading=True and the measurement contains repeated channel names.
For the given file (about 36000 channels), this adds up to ~5 h.
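As a quick sanity check of that estimate, here is the arithmetic spelled out. Both inputs are the rounded figures quoted in the comment, not new measurements:

```python
# Back-of-the-envelope estimate using the rounded figures from the comment above.
seconds_per_call = 0.5   # approximate cost of one getChannelData call
channel_count = 36000    # approximate number of channels in the file

total_seconds = seconds_per_call * channel_count
total_hours = total_seconds / 3600

print(f"{total_seconds:.0f} s = {total_hours:.1f} h")
```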
I think I improved the situation with the last commit.
It works.
noDataLoading=True is basically lazy loading. If you need all channels, you will use the same RAM as with noDataLoading=False.
I stand corrected: noDataLoading=True can use more RAM than noDataLoading=False.
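To make the lazy-loading idea concrete, here is a minimal sketch. The `LazyChannels` class and its loader functions are hypothetical and are not how mdfreader is implemented; they only show that once every channel has been requested, a lazy container holds as much data as an eager one, plus its own bookkeeping:

```python
class LazyChannels:
    """Hypothetical container; not mdfreader's actual implementation."""

    def __init__(self, loaders):
        # loaders maps a channel name to a zero-argument function returning its samples
        self._loaders = loaders
        self._cache = {}

    def get(self, name):
        # Load on first access, then keep the samples in memory
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def loaded_count(self):
        return len(self._cache)


# Stand-in loaders that build small lists instead of reading a file
loaders = {f"ch{i}": (lambda i=i: [i] * 4) for i in range(3)}
channels = LazyChannels(loaders)

print(channels.loaded_count())  # nothing loaded yet
channels.get("ch1")
print(channels.loaded_count())  # one channel resident
for name in loaders:
    channels.get(name)
print(channels.loaded_count())  # every channel resident, as with eager loading
```

The loader mapping stays alive next to the cached data, which is one way a lazy container could end up larger than an eager one. Whether that applies to mdfreader is not established by this sketch.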

| Save file | Time [ms] | RAM [MB] |
|---|---|---|
| mdfreader 0.2.7 mdfv3 | 7594 | 489 |
| mdfreader 0.2.7 noDataLoading mdfv3 | 9016 | 575 |
| mdfreader 0.2.7 mdfv4 | 6454 | 494 |
| mdfreader 0.2.7 noDataLoading mdfv4 | 8124 | 653 |

| Get all channels (36424 calls) | Time [ms] | RAM [MB] |
|---|---|---|
| mdfreader 0.2.7 mdfv3 | 113 | 461 |
| mdfreader 0.2.7 nodata mdfv3 | 1648 | 411 |
| mdfreader 0.2.7 mdfv4 | 91 | 475 |
| mdfreader 0.2.7 nodata mdfv4 | 2481 | 502 |

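
The thread does not say how the time and RAM figures above were collected. As one possible approach, the sketch below wraps a call with `time.perf_counter` and `tracemalloc` from the standard library. The `measure` helper is hypothetical, and `tracemalloc` reports peak Python-level allocations rather than process RSS, so its numbers would not be directly comparable to the tables:

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Return (result, elapsed_ms, peak_mb) for a single call of func."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raised
        tracemalloc.stop()
    return result, elapsed_ms, peak_bytes / (1024 * 1024)


# Stand-in workload; replace with the file open or channel loop being benchmarked
result, elapsed_ms, peak_mb = measure(lambda: [0] * 1_000_000)
print(f"{elapsed_ms:.1f} ms, {peak_mb:.1f} MB peak")
```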
I'm not sure I understand. noDataLoading loads only the info class. Unfortunately, for now the info class is still large for files with many channels. When a channel is accessed, its data is loaded. Unless you use compression, the channel data should take the same space when no conversion is applied (and potentially more with conversion).
OK, I misunderstood this argument.
I think we were writing at the same time... Yes, indeed, the info class takes a lot of memory, mainly because a lot of metadata is stored: XML content, many channels. I will try to add a switch for this case so that only what is strictly necessary to parse the data is loaded, plus minimal information such as description and unit. I think asammdf does not load all of this, which could explain the difference in memory usage. The full metadata could still be interesting for users, but it could be accessed only when intentionally looked for, using the mdfinfo class. That complexity is not needed just to parse the data and work with it.
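One way to read "accessed only when intentionally looking for it" is to keep the raw comment text and parse it on demand instead of storing a parsed tree per channel. The sketch below assumes that reading; the `CommentBlock` class and the sample XML are made up for illustration (real MDF4 comment blocks are more complex and use XML namespaces), and this is not mdfreader's code:

```python
import xml.etree.ElementTree as ET


class CommentBlock:
    """Hypothetical stand-in for a metadata comment; not mdfreader's class."""

    def __init__(self, raw_xml):
        # Keep only the raw text; no tree is built here
        self._raw = raw_xml

    def find_text(self, tag):
        # Parse on demand; the tree is discarded once this method returns
        root = ET.fromstring(self._raw)
        node = root.find(tag)
        return None if node is None else node.text


# Simplified, made-up comment content
block = CommentBlock("<CNcomment><TX>engine speed</TX><unit>rpm</unit></CNcomment>")
print(block.find_text("TX"))
print(block.find_text("missing"))
```

The trade-off is repeated parsing cost on each lookup in exchange for not holding one tree per channel in memory.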
Hi Daniel, I noticed this regression with 2.7.3 and improved the situation with the last commit, if you are curious. It will be in the next release.
I tried the latest code, but after a few minutes of waiting to get all channels' data I had to kill the process. I'll try to collect per-call timings tomorrow.
Weird... It takes me around 22 s for test.mf4 and 18 s for test.mdf. Not at the same level as asammdf, but closer. Thanks for trying. My code:
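import time
import mdfreader

# Open the file lazily: channel data is read only when requested
yop = mdfreader.mdf('test.mdf', noDataLoading=True)
tic = time.time()
for s in yop:
    toc = time.time()
    print(s)
    y = yop.getChannelData(s)
    # Print the data, the time for this call, and the time elapsed before it
    print('{0} {1} {2}'.format(y, time.time() - toc, toc - tic))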
Using mdfreader 0.2.6, if the file has many channels with the same name, the getChannelData call is very slow (in the tables, the first column is the execution time in ms and the second is the RAM usage in MB).
I've changed the test files, and you can find the new ones here: https://github.com/danielhrisca/asammdf/blob/master/benchmarks/test%20files.7z