ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x files in Python

enhancement: very slow performance for getChannelData when noDataLoad=True #92

Closed danielhrisca closed 6 years ago

danielhrisca commented 7 years ago

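Using mdfreader 0.2.6, if the file has many channels with the same name, the getChannelData call is very slow for the repeated channels: in the log below, the channels with a _1 or _2 suffix take about 0.5 s each, while the others return in 0 ms. The first column is the execution time in ms and the second one is the RAM usage: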

Get channel <C1> data                                      0      691
Get channel <C2> data                                      0      691
Get channel <C3> data                                      0      691
Get channel <C4> data                                      0      691
Get channel <C5> data                                      0      691
Get channel <C6> data                                      0      691
Get channel <C7> data                                      0      691
Get channel <C8> data                                      0      691
Get channel <C9> data                                      0      691
Get channel <C10> data                                     0      691
Get channel <C11> data                                     0      691
Get channel <C12> data                                     0      691
Get channel <C13> data                                     0      691
Get channel <C14> data                                     0      691
Get channel <C15> data                                     0      691
Get channel <C16> data                                     0      691
Get channel <C17> data                                     0      691
Get channel <C18> data                                     0      691
Get channel <C19> data                                     0      691
Get channel <C20> data                                     0      691
Get channel <C21> data                                     0      691
Get channel <C22> data                                     0      691
Get channel <C23> data                                     0      691
Get channel <C24> data                                     0      691
Get channel <C25> data                                     0      691
Get channel <C26> data                                     0      691
Get channel <C27> data                                     0      691
Get channel <C28> data                                     0      691
Get channel <C29> data                                     0      691
Get channel <C30> data                                     0      691
Get channel <C31> data                                     0      691
Get channel <C32> data                                     0      691
Get channel <C33> data                                     0      691
Get channel <C34> data                                     0      691
Get channel <C35> data                                     0      691
Get channel <C36> data                                     0      691
Get channel <C37> data                                     0      691
Get channel <C38> data                                     0      691
Get channel <C39> data                                     0      691
Get channel <C40> data                                     0      691
Get channel <C41> data                                     0      691
Get channel <C42> data                                     0      691
Get channel <C43> data                                     0      691
Get channel <C44> data                                     0      691
Get channel <C45> data                                     0      691
Get channel <C46> data                                     0      691
Get channel <C47> data                                     0      691
Get channel <C48> data                                     0      691
Get channel <C49> data                                     0      691
Get channel <C50> data                                     0      691
Get channel <C51> data                                     0      691
Get channel <C52> data                                     0      691
Get channel <C53> data                                     0      691
Get channel <C54> data                                     0      691
Get channel <C55> data                                     0      691
Get channel <C56> data                                     0      691
Get channel <C57> data                                     0      691
Get channel <C58> data                                     0      691
Get channel <C59> data                                     0      691
Get channel <C60> data                                     0      691
Get channel <C61> data                                     0      691
Get channel <C62> data                                     0      691
Get channel <C63> data                                     0      691
Get channel <C64> data                                     0      691
Get channel <C65> data                                     0      691
Get channel <C66> data                                     0      691
Get channel <C67> data                                     0      691
Get channel <C68> data                                     0      691
Get channel <C69> data                                     0      691
Get channel <C70> data                                     0      691
Get channel <C71> data                                     0      691
Get channel <C72> data                                     0      691
Get channel <C73> data                                     0      691
Get channel <C74> data                                     0      691
Get channel <C75> data                                     0      691
Get channel <C76> data                                     0      691
Get channel <C77> data                                     0      691
Get channel <C78> data                                     0      691
Get channel <C79> data                                     0      691
Get channel <C80> data                                     0      691
Get channel <C81> data                                     0      691
Get channel <C82> data                                     0      691
Get channel <C83> data                                     0      691
Get channel <C84> data                                     0      691
Get channel <C85> data                                     0      691
Get channel <C86> data                                     0      691
Get channel <C87> data                                     0      691
Get channel <C88> data                                     0      691
Get channel <C89> data                                     0      691
Get channel <C90> data                                     0      691
Get channel <C91> data                                     0      691
Get channel <C92> data                                     0      691
Get channel <C93> data                                     0      691
Get channel <C94> data                                     0      691
Get channel <C95> data                                     0      691
Get channel <C96> data                                     0      691
Get channel <C97> data                                     0      691
Get channel <C98> data                                     0      691
Get channel <C99> data                                     0      691
Get channel <C100> data                                    0      691
Get channel <C101> data                                    0      691
Get channel <C102> data                                    0      691
Get channel <C103> data                                    0      691
Get channel <C104> data                                    0      691
Get channel <C105> data                                    0      691
Get channel <C106> data                                    0      691
Get channel <C107> data                                    0      691
Get channel <C108> data                                    0      691
Get channel <C109> data                                    0      691
Get channel <C110> data                                    0      691
Get channel <C111> data                                    0      691
Get channel <C0_1> data                                  510      691
Get channel <C1_1> data                                  510      691
Get channel <C2_1> data                                  507      691
Get channel <C3_1> data                                  502      691
Get channel <C4_1> data                                  481      691
Get channel <C5_1> data                                  485      691
Get channel <C6_1> data                                  491      691
Get channel <C7_1> data                                  475      691
Get channel <C8_1> data                                  495      691
Get channel <C9_1> data                                  485      691
Get channel <C10_1> data                                 507      691
Get channel <C11_1> data                                 495      691
Get channel <C12_1> data                                 508      691
Get channel <C13_1> data                                 494      691
Get channel <C14_1> data                                 521      691
Get channel <C15_1> data                                 485      691
Get channel <C16_1> data                                 494      691
Get channel <C17_1> data                                 558      691
Get channel <C18_1> data                                 509      691
Get channel <C19_1> data                                 497      691
Get channel <C20_1> data                                 491      691
Get channel <C21_1> data                                 493      691
Get channel <C22_1> data                                 503      691
Get channel <C23_1> data                                 494      691
Get channel <C24_1> data                                 522      691
Get channel <C25_1> data                                 515      691
Get channel <C26_1> data                                 523      691
Get channel <C27_1> data                                 505      691
Get channel <C28_1> data                                 561      691
Get channel <C29_1> data                                 518      691
Get channel <C30_1> data                                 533      691
Get channel <C31_1> data                                 515      691
Get channel <C32_1> data                                 526      691
Get channel <C33_1> data                                 556      691
Get channel <C34_1> data                                 617      691
Get channel <C35_1> data                                 537      691
Get channel <C36_1> data                                 514      691
Get channel <C37_1> data                                 521      691
Get channel <C38_1> data                                 526      691
Get channel <C39_1> data                                 533      691
Get channel <C40_1> data                                 529      691
Get channel <C41_1> data                                 562      691
Get channel <C42_1> data                                 529      691
Get channel <C43_1> data                                 530      691
Get channel <C44_1> data                                 539      691
Get channel <C45_1> data                                 531      691
Get channel <C46_1> data                                 509      691
Get channel <C47_1> data                                 506      691
Get channel <C48_1> data                                 510      691
Get channel <C49_1> data                                 503      691
Get channel <C50_1> data                                 511      691
Get channel <C51_1> data                                 515      691
Get channel <C52_1> data                                 530      691
Get channel <C53_1> data                                 520      691
Get channel <C54_1> data                                 572      691
Get channel <C55_1> data                                 525      691
Get channel <C56_1> data                                 521      691
Get channel <C57_1> data                                 517      691
Get channel <C58_1> data                                 531      691
Get channel <C59_1> data                                 534      691
Get channel <C60_1> data                                 537      691
Get channel <C61_1> data                                 526      691
Get channel <C62_1> data                                 520      691
Get channel <C63_1> data                                 517      691
Get channel <C64_1> data                                 509      691
Get channel <C65_1> data                                 513      691
Get channel <C66_1> data                                 515      691
Get channel <C67_1> data                                 552      691
Get channel <C68_1> data                                 604      691
Get channel <C69_1> data                                 553      691
Get channel <C70_1> data                                 541      691
Get channel <C71_1> data                                 549      691
Get channel <C72_1> data                                 515      691
Get channel <C73_1> data                                 513      691
Get channel <C74_1> data                                 514      691
Get channel <C75_1> data                                 567      691
Get channel <C76_1> data                                 565      691
Get channel <C77_1> data                                 572      691
Get channel <C78_1> data                                 539      691
Get channel <C79_1> data                                 514      691
Get channel <C80_1> data                                 537      691
Get channel <C81_1> data                                 538      691
Get channel <C82_1> data                                 513      691
Get channel <C83_1> data                                 483      691
Get channel <C84_1> data                                 487      691
Get channel <C85_1> data                                 496      691
Get channel <C86_1> data                                 507      691
Get channel <C87_1> data                                 479      691
Get channel <C88_1> data                                 500      691
Get channel <C89_1> data                                 513      691
Get channel <C90_1> data                                 505      691
Get channel <C91_1> data                                 482      691
Get channel <C92_1> data                                 525      691
Get channel <C93_1> data                                 488      691
Get channel <C94_1> data                                 482      691
Get channel <C95_1> data                                 509      691
Get channel <C96_1> data                                 486      691
Get channel <C97_1> data                                 502      691
Get channel <C98_1> data                                 501      691
Get channel <C99_1> data                                 484      691
Get channel <C100_1> data                                484      691
Get channel <C101_1> data                                531      691
Get channel <C102_1> data                                494      691
Get channel <C103_1> data                                523      691
Get channel <C104_1> data                                522      691
Get channel <C105_1> data                                507      691
Get channel <C106_1> data                                578      691
Get channel <C107_1> data                                516      691
Get channel <C108_1> data                                555      691
Get channel <C109_1> data                                598      691
Get channel <C110_1> data                                529      691
Get channel <C111_1> data                                520      691
Get channel <C112> data                                    0      691
Get channel <C113> data                                    0      691
Get channel <C114> data                                    0      691
Get channel <C115> data                                    0      691
Get channel <C116> data                                    0      691
Get channel <C117> data                                    0      691
Get channel <C118> data                                    0      691
Get channel <C0_2> data                                  525      691
Get channel <C1_2> data                                  518      691
Get channel <C2_2> data                                  554      691
Get channel <C3_2> data                                  552      691
Get channel <C4_2> data                                  560      691
Get channel <C5_2> data                                  552      691
Get channel <C6_2> data                                  524      691
Get channel <C7_2> data                                  513      691
Get channel <C8_2> data                                  511      691
Get channel <C9_2> data                                  554      691
Get channel <C10_2> data                                 510      691
Get channel <C11_2> data                                 508      691
Get channel <C12_2> data                                 515      691
Get channel <C13_2> data                                 514      691
Get channel <C14_2> data                                 558      691
Get channel <C15_2> data                                 526      691
Get channel <C16_2> data                                 544      691
Get channel <C17_2> data                                 523      691
Get channel <C18_2> data                                 498      691
Get channel <C19_2> data                                 506      691
Get channel <C20_2> data                                 499      691
Get channel <C21_2> data                                 488      691
Get channel <C22_2> data                                 522      691
Get channel <C23_2> data                                 502      691
Get channel <C24_2> data                                 542      691
Get channel <C25_2> data                                 489      691
Get channel <C26_2> data                                 515      691
Get channel <C27_2> data                                 517      691
Get channel <C28_2> data                                 540      691
Get channel <C29_2> data                                 492      691
Get channel <C30_2> data                                 490      691
Get channel <C31_2> data                                 547      691
Get channel <C32_2> data                                 489      691
Get channel <C33_2> data                                 485      691
Get channel <C34_2> data                                 544      691
Get channel <C35_2> data                                 497      691
Get channel <C36_2> data                                 560      691
Get channel <C37_2> data                                 525      691
Get channel <C38_2> data                                 566      691
Get channel <C39_2> data                                 518      691
Get channel <C40_2> data                                 496      691
Get channel <C41_2> data                                 507      691
Get channel <C42_2> data                                 548      691
Get channel <C43_2> data                                 502      691
Get channel <C44_2> data                                 539      691
Get channel <C45_2> data                                 520      691
Get channel <C46_2> data                                 525      691
Get channel <C47_2> data                                 521      691
Get channel <C48_2> data                                 521      691
Get channel <C49_2> data                                 553      691
Get channel <C50_2> data                                 525      691
Get channel <C51_2> data                                 524      691
Get channel <C52_2> data                                 528      691
Get channel <C53_2> data                                 517      691
Get channel <C54_2> data                                 503      691
Get channel <C55_2> data                                 513      691
Get channel <C56_2> data                                 496      691
Get channel <C57_2> data                                 527      691
Get channel <C58_2> data                                 537      691
Get channel <C59_2> data                                 499      691
Get channel <C60_2> data                                 488      691
Get channel <C61_2> data                                 502      691
Get channel <C62_2> data                                 534      691
Get channel <C63_2> data                                 490      691
Get channel <C64_2> data                                 483      691
Get channel <C65_2> data                                 489      691
Get channel <C66_2> data                                 517      691
Get channel <C67_2> data                                 511      691
Get channel <C68_2> data                                 507      691
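
A per-call log like the one above can be produced with a small harness. This is a minimal sketch, not the script that produced the log: time_channel_reads and format_rows are hypothetical helpers, get_data stands for any per-channel read (for example the getChannelData method of an mdfreader object), and the memory column uses the standard library's tracemalloc, which counts Python-level allocations and is only a stand-in for the process RAM figure shown above.

```python
from __future__ import annotations

import time
import tracemalloc
from typing import Any, Callable, Iterable


def time_channel_reads(
    names: Iterable[str], get_data: Callable[[str], Any]
) -> list[tuple[str, float, float]]:
    """Call get_data(name) for each channel; return (name, ms, MiB) rows.

    The MiB value is tracemalloc's current traced size after the call,
    a rough stand-in for the process RAM column, not the same measurement.
    """
    rows: list[tuple[str, float, float]] = []
    tracemalloc.start()
    try:
        for name in names:
            start = time.perf_counter()
            get_data(name)  # result is discarded; only the cost is measured
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            current_bytes, _peak = tracemalloc.get_traced_memory()
            rows.append((name, elapsed_ms, current_bytes / 2**20))
    finally:
        tracemalloc.stop()
    return rows


def format_rows(rows: list[tuple[str, float, float]]) -> str:
    """Render rows in a layout similar to the log above."""
    return "\n".join(
        f"{'Get channel <' + name + '> data':<55}{ms:>6.0f}{mib:>9.0f}"
        for name, ms, mib in rows
    )


# Demo with a dummy reader standing in for a real per-channel read
rows = time_channel_reads(["C1", "C2"], lambda name: [0.0] * 1000)
print(format_rows(rows))
```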

I've changed the test files, and you can find the new ones here: https://github.com/danielhrisca/asammdf/blob/master/benchmarks/test%20files.7z

ratal commented 6 years ago

Hi, I know; I have been seeing this memory consumption issue for some time. It mostly comes from mdfinfo4 and its commentblock(), where I save both the xml_tree object, which takes a huge amount of memory, and the raw text. Removing it releases a lot of memory. I'm still refining the improvements and will probably commit them in the coming days.
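
A minimal sketch of what dropping the stored tree can look like, assuming a channel comment is an MDF4-style XML string whose text sits in a <TX> element (possibly namespaced): parse it, keep only the extracted string, and let the tree be garbage-collected. extract_comment_text and ChannelComment are hypothetical names for illustration, not mdfreader's commentblock() code.

```python
import xml.etree.ElementTree as ET


def extract_comment_text(raw_xml: str) -> str:
    """Return only the <TX> text of an XML comment (assumed MDF4 layout).

    The parsed tree is a local variable, so nothing keeps it alive once
    this function returns; the caller holds just the short string.
    """
    root = ET.fromstring(raw_xml)
    for element in root.iter():
        # Tags may carry a namespace, e.g. '{http://www.asam.net/mdf/v4}TX'
        if element.tag == "TX" or element.tag.endswith("}TX"):
            return (element.text or "").strip()
    return ""


class ChannelComment:
    """Hypothetical holder that keeps the extracted text, not the tree."""

    __slots__ = ("text",)

    def __init__(self, raw_xml: str) -> None:
        self.text = extract_comment_text(raw_xml)


comment = ChannelComment("<CNcomment><TX> Engine speed </TX></CNcomment>")
print(comment.text)  # Engine speed
```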

danielhrisca commented 6 years ago

This issue is about each getChannelData call taking about 0.5 s when noDataLoading=True and the measurement contains repeated channel names.

For the given file (36000 channels) this adds up to about 5 h (36000 × 0.5 s = 18000 s).
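
The thread does not state what makes each call slow. As an illustration only, one way to keep a lookup cheap no matter how many channels share a base name is to resolve every name to its location once, when the file is opened, so each later call is a single dictionary access. The sketch assumes, from the log, that repeated names are made unique with a _1, _2, ... suffix; build_channel_index and the (group, channel) tuple are hypothetical, not mdfreader internals.

```python
from __future__ import annotations

from typing import Iterable


def build_channel_index(
    groups: Iterable[Iterable[str]],
) -> dict[str, tuple[int, int]]:
    """Map each channel name to its (group, channel) position, built once.

    Repeated names get a _1, _2, ... suffix in order of appearance
    (assumed naming). Collisions with channels literally named like
    'C0_1' are not handled in this sketch.
    """
    index: dict[str, tuple[int, int]] = {}
    seen: dict[str, int] = {}
    for g, names in enumerate(groups):
        for c, name in enumerate(names):
            count = seen.get(name, 0)
            key = name if count == 0 else f"{name}_{count}"
            seen[name] = count + 1
            index[key] = (g, c)
    return index


index = build_channel_index([["C0", "C1"], ["C0", "C1", "C2"], ["C0"]])
print(index["C0_1"])  # (1, 0)
```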

ratal commented 6 years ago

I think my last commit improves the situation.

danielhrisca commented 6 years ago

It works.

noDataLoading=True is basically lazy loading: if you need all channels, you will end up using the same RAM as with noDataLoading=False.
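
A minimal sketch of the lazy-loading idea, assuming loaded data is kept once read (here, in a cache): nothing is read at open time, each channel is read on its first request, and once every channel has been requested the cache holds the same data an eager load would. LazyChannels and fake_reader are hypothetical; the thread does not say whether mdfreader caches channel data.

```python
from typing import Callable, Dict, List


class LazyChannels:
    """Read each channel on first access, then serve it from a cache."""

    def __init__(self, loaders: Dict[str, Callable[[], List[float]]]) -> None:
        self._loaders = loaders  # name -> function that reads the samples
        self._cache: Dict[str, List[float]] = {}

    def get(self, name: str) -> List[float]:
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()  # first access: read
        return self._cache[name]

    def loaded_count(self) -> int:
        return len(self._cache)


def fake_reader(name: str) -> Callable[[], List[float]]:
    """Hypothetical reader; a real one would read the channel from the file."""
    def read() -> List[float]:
        print(f"reading {name}")
        return [0.0] * 3
    return read


lazy = LazyChannels({n: fake_reader(n) for n in ("C1", "C2")})
print(lazy.loaded_count())  # 0: nothing read at open time
lazy.get("C1")              # prints "reading C1"
lazy.get("C1")              # cached, prints nothing
lazy.get("C2")              # prints "reading C2"
print(lazy.loaded_count())  # 2: every channel is now held in memory
```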

danielhrisca commented 6 years ago

I stand corrected: noDataLoading=True can use more RAM than noDataLoading=False.

| Save file | Time [ms] | RAM [MB] |
|---|---|---|
| mdfreader 0.2.7 mdfv3 | 7594 | 489 |
| mdfreader 0.2.7 noDataLoading mdfv3 | 9016 | 575 |
| mdfreader 0.2.7 mdfv4 | 6454 | 494 |
| mdfreader 0.2.7 noDataLoading mdfv4 | 8124 | 653 |

| Get all channels (36424 calls) | Time [ms] | RAM [MB] |
|---|---|---|
| mdfreader 0.2.7 mdfv3 | 113 | 461 |
| mdfreader 0.2.7 nodata mdfv3 | 1648 | 411 |
| mdfreader 0.2.7 mdfv4 | 91 | 475 |
| mdfreader 0.2.7 nodata mdfv4 | 2481 | 502 |

ratal commented 6 years ago

I'm not sure I understand. noDataLoading loads only the info class; unfortunately, for now info is still big for files with a lot of channels. When a channel is accessed, its data is loaded. So, unless you use compression, the data of all channels should take the same space as with noDataLoading=False when no conversion is applied (and potentially more with conversion).
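
To make "potentially more with conversion" concrete: raw samples are often stored as small integers, while converted physical values are commonly held as 64-bit floats. A sketch with the standard library's array module, assuming uint16 raw data and a hypothetical linear conversion phys = offset + factor * raw:

```python
from array import array


def apply_linear_conversion(raw: array, factor: float, offset: float) -> array:
    """Convert raw integer samples to physical values (hypothetical linear rule).

    The result uses typecode 'd' (C double, 8 bytes per sample).
    """
    return array("d", (offset + factor * x for x in raw))


raw = array("H", range(1000))  # 'H' = unsigned short, 2 bytes on common platforms
phys = apply_linear_conversion(raw, factor=0.1, offset=-40.0)

raw_bytes = len(raw) * raw.itemsize
phys_bytes = len(phys) * phys.itemsize
print(raw_bytes, phys_bytes)  # 2000 8000 on common platforms
```

Here the converted channel takes four times the space of the raw one.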

danielhrisca commented 6 years ago

OK, I misunderstood this argument.

ratal commented 6 years ago

I think we were posting at the same time... Yes, indeed, the info class takes a lot of memory. This is mainly because a lot of metadata is stored: XML content, and many channels. I will try to add a switch for this case that loads only what is absolutely necessary to parse the data, plus minimal information such as description and unit. I think asammdf does not load all of this, which could explain the difference in memory usage. The full metadata could still be interesting for users, who could access it only when they intentionally look for it, using the mdfinfo class. But this complexity is not needed just to parse the data and work with it.
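
Such a switch might look like a parse step that keeps a slim per-channel record by default and returns the remaining metadata only on request. This is a sketch under assumptions: the channel block is taken to be already parsed into a dict, and the decoding fields (byte_offset, bit_count, data_type), SlimChannel, parse_channel and the full_metadata flag are hypothetical names; the thread only says that description and unit would be kept.

```python
from typing import Any, Dict, Optional, Tuple


class SlimChannel:
    """Only what is needed to decode samples, plus description and unit."""

    __slots__ = ("name", "byte_offset", "bit_count", "data_type",
                 "description", "unit")

    def __init__(self, name: str, byte_offset: int, bit_count: int,
                 data_type: str, description: str, unit: str) -> None:
        self.name = name
        self.byte_offset = byte_offset
        self.bit_count = bit_count
        self.data_type = data_type
        self.description = description
        self.unit = unit


def parse_channel(block: Dict[str, Any], full_metadata: bool = False
                  ) -> Tuple[SlimChannel, Optional[Dict[str, Any]]]:
    """Build the slim record; return the other fields only if requested."""
    slim = SlimChannel(
        name=block["name"],
        byte_offset=block["byte_offset"],
        bit_count=block["bit_count"],
        data_type=block["data_type"],
        description=block.get("description", ""),
        unit=block.get("unit", ""),
    )
    if not full_metadata:
        return slim, None  # extra metadata is not kept at all
    extra = {k: v for k, v in block.items() if k not in SlimChannel.__slots__}
    return slim, extra
```

__slots__ removes the per-instance __dict__, which reduces per-object overhead when a file holds tens of thousands of channels.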

ratal commented 6 years ago

Hi Daniel, I noticed this regression with 2.7.3 and improved the situation in my last commit, if you are curious. It will be in the next release.

danielhrisca commented 6 years ago

I tried the latest code, but after waiting a few minutes to get all channels' data I had to kill the process. I'll try to make per-call timings tomorrow.

ratal commented 6 years ago

Weird... It takes me around 22 s for test.mf4 and 18 s for test.mdf. Not at the same level as asammdf, but closer. Thanks for trying. My code:

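```python
import time

import mdfreader

# noDataLoading=True parses only the file metadata (the info class) up front
yop = mdfreader.mdf('test.mdf', noDataLoading=True)

tic = time.time()
for s in yop:  # iterating the mdf object yields channel names
    toc = time.time()
    print(s)
    y = yop.getChannelData(s)  # the channel's data is read here
    # data, duration of this call [s], time elapsed since the loop started [s]
    print('{0} {1} {2}'.format(y, time.time() - toc, toc - tic))
```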