danielhrisca commented 7 years ago

With the test file the memory usage goes to 2.8GB. I think that there is a memory leak worth investigating.

ratal commented 7 years ago

I tried with my version and memory usage stays at about 700 MB, or 500 MB when using the convertAfterRead=False argument. Again, this is with Python 3.5.3 and numpy 1.11.2. I follow the numpy discussion list, and there currently seem to be several memory issues with Python 3.6 that should be fixed in numpy 1.13.1. Which numpy version are you using?

danielhrisca commented 7 years ago

I got the high memory usage when saving the mf4 file to disk. Memory usage for file opening was around 700MB like you said. (Using Python 3.6.1 x64, Windows 7 x64, numpy 1.13, mdfreader 0.2.5).

ratal commented 7 years ago

This could be normal. Data stored in an mdf4 file can be compressed and use much less memory because it uses specific data types such as uint8, which are then converted to float, for instance (based on the CCBlock), and take much more memory when written back to an mdf4 file. This conversion is avoided with the argument convertAfterRead=False during reading, but writing does not use the original data type, only the converted type. However, there could also be a pointer issue in the writing function that inflates the file. I will try to reproduce your issue while writing.
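To illustrate the size effect of such a conversion, here is a minimal sketch assuming a linear conversion of the form raw * P2 + P1. The coefficient values and the array size are made up for illustration, and this is not mdfreader's own code.

```python
import numpy as np

# Hypothetical raw channel: one million samples stored as unsigned 8-bit integers.
raw = (np.arange(1_000_000) % 256).astype(np.uint8)

# Hypothetical linear conversion coefficients (illustrative values only).
P1, P2 = -40.0, 0.5

# Multiplying an integer array by a Python float promotes the result to float64.
physical = raw * P2 + P1

print(raw.dtype, raw.nbytes)            # 1 byte per sample
print(physical.dtype, physical.nbytes)  # 8 bytes per sample
```

Under these assumptions the converted channel needs eight times the memory of the raw one, so writing converted values instead of raw ones inflates both RAM usage and file size.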

ratal commented 6 years ago

I tried on my dev platform (Debian) and I barely consumed 0.2 GB during writing.

danielhrisca commented 6 years ago

I use this benchmark for evaluation. You can double-check on your machine:

https://github.com/danielhrisca/asammdf/tree/master/benchmarks
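For readers who want a quick local check without the linked scripts, here is a minimal sketch of measuring elapsed time and peak memory for a single call. It assumes Python's standard tracemalloc module; `measure` and `build_list` are hypothetical helpers, not code from the linked benchmark. tracemalloc only counts allocations made through Python's allocator and slows execution, so the figures will not match process-level RAM readings.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Call func once and return (result, elapsed_ms, peak_mb)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
    return result, elapsed_ms, peak / 2**20


def build_list(n):
    """Hypothetical stand-in workload: allocate a list of n floats."""
    return [float(i) for i in range(n)]


data, ms, mb = measure(build_list, 200_000)
print(f"{ms:.0f} ms, peak {mb:.1f} MB")
```

The same wrapper could be applied to a file open or save call in place of `build_list`.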

danielhrisca commented 6 years ago

Hi,

using the two test files (mdf version 3 and 4) I have:

3.6.2rc1 (heads/3.6:268e1fb, Jun 17 2017, 19:01:44) [MSC v.1900 64 bit (AMD64)]
Windows-10-10.0.15063-SP0
Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
16GB installed RAM

Save file	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	26894	2002
mdfreader 0.2.6 mdfv4	25403	2715

ratal commented 6 years ago

Hi Daniel, so far on Linux:

For mdf3, the file is 308.6 MB (originally 176.1 MB). My time is about 10x lower than yours, ~2.45 s?
For mdf4, the file is 300.1 MB (originally 192.5 MB). It takes too long because of the very many (30k+) small channels; I will work on it for the next release.

This is on a Core i7 5820K. I will try on Win10.

danielhrisca commented 6 years ago

The file is ok. I don't know what you have on your Dev PC but if you install mdfreader from pypi or GitHub the results are as I have shown (tested on Linux, Windows, python 2.7 and python 3.6). PS: you have the proper test file

ratal commented 6 years ago

Hi, I just tried on Win10 64-bit with Anaconda 4.3.1 (Python 3.6.0) and WinPython 3.6.1 (in a virtual machine on the same Linux machine).

mdf3 reading takes 3.5 s (Anaconda) and 2.8 s (WinPython), and the file is 301.3/299 MB (Anaconda/WinPython).
mdf4 takes 37 s (too long again, compared with ~25 s on Linux), and the file is 293/293 MB (Anaconda/WinPython). Surprisingly, only in WinPython, I get:

RuntimeWarning: invalid value encountered in multiply
  return vect * P2 + P1

My commands are roughly the same as in your benchmark (without the timer):

yop=mdfreader.mdf('error.mdf')
yop.write()

I do not get it.

danielhrisca commented 6 years ago

Hi Aymeric,

mdf3 reading is indeed about 3.5 s.
mdf4 reading is slow.
Both mdf3 and mdf4 writing are slow and consume a lot of RAM in my tests.

ratal commented 6 years ago

OK, I got confused between the issues. I will check the RAM consumption during writing.

ratal commented 6 years ago

The use of pack() seems to be the mistake. I will have to investigate an alternative such as .tobytes() from numpy.

ratal commented 6 years ago

Hi Daniel, I found an alternative to pack() using records fromarrays() and tobytes(). It gives a big speed-up and much lower memory consumption. However, I still have to test it in more detail. mdf3 writing could be sped up further; that is next.
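A minimal sketch of the kind of change described here, assuming pack() refers to struct.pack called once per record and fromarrays() to numpy.rec.fromarrays. The two-channel record layout and the function names are hypothetical, not mdfreader's actual code.

```python
import struct

import numpy as np

# Hypothetical record layout: a float64 time stamp followed by a uint16 value,
# little-endian and without padding (10 bytes per record).
RECORD_DTYPE = np.dtype([("t", "<f8"), ("v", "<u2")])


def records_with_struct(time, value):
    """Pack records one by one; creates a small bytes object per record."""
    fmt = "<dH"
    return b"".join(
        struct.pack(fmt, t, v) for t, v in zip(time.tolist(), value.tolist())
    )


def records_with_numpy(time, value):
    """Build one record array from the channel arrays and serialize it once."""
    rec = np.rec.fromarrays([time, value], dtype=RECORD_DTYPE)
    return rec.tobytes()


time = np.arange(1000, dtype="<f8") * 0.01
value = (np.arange(1000) % 500).astype("<u2")

# Both approaches produce the same byte stream for this layout.
print(records_with_struct(time, value) == records_with_numpy(time, value))
```

The numpy variant avoids a Python-level loop and the per-record temporary objects, which is where savings in time and memory would be expected to come from.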

danielhrisca commented 6 years ago

Hello Aymeric,

why is there such a high RAM usage for mdf version 4 with noDataLoading=True ?

Benchmark environment

3.6.2rc1 (heads/3.6:268e1fb, Jun 17 2017, 19:01:44) [MSC v.1900 64 bit (AMD64)]
Windows-10-10.0.15063-SP0
Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
16GB installed RAM

Notations used in the results

nodata = MDF object created with load_measured_data=False (raw channel data not loaded into RAM)
compression = MDF object created with compression=blosc
compression bcolz 6 = MDF object created with compression=6
noDataLoading = MDF object read with noDataLoading=True

Files used for benchmark:

183 groups
36424 channels

Open file	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	3698	542
mdfreader 0.2.6 compression mdfv3	5041	262
mdfreader 0.2.6 noDataLoading mdfv3	1933	193
mdfreader 0.2.6 mdfv4	42596	1315
mdfreader 0.2.6 compression mdfv4	46789	1027
mdfreader 0.2.6 noDataLoading mdfv4	5001	948

ratal commented 6 years ago

Hi Daniel, I found a lazily coded part that handles text channels and their encoding. I improved the code a bit, and mdf4 reading time should be drastically reduced. However, RAM usage with noDataLoading is indeed still too high. It is a work in progress, as you may have noticed.

ratal commented 6 years ago

Hi Daniel, by the way, did you try to benchmark with a 'big' file (> 1 GB) and far fewer channels (< 1000)?

danielhrisca commented 6 years ago

Benchmark environment

3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
Windows-10-10.0.14393-SP0
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
16GB installed RAM

Notations used in the results

compression = mdfreader mdf object created with compression=blosc
compression bcolz 6 = mdfreader mdf object created with compression=6
noDataLoading = mdfreader mdf object read with noDataLoading=True

Files used for benchmark:

7 groups
50 channels
378 MB file

Open file	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	264	567
mdfreader 0.2.6 compression mdfv3	838	531
mdfreader 0.2.6 compression bcolz 6 mdfv3	1625	543
mdfreader 0.2.6 noDataLoading mdfv3	4	92
mdfreader 0.2.6 mdfv4	273	586
mdfreader 0.2.6 compression mdfv4	844	610
mdfreader 0.2.6 compression bcolz 6 mdfv4	1635	613
mdfreader 0.2.6 noDataLoading mdfv4	7	94

Save file	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	2117	1011
mdfreader 0.2.6 compression mdfv3	2188	938
mdfreader 0.2.6 compression bcolz 6 mdfv3	2663	937
mdfreader 0.2.6 mdfv4	1967	1011
mdfreader 0.2.6 compression mdfv4	2115	939
mdfreader 0.2.6 compression bcolz 6 mdfv4	2381	937

Get all channels	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	0	566
mdfreader 0.2.6 compression mdfv3	331	531
mdfreader 0.2.6 compression bcolz 6 mdfv3	524	543
mdfreader 0.2.6 mdfv4	0	586
mdfreader 0.2.6 nodata mdfv4	272	551
mdfreader 0.2.6 compression mdfv4	328	610
mdfreader 0.2.6 compression bcolz 6 mdfv4	520	613

danielhrisca commented 6 years ago

Results with commit https://github.com/ratal/mdfreader/commit/36dfe4aae917eb9d232e639bf603f30dfec5d7fa

Benchmark environment

3.6.2rc1 (heads/3.6:268e1fb, Jun 17 2017, 19:01:44) [MSC v.1900 64 bit (AMD64)]
Windows-10-10.0.15063-SP0
Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
16GB installed RAM

Notations used in the results

compression = mdfreader mdf object created with compression=blosc
compression bcolz 6 = mdfreader mdf object created with compression=6
noDataLoading = mdfreader mdf object read with noDataLoading=True

Files used for benchmark:

183 groups
36424 channels

Open file	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	3744	542
mdfreader 0.2.6 compression mdfv3	5163	263
mdfreader 0.2.6 compression bcolz 6 mdfv3	5288	1035
mdfreader 0.2.6 noDataLoading mdfv3	2047	193
mdfreader 0.2.6 mdfv4	7337	1315
mdfreader 0.2.6 compression mdfv4	8517	1027
mdfreader 0.2.6 compression bcolz 6 mdfv4	9082	1750
mdfreader 0.2.6 noDataLoading mdfv4	5348	948

Save file	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	7273	574
mdfreader 0.2.6 noDataLoading mdfv3	9414	574
mdfreader 0.2.6 compression mdfv3	7629	536
mdfreader 0.2.6 compression bcolz 6 mdfv3	7231	1035
mdfreader 0.2.6 mdfv4	4293	1336
mdfreader 0.2.6 noDataLoading mdfv4	6205	1336
mdfreader 0.2.6 compression mdfv4	4911	1292
mdfreader 0.2.6 compression bcolz 6 mdfv4	4776	1767

Get all channels (36424 calls)	Time [ms]	RAM [MB]
mdfreader 0.2.6 mdfv3	93	542
mdfreader 0.2.6 nodata mdfv3	118503	414
mdfreader 0.2.6 compression mdfv3	718	266
mdfreader 0.2.6 compression bcolz 6 mdfv3	345	1036
mdfreader 0.2.6 mdfv4	96	1314
mdfreader 0.2.6 nodata mdfv4	172578	1185
mdfreader 0.2.6 compression mdfv4	731	1035
mdfreader 0.2.6 compression bcolz 6 mdfv4	455	1758

ratal commented 6 years ago

In the last commit I reduced overall memory use to almost the size of the original file data. However, bcolz seems disappointing, maybe because of too much overhead per channel; blosc is much better for this use case.

danielhrisca commented 6 years ago

Hello Aymeric,

good work, the memory usage has been improved a lot since 0.2.5.

Regarding bcolz, it is indeed not suitable by default; it would probably work better with a transposition of the data block records. For my part, I've already dropped all compression options, since they performed worse than not loading the raw record data.

Results

Benchmark environment

3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)]
Windows-10-10.0.14393-SP0
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
16GB installed RAM

Notations used in the results

nodata = asammdf MDF object created with load_measured_data=False (raw channel data not loaded into RAM)
compress = mdfreader mdf object created with compression=blosc
compression bcolz 6 = mdfreader mdf object created with compression=6
noDataLoading = mdfreader mdf object read with noDataLoading=True

Files used for benchmark:

183 groups
36424 channels

Open file	Time [ms]	RAM [MB]
mdfreader 0.2.7 mdfv3	4319	458
mdfreader 0.2.7 compress mdfv3	5997	195
mdfreader 0.2.7 compress bcolz 6 mdfv3	6117	947
mdfreader 0.2.7 noDataLoading mdfv3	1711	187
mdfreader 0.2.7 mdfv4	5705	467
mdfreader 0.2.7 compress mdfv4	7174	183
mdfreader 0.2.7 compress bcolz 6 mdfv4	7331	907
mdfreader 0.2.7 noDataLoading mdfv4	4172	261

Save file	Time [ms]	RAM [MB]
mdfreader 0.2.7 mdfv3	8704	481
mdfreader 0.2.7 compress mdfv3	8672	451
mdfreader 0.2.7 compress bcolz 6 mdfv3	8398	949
mdfreader 0.2.7 mdfv4	6669	489
mdfreader 0.2.7 compress mdfv4	8216	446
mdfreader 0.2.7 compress bcolz 6 mdfv4	6642	922

Get all channels (36424 calls)	Time [ms]	RAM [MB]
mdfreader 0.2.7 mdfv3	68	458
mdfreader 0.2.7 compress mdfv3	645	196
mdfreader 0.2.7 compress bcolz 6 mdfv3	272	949
mdfreader 0.2.7 mdfv4	67	467
mdfreader 0.2.7 compress mdfv4	670	189
mdfreader 0.2.7 compress bcolz 6 mdfv4	295	914

danielhrisca commented 6 years ago

I guess it's up to you if you want to close this issue.

ratal commented 6 years ago

OK, thanks. I will review the compression status later.

ratal / mdfreader

improvement: memory usage for MDF4 files #72