mpokorny / vysmaw

Fast visibility stream muncher
GNU General Public License v3.0

memory usage during config.start #36

Open caseyjlaw opened 6 years ago

caseyjlaw commented 6 years ago

I'm trying to track down a memory leak in my application. It is probably related to how I allocate memory in cython, but there is something I don't understand about how vysmaw uses memory. My test is to run vyssim to generate a stream and then set a time filter to select 2 GB of that stream. I can confirm that the vysmaw data buffer is about 2 GB in size and that the numpy data array is about the same size, so the total allocation should be 4 GB. After reading a complete segment, the process indeed uses 4 GB at peak. However, after releasing the vysmaw data buffer and deleting the returned numpy array, about 800 MB of memory remains in use. If I rerun the vysmaw app, the remaining memory in use does not grow by another 800 MB, so perhaps this is not really a leak. Still, it is a big chunk of memory that I cannot reclaim, even after running python garbage collection, etc.
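(For reference, the numbers above come from checking the process RSS at each stage, roughly as in the sketch below; psutil is just a convenient way to read the RSS and is not part of vysmaw, and the commented steps stand in for my actual reader code.)

```python
import gc
import psutil

proc = psutil.Process()

def rss_gb():
    # resident set size of this process, in GB
    return proc.memory_info().rss / 1e9

print("baseline:      %.2f GB" % rss_gb())

# ... run the vysmaw app here: start the consumer and read a complete segment ...
print("after read:    %.2f GB" % rss_gb())    # ~4 GB at peak in my test

# ... release the vysmaw data buffer and delete the returned numpy array ...
gc.collect()
print("after release: %.2f GB" % rss_gb())    # still ~800 MB above baseline
```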

I've added sleep statements to the vysmaw app to watch memory usage. Prior to allocating my numpy buffer, I set and start the time filter function to select spectra like this: `handle, consumers = self.config.start(1, f, u)`. At this point the vysmaw data buffer has been allocated (2 GB), but there also seems to be an "extra" 800 MB allocated. It may be a coincidence, but that is the amount that remains in use at the end. By changing the sleep time, I can see that this initial memory use (beyond the buffer) comes from spectra arriving through the filter. Is there any reason why these early spectra would be treated differently from the rest of the spectra?
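To be concrete, the probe looks roughly like the sketch below; `f` is my cython time-filter callback and `u` its user data, exactly as in the call above, and the sleeps just give me time to read the process RSS from outside. Nothing here besides `config.start` is part of vysmaw itself.

```python
import time

# inside a method of my reader class;
# f: cython time-filter callback that selects spectra in a time window (defined elsewhere)
# u: user data passed to the callback (defined elsewhere)

time.sleep(30)                                  # RSS here: baseline
handle, consumers = self.config.start(1, f, u)  # vysmaw data buffer (~2 GB) allocated
time.sleep(30)                                  # RSS here: ~2 GB buffer plus ~800 MB "extra"

# only after this point do I allocate my ~2 GB numpy output array and start
# copying spectra out of the consumer queue
```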

mpokorny commented 6 years ago

Just guessing here, but vysmaw has to allocate memory to receive the signal messages as well, so maybe that's where the memory is going. The description of the signal_message_pool_overhead_factor configuration parameter in vysmaw.h states how the memory allocated for signal messages is determined; that might allow you to decide whether that's where the "extra" memory is going. Regardless, all memory used by vysmaw should be released when the vysmaw handle Python instance is freed, or when your program explicitly calls the handle.shutdown() method.
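For example, a rough sketch of testing both points; I'm assuming the cython Configuration object exposes the parameter under the same name as in vysmaw.h, and the factor value here is just a placeholder (`config`, `f`, and `u` as in your snippet above):

```python
import gc

# assuming the cython Configuration mirrors the vysmaw.h field name
config.signal_message_pool_overhead_factor = 2   # placeholder value; see vysmaw.h for the default

handle, consumers = config.start(1, f, u)
# ... consume spectra ...

# either of these should release all memory held by vysmaw:
handle.shutdown()   # explicit shutdown
del handle          # or free the handle python instance
gc.collect()
```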

caseyjlaw commented 6 years ago

I checked the size of this buffer, and the number of messages is too small to account for more than a few hundred MB. I also tweaked the overhead factor to test this hypothesis, and indeed the "extra" memory is unchanged. One clue is that the extra memory scales with the number of spectra that are caught and used to fill the numpy data array. If only a fraction of spectra are caught, the returned numpy array is largely zeros but has a fixed size (8 bytes per complex64 value). Deleting the numpy array releases the expected amount of memory. However, the "extra" memory is not released and seems to scale with the number of spectra caught and managed in cython.
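For concreteness, the bookkeeping on the array side looks like this; the shape is made up, just sized to roughly match the ~2 GB in this test, and the 8 bytes per complex64 value is the real point:

```python
import numpy as np

# hypothetical segment shape: (integrations, baselines, channels, polarizations)
shape = (160, 351, 1024, 4)

data = np.zeros(shape, dtype='complex64')   # zeros wherever no spectrum was caught
expected_gb = data.nbytes / 1e9             # nbytes = prod(shape) * 8 bytes per complex64
print("expected array size: %.2f GB" % expected_gb)   # ~1.84 GB

del data   # releasing this frees the expected ~2 GB, but not the "extra" memory,
           # which scales with the number of spectra actually caught in cython
```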