nexpy / nexusformat

Provides an API to open, create, plot, and manipulate NeXus data.

RAM allocation issues when creating empty NXfields #204

Closed P4ulW closed 11 months ago

P4ulW commented 1 year ago

When creating a large, empty NXfield object and saving the NeXus file, only a little memory is allocated. Loading the file with nxload does, in some cases, consume a lot of memory, though it is not clear why. On my machine, the following example shows this opaque behavior:

from nexusformat.nexus import *
import numpy as np

shape = (10, 300, 1000, 1000)

# Create a large, empty NXfield without supplying any data
signal = NXfield(
    shape=shape,
    dtype=np.int32,
    name="signal")

signal.save("test.nxs")

nxf = nxload("test.nxs")
print(nxf.tree)

Depending on the shape, accessing nxf.tree sometimes consumes memory equivalent to the full NXfield size and sometimes does not.

So far, this odd behavior has only occurred on Windows (10 & 11). NeXus format version: nexusformat==1.0.2

rayosborn commented 1 year ago

Thanks for the report. When you say that this only occurs on Windows, does that mean you have specifically tested this on Linux or Macs and confirmed that memory is never allocated when printing the tree? If so, I will have some difficulty reproducing this since I don't have access to a Windows machine, but I could ask colleagues to check.

P4ulW commented 1 year ago

I have specifically tested this on Linux and I cannot reproduce the behavior there (Fedora, Manjaro).

While investigating this issue, I also noticed that passing an array initialized with np.zeros shows differences in memory allocation between Windows and Linux. Maybe this is an issue caused by NumPy or HDF5 working differently between the two operating systems?
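
For reference, a minimal sketch of the two initialization paths being compared (a smaller shape is used here so the example stays cheap to run; only the second variant holds the full array in memory up front):

from nexusformat.nexus import NXfield
import numpy as np

shape = (10, 30, 100, 100)

# Empty field: only shape and dtype are recorded, no data array is created.
signal_lazy = NXfield(shape=shape, dtype=np.int32, name="signal")

# Field built from np.zeros: the full array already exists in memory.
signal_eager = NXfield(np.zeros(shape, dtype=np.int32), name="signal")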

P4ulW commented 1 year ago

Any news regarding this issue? We would like to integrate the NeXus standard into our data pipeline, but this issue is currently a serious obstacle.

rayosborn commented 1 year ago

Thanks for the prompt. It's tricky for me to look into this since I don't use a Windows machine. Would it be possible to produce a script to reproduce this, with calls to check the system memory allocation? I can then ask a colleague to try it on their machine.

P4ulW commented 1 year ago

This small script replicates the issue on my machine:

from nexusformat.nexus import *
import numpy as np
import os
from memory_profiler import profile

def clean_up():
    # Remove any test file left over from a previous run
    try:
        os.remove("test.nxs")
    except OSError:
        pass

@profile
def create_empty_nxsfile(nw, nx, ny, nz):
    clean_up()
    shape = (nw, nx, ny, nz)
    # Create an empty NXfield of the given shape and save it to disk
    signal = NXfield(
        shape=shape,
        dtype=np.int32,
        name="signal")

    signal.save("test.nxs")

    # Reload the file and access the tree, which is where the allocation occurs
    nxf = nxload("test.nxs")
    nxf.tree

if __name__ == "__main__":
    create_empty_nxsfile(10, 100, 1000, 1000)

    create_empty_nxsfile(10, 300, 1000, 1000)

    create_empty_nxsfile(10, 500, 1000, 1000)

The issue should occur on the second call of create_empty_nxsfile.

rayosborn commented 1 year ago

A colleague was able to reproduce the issue with the array with shape (10,300,1000,1000) and not the others. The only thing I noticed is that the chunk size, which is selected automatically by HDF5 by default, is a little anomalous. It's (1,7,63,63) when the second dimension is 100, (1,19,63,63) for 300, and (1,16,63,63) for 500. I have no idea why the chunk size is slightly larger for the smaller array, and the difference looks too small to have any effect, but it made me wonder if it was worth setting the chunk size manually just in case this is some arcane anomaly in the HDF5 code.
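
For reference, the auto-selected chunk shape can be inspected directly with h5py (assuming the file layout from the script above, with the dataset at entry/signal):

import h5py

with h5py.File('test.nxs', 'r') as f:
    # h5py exposes the chunk shape chosen by HDF5 as the dataset's .chunks attribute
    print(f['entry/signal'].chunks)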

You can set the chunk size by calling, e.g., signal = NXfield(shape=shape, dtype=np.int32, name="signal", chunks=(10,50,50,50)). I would be interested to know if this makes any difference.

The other thing to check is whether this increased memory occurs if you open the file directly in h5py. Obviously, you don't want to load the whole array, but I presume the memory increase should occur if you just open the dataset for inspection, e.g.,

import h5py
with h5py.File('test.nxs', 'r') as f:
    print(f['entry/signal'].shape)

rayosborn commented 1 year ago

Another thought occurred to me. The nexusformat package will normally not attempt to load an array if its size is larger than 1000. If it's smaller, it is read into a private NXfield attribute, _value. Just to confirm this, it would be worth checking what _value is after loading the data. It should be None.

>>> nxf = nxload("test.nxs")
>>> print(nxf['entry/signal']._value)
None

If it's not None, please let me know.

P4ulW commented 1 year ago

Thank you for the investigations. Opening the file with h5py directly did not allocate any significant memory. Checking the _value attribute revealed that it is None for the shapes (10, 100, 1000, 1000) and (10, 500, 1000, 1000), but the fully initialized array for the anomalous case (10, 300, 1000, 1000).

rayosborn commented 1 year ago

So that seems to identify the problem, although I have no idea why a value of (10,300,1000,1000) should trigger the array to be read into memory, and then only on Windows. This will be tricky for me to debug, but if I think of things that you could test, I will let you know.

rayosborn commented 1 year ago

A quick thing to check would be to operate on the NXFile object, created when you open the file. The following returns a tuple whose first element should be None, unless the package chose to read the data into memory.

>>> from nexusformat.nexus import *
>>> f = NXFile('test.nxs')
>>> f.nxpath = '/entry/signal'
>>> f.readvalues()
(None, (10, 300, 1000, 1000), dtype('int32'), {})

The first value should be None unless np.prod(shape) is less than 1000; that is literally the only check that triggers a read. If a shape of (10, 300, 1000, 1000) triggers the large memory allocation, let me know what np.prod((10,300,1000,1000)) gives (and check its value against the other shapes).
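
To make that check concrete, the products for the three shapes from the test script can be compared directly (a minimal sketch):

import numpy as np

# Print np.prod for the three shapes used in the test script above
for shape in [(10, 100, 1000, 1000), (10, 300, 1000, 1000), (10, 500, 1000, 1000)]:
    print(shape, np.prod(shape))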

P4ulW commented 1 year ago

I tested this for different sizes, all with the same result: the first value is always None, and np.prod also gives the expected values.

rayosborn commented 1 year ago

I will have to see if I can get access to a debugger on Windows, because I can't see off-hand how a change of shape can have such a major effect. At least, we've made some progress.

rayosborn commented 1 year ago

I'm afraid that I will be away for 10 days, so I can't make progress on this issue for a little while. I don't know if you had a chance to check whether changing the chunk size made a difference. Another possibility is to make the NXfield with a smaller dimension, but with maxshape set to a larger value.

NXfield(shape=(10,100,1000,1000), dtype=np.float32, maxshape=(10,300,1000,1000))

You can then resize the array when you have values to add beyond the original allocation (https://nexpy.github.io/nexpy/treeapi.html#nexusformat.nexus.tree.NXfield.resize).
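
A rough sketch of that workaround, assuming the file layout from the earlier examples (the resize call follows the method linked above; the exact keyword arguments may need adjusting):

from nexusformat.nexus import NXfield, nxload
import numpy as np

# Create the field with a small initial shape but room to grow later
signal = NXfield(shape=(10, 100, 1000, 1000), dtype=np.int32,
                 maxshape=(10, 300, 1000, 1000), name="signal")
signal.save("test.nxs")

# When more data needs to be stored, enlarge the on-disk array
nxf = nxload("test.nxs", mode="rw")
nxf["entry/signal"].resize((10, 300, 1000, 1000))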

P4ulW commented 1 year ago

Setting the chunk size manually did not alleviate the issue. Using maxshape and resizing later unfortunately only postpones the problem until the next time the file is opened.

P4ulW commented 12 months ago

To clarify my last comment: resizing works fine, but since the RAM allocation occurs when the tree property is accessed, the issue still shows up for previously resized arrays when the file is reopened.

Is there any news regarding the issue?

P4ulW commented 11 months ago

I did some digging myself. What I found is the following: in _read_data there is the if np.prod(field.shape) < 1000 check, which determines whether or not the data is loaded. I added a simple print statement to see what happens:

field = self.get(self.nxpath)
# Read in the data if it's not too large
print("field length: ", np.prod(field.shape))
if np.prod(field.shape) < 1000:  # i.e., less than 1k dims
    try:
        value = self.readvalue(self.nxpath)
    except Exception:
        value = None
Under Windows, for specific shapes, this is the printed field length versus what it should be:

shape                    output         should be
(10, 100, 1000, 1000)    10^9           1·10^9
(10, 300, 1000, 1000)    -1294967296    3·10^9
(10, 500, 1000, 1000)    705032704      5·10^9

So there is some overflow occurring with np.prod on Windows for some reason, which is quite worrisome.

This should solve it: np.prod(np.array(field.shape).astype(np.int64))

For some reason, integer NumPy arrays default to 32 bits on Windows (the default NumPy integer follows the platform's C long, which is only 32-bit there) ...
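
For illustration, the overflow can be reproduced outside nexusformat; the first result assumes a platform where NumPy's default integer is 32-bit, as on Windows:

import numpy as np

shape = (10, 300, 1000, 1000)

# With a 32-bit default integer, 3e9 wraps around to a negative value
print(np.prod(shape))                               # -1294967296 on affected platforms
# Forcing a 64-bit accumulator avoids the overflow
print(np.prod(shape, dtype=np.int64))               # 3000000000
print(np.prod(np.array(shape).astype(np.int64)))    # 3000000000 (the workaround above)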

rayosborn commented 11 months ago

Sorry I have not been able to work on this issue, since I still don't have access to a Windows machine for debugging. However, I think you have identified the problem correctly. The solution is not, I believe, to cast the shape array as np.int64, but to ensure that np.prod returns a np.int64 value, i.e., np.prod(shape, dtype=np.int64). I have submitted and merged a PR to the nexusformat package, which modifies the _getsize function to return a 64-bit integer. If you are able to test the latest development version in the main branch, we'll see if it's sufficient to resolve this issue. If it works, I should be able to release a new version quickly.
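
A minimal sketch of the approach described; the actual change in the merged PR may differ in its details:

import numpy as np

def _getsize(shape):
    # Accumulate in 64 bits so the product cannot overflow the platform's
    # default integer, which is 32-bit on Windows
    return int(np.prod(shape, dtype=np.int64))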

P4ulW commented 11 months ago

I tested the recent fix #206, and the issue appears to be fixed as far as I can tell. Thank you for the fast implementation.

rayosborn commented 11 months ago

Great. I'll try to get it released in the next couple of days, once I've double-checked that there are no other side-effects.

rayosborn commented 11 months ago

This should be fixed in v1.0.3, which is available on PyPI. The conda-forge package should be available later today. @P4ulW, thanks for raising the issue.