snoplusuk / echidna

Large Matrices #66

Open jwaterfield opened 9 years ago

jwaterfield commented 9 years ago

Echidna currently uses numpy arrays to store the spectra data. This is fine for spectra with few dimensions and few bins, but for spectra with many dimensions, each with a large number of bins, it will not work because there is not enough memory to store the full array.

For example, if we currently want to store energy, xpos, ypos and zpos, each with 500 bins, then the line

self._data = numpy.zeros(shape=(500, 500, 500, 500), dtype=float)

is called, which raises a MemoryError because the array is too large to allocate.
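To put a number on this, here is a rough estimate of the memory the dense array would need. It is only a sketch: `dense_size_bytes` is a hypothetical helper (not part of echidna), and it assumes numpy's `float` dtype is 8 bytes per element.

```python
import math


def dense_size_bytes(shape, itemsize=8):
    """Bytes needed for a dense array of the given shape.

    itemsize=8 assumes a 64-bit float, which is what dtype=float
    gives in numpy.
    """
    return math.prod(shape) * itemsize


shape = (500, 500, 500, 500)
n_bytes = dense_size_bytes(shape)
print(f"{n_bytes / 1e9:.0f} GB")  # 500**4 elements * 8 bytes each
```

That comes to roughly 500 GB for a single spectrum, well beyond the RAM of a typical machine.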

One possible solution is to use scipy's sparse matrices, which store only the non-zero elements of the matrix. However, if there are not enough zero elements in the matrix, we would have to look at using PyTables instead. Moving towards one of these solutions should also improve echidna's overall efficiency.
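A minimal sketch of the sparse option, not a proposal for the actual implementation. Since scipy's sparse matrices are 2-D, this sketch assumes we fold the four axes into two, (energy, xpos) as rows and (ypos, zpos) as columns. The names `fill_sparse` and `get_bin` are hypothetical, and the inputs are assumed to be integer bin indices that have already been computed.

```python
import numpy as np
from scipy import sparse

N_BINS = 500  # bins per dimension, as in the example above


def fill_sparse(energy, xpos, ypos, zpos):
    """Histogram 4-D bin indices into a 2-D CSR matrix.

    Each argument is an integer array of bin indices, one entry per event.
    Axes are folded pairwise so the result fits scipy's 2-D sparse format.
    """
    rows = np.ravel_multi_index((energy, xpos), (N_BINS, N_BINS))
    cols = np.ravel_multi_index((ypos, zpos), (N_BINS, N_BINS))
    counts = np.ones(len(rows), dtype=float)
    folded = sparse.coo_matrix(
        (counts, (rows, cols)), shape=(N_BINS**2, N_BINS**2)
    )
    # Converting COO to CSR sums entries that share the same (row, col),
    # so repeated events accumulate in their bin.
    return folded.tocsr()


def get_bin(matrix, e, x, y, z):
    """Look up the content of a single 4-D bin in the folded matrix."""
    row = np.ravel_multi_index((e, x), (N_BINS, N_BINS))
    col = np.ravel_multi_index((y, z), (N_BINS, N_BINS))
    return matrix[row, col]


# Three events, two of which land in the same bin.
spectrum = fill_sparse(
    np.array([1, 1, 499]),
    np.array([2, 2, 0]),
    np.array([3, 3, 0]),
    np.array([4, 4, 499]),
)
print(spectrum.nnz, get_bin(spectrum, 1, 2, 3, 4))
```

Memory use then scales with the number of filled bins rather than with 500^4, so this only helps if most bins stay empty.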

See this blog post for more details: http://www.philippsinger.info/?p=464