neuroinformatics-unit / spikewrap

A tool to streamline extracellular electrophysiology analysis using SpikeInterface
BSD 3-Clause "New" or "Revised" License

Increase chunk size based on memory #104

Closed JoeZiminski closed 10 months ago

JoeZiminski commented 1 year ago

When writing to binary, a small chunk size is currently used. In general I think the fewer chunks the better, to avoid edge effects. As such it might be a good idea to use the largest chunk size that memory will allow (e.g. say use 70% of available memory).
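Not an implementation, just a minimal sketch of the idea, assuming `psutil` for free memory and the standard SpikeInterface recording methods `get_num_channels()` / `get_dtype()`; the 0.7 fraction is the value suggested above:

```python
import psutil

def suggest_chunk_size(recording, memory_fraction=0.7):
    """Suggest a chunk size (in frames) that fits within a fraction of available memory."""
    available_bytes = psutil.virtual_memory().available
    bytes_per_frame = recording.get_num_channels() * recording.get_dtype().itemsize
    return int(available_bytes * memory_fraction // bytes_per_frame)
```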

JoeZiminski commented 1 year ago

#108

JoeZiminski commented 11 months ago

Scaling the chunk size to memory was not as simple as hoped. First, SI's memory tracking is tagged to undergo some improvement and is not trivial to build on, so getting a good estimate of the memory used during preprocessing is not easy.

Nonetheless, a rough guess could be used, taking the maximum itemsize (float64, used in some preprocessing steps) and a 2-3x multiplier based on dev feedback.
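As a rough illustration of that estimate (the 40 GB allocation, 384 channels and 3x multiplier are purely illustrative numbers, not measurements):

```python
n_channels = 384   # illustrative, e.g. a Neuropixels probe
itemsize = 8       # float64 worst case
overhead = 3       # rough multiplier based on the dev feedback above
allocated = 40e9   # pretend 40 GB was requested on the node

chunk_frames = int(0.7 * allocated / (n_channels * itemsize * overhead))
# -> ~3.0 million frames, i.e. roughly 100 s of a 30 kHz recording per chunk
```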

See #108 for a first implementation. The current blocker on this was finding the available memory across different settings:

1) `psutil.virtual_memory().available` did not give accurate memory on SLURM nodes, reporting a much higher value than requested (e.g. request 40 GB and it shows 380 GB).
2) `slurmio` always returned 16 GB even when 40 GB was requested (a possible workaround is sketched after the outputs below), e.g.

(spikewrap) jziminski@gpu-380-14:/ceph/neuroinformatics/neuroinformatics/scratch/jziminski/ephys/code/spikewrap$ sacct  --format="MaxRSS, MaxRSSNode"
    MaxRSS MaxRSSNode
---------- ----------
 45941200K gpu-380-14
     7992K gpu-380-14
    17116K enc3-node4

but

>>> SlurmJobParameters().requested_memory
16
>>> SlurmJobParameters().allocated_memory
16000000
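One possible workaround, until `slurmio` reports the right value, would be to read SLURM's own environment variables and fall back to `psutil` outside SLURM. This is only a sketch under assumptions: which of `SLURM_MEM_PER_NODE` / `SLURM_MEM_PER_CPU` is set depends on how the job was submitted, the values are in MB, and it does not account for memory the job has already used.

```python
import os
import psutil

def get_available_memory_bytes():
    """Best-effort available memory: SLURM allocation if inside a job, else psutil."""
    if "SLURM_MEM_PER_NODE" in os.environ:
        # set when the job was submitted with --mem (value in MB)
        return int(os.environ["SLURM_MEM_PER_NODE"]) * 1024**2
    if "SLURM_MEM_PER_CPU" in os.environ:
        # set when submitted with --mem-per-cpu (value in MB)
        n_cpus = int(os.environ.get("SLURM_CPUS_ON_NODE", 1))
        return int(os.environ["SLURM_MEM_PER_CPU"]) * n_cpus * 1024**2
    return psutil.virtual_memory().available
```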

Once this is resolved it would be possible to expose an argument `fixed_batch_size` that allows the user to fix a batch size; otherwise, use 70% or so of available memory.
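The user-facing logic could then look something like the sketch below (`get_available_memory_bytes` is the hypothetical helper above, and the 0.7 default is the fraction suggested earlier, not an agreed value):

```python
def resolve_chunk_size(recording, fixed_batch_size=None, memory_fraction=0.7):
    """Use the user-supplied chunk size if given, otherwise size chunks
    to ~70% of the memory we think is available."""
    if fixed_batch_size is not None:
        return fixed_batch_size

    bytes_per_frame = recording.get_num_channels() * recording.get_dtype().itemsize
    return int(memory_fraction * get_available_memory_bytes() // bytes_per_frame)
```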