Closed mmcdermott closed 1 month ago
So, interestingly, this actually seems to have a negative impact on performance overall, probably because numpy array sub-slice memory management is not as optimized as separate Python lists? I'm not totally sure, but it does seem to (very marginally) hurt performance. Nevertheless, this is a good thing to implement: we should build towards the setting where we only load the data for a given patient on the fly, which will necessitate splitting the flat data (in some form or another) every time either `__getitem__` or `collate` is called. We will fundamentally pay the same performance cost every time regardless, and the simpler logic still makes for an easier transition between the two slicing modes.
This will simplify the sharing of logic between `__getitem__` and `load_slice` and, I suspect, have negligible to positive impacts on performance. This will help #20
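
For readers who want a concrete picture of the flat-data splitting discussed here, the following is a minimal sketch and not the code in this PR. It assumes the data is stored as one flat numpy array plus an offsets array marking where each patient's values start and end. The class name `FlatSubjectData`, the helper `_slice_flat`, and the `load_slice(start, stop)` signature are all hypothetical, chosen only to show how one helper could back both access paths.

```python
from typing import List

import numpy as np


class FlatSubjectData:
    """Hypothetical container: one flat value array plus per-patient offsets."""

    def __init__(self, values: np.ndarray, offsets: np.ndarray):
        # offsets has length n_patients + 1; patient i owns
        # values[offsets[i]:offsets[i + 1]] (possibly empty).
        self.values = values
        self.offsets = offsets

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def _slice_flat(self, start: int, stop: int) -> List[np.ndarray]:
        """Shared splitting logic: one array view per patient in [start, stop)."""
        bounds = self.offsets[start : stop + 1]
        return [self.values[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]

    def __getitem__(self, idx: int) -> np.ndarray:
        # Single-patient access reuses the same helper (non-negative idx only).
        return self._slice_flat(idx, idx + 1)[0]

    def load_slice(self, start: int, stop: int) -> List[np.ndarray]:
        # Range access is the same split applied to several patients at once.
        return self._slice_flat(start, stop)


# Toy data: four patients with 3, 0, 4, and 3 values respectively.
data = FlatSubjectData(
    values=np.arange(10),
    offsets=np.array([0, 3, 3, 7, 10]),
)
```

In this sketch each per-patient slice is a numpy view rather than a copy, so the split allocates a new array object per patient without duplicating the underlying values. Whether that per-slice overhead accounts for the marginal slowdown reported above is not something this sketch establishes.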