mmcdermott / nested_ragged_tensors

Utilities for efficiently working with, saving, and loading collections of connected nested ragged tensors in PyTorch
MIT License

Tensors should not be split into lists of lists until densification. #21

Closed mmcdermott closed 1 month ago

mmcdermott commented 1 month ago

Keeping tensors flat until densification will simplify the sharing of logic between __getitem__ and load_slice and, I suspect, have a negligible to positive impact on performance. This will help #20.

mmcdermott commented 1 month ago

So, interestingly, this actually seems to have a negative impact on performance overall, probably because numpy array sub-slice memory management is not as optimized as separate Python lists. I'm not totally sure. But it does seem to (very marginally) hurt performance. Nevertheless, this is a good thing to implement. We should build towards the setting where we only load the data for a given patient on the fly, which will necessitate splitting the flat data (in some form or another) every time either __getitem__ or collate is called. We will therefore pay fundamentally the same performance cost every time regardless, and the simpler logic still helps enable an easier transition between the two slicing modes.
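
For readers unfamiliar with the idea, here is a minimal sketch of what "keep the data flat until densification" could look like. It is not the library's actual implementation: `slice_flat` and `densify` are hypothetical helper names, and the layout (one flat values array plus a per-row lengths array) is an assumption made for illustration. The point is that slicing only touches offsets, so the same helper could in principle back both __getitem__ and load_slice, and the per-row split happens once, while padding.

```python
import numpy as np


def slice_flat(values, lengths, start, stop):
    """Hypothetical helper: select rows [start, stop) of a ragged array
    stored as flat values plus per-row lengths, without building lists."""
    # offsets[i] is the position in `values` where row i begins.
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    return values[offsets[start]:offsets[stop]], lengths[start:stop]


def densify(values, lengths, pad_value=0):
    """Hypothetical helper: pad the flat ragged data into a dense 2D array.
    This is the only place where the flat data is split per row."""
    n_rows = len(lengths)
    max_len = int(lengths.max()) if n_rows else 0
    out = np.full((n_rows, max_len), pad_value, dtype=values.dtype)
    # True where a row has a real (non-padding) element.
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    # Boolean assignment fills in row-major order, matching the flat layout.
    out[mask] = values
    return out


# Three ragged rows: [1, 2], [3], [4, 5, 6]
values = np.array([1, 2, 3, 4, 5, 6])
lengths = np.array([2, 1, 3])

sub_values, sub_lengths = slice_flat(values, lengths, 1, 3)
dense = densify(sub_values, sub_lengths)
```

Here `sub_values` is a view into `values` rather than a copy, which is the numpy sub-slice behavior the comment above speculates about when comparing against separate Python lists.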