Closed: juancq closed this 7 months ago
Hey @juancq -- how does this impact the iteration speed through the dataloader, though? The motivation to convert things to lists was that with raw polars objects, the base iteration speed was much slower.
@mmcdermott I saw no noticeable difference in the iteration speed.
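For anyone who wants to check the iteration speed on their own data, here is a minimal timing sketch using only the standard library. `time_iteration` is a hypothetical helper and not part of EventStreamGPT, and `range(1000)` is only a stand-in for a real `torch.utils.data.DataLoader` built from each branch.

```python
import time
from itertools import islice
from typing import Iterable, Optional, Tuple


def time_iteration(batches: Iterable, max_batches: Optional[int] = None) -> Tuple[int, float]:
    """Iterate over `batches` and return (batches seen, elapsed seconds).

    Hypothetical helper for illustration only. If `max_batches` is None,
    the whole iterable is consumed.
    """
    n_seen = 0
    start = time.perf_counter()
    # islice(..., None) walks the full iterable; otherwise it stops early.
    for _ in islice(batches, max_batches):
        n_seen += 1
    elapsed = time.perf_counter() - start
    return n_seen, elapsed


# Stand-in iterable; in practice, pass the dataloader from each branch
# and compare the elapsed times over the same number of batches.
n, seconds = time_iteration(range(1000), max_batches=100)
print(f"{n} batches in {seconds:.6f}s")
```

Running this a few times per branch, after a warm-up pass, should give a fairer comparison than a single run, since the first pass often includes one-off setup costs.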
@juancq I'm working on a different solution for this problem that also addresses some other issues. I'll tag you in that other PR. It's not 100% ready but it is close. It is a larger change, but I'll explain more there.
This fixes the increased memory consumption when using multiple PyTorch dataloaders (issue #73). It also reduced the starting memory usage in my test case from 30 GB to 12 GB.
Removing this line makes all the difference: https://github.com/mmcdermott/EventStreamGPT/blob/b10e7415af1e9ea9517dfb52c343ae8155c40674/EventStream/data/pytorch_dataset.py#L309
Editing the following line didn't make much of a difference, but I edited it for consistency: https://github.com/mmcdermott/EventStreamGPT/blob/b10e7415af1e9ea9517dfb52c343ae8155c40674/EventStream/data/pytorch_dataset.py#L306