ppwwyyxx / RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders
Apache License 2.0

How to avoid issues with dictionaries? #8

Open jj0mst opened 1 year ago

jj0mst commented 1 year ago

Hello,

I found the repo and blog post very interesting and useful, especially the tensor serialization utility.

I've encountered similar problems with RAM usage, but in my case I have to use dictionaries to store data, or even dictionaries of lists.

Can you confirm that the issues you presented can occur with dictionaries too? I also see many "too many open files" errors because of this.

Secondly, how can I serialize dictionaries into a torch Tensor instead of using lists?

I could work with lists too, but then it would be very complicated to retrieve the correct Tensor in the main processes (I find dictionaries more flexible in this respect).

thesofakillers commented 1 year ago

Also interested. I saw this comment mention frozendicts (I'm guessing from this library).

@jj0mst, I was wondering what solution you settled on in the meantime, if you can share. Thanks!
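
For the dictionary question raised in this thread, one possible direction is to apply the same idea as the list serialization to a mapping: pickle every value into a single `uint8` buffer, keep the byte offsets in a second array, and keep the sorted keys in a third array so that lookups can use binary search. The sketch below is only an illustration under stated assumptions. `SerializedDict` is a hypothetical name that does not come from the repo, the mapping is read-only, and the keys are assumed to be all integers or all strings. It uses NumPy only, to keep the example small.

```python
import pickle

import numpy as np


class SerializedDict:
    """Read-only mapping backed by three NumPy arrays.

    Hypothetical helper (not part of the repo). Assumes the keys are
    all ints or all strings, so they can be sorted and stored in one array.
    """

    def __init__(self, data):
        # Sort the keys once so lookups can use binary search.
        keys = sorted(data)
        self._keys = np.asarray(keys)
        # Pickle each value (a list, a nested dict, ...) into raw bytes.
        blobs = [
            np.frombuffer(pickle.dumps(data[k], protocol=-1), dtype=np.uint8)
            for k in keys
        ]
        # _ends[i] is the end offset of value i inside the flat buffer.
        self._ends = np.cumsum([len(b) for b in blobs], dtype=np.int64)
        self._buf = np.concatenate(blobs) if blobs else np.empty(0, dtype=np.uint8)

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, key):
        # Binary search for the key, then verify it is really present.
        i = int(np.searchsorted(self._keys, key))
        if i >= len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        start = 0 if i == 0 else int(self._ends[i - 1])
        end = int(self._ends[i])
        # Unpickle only the bytes that belong to this value.
        return pickle.loads(memoryview(self._buf[start:end]))


# Example: a dictionary of lists, as described in the question.
sd = SerializedDict({"cat": [1, 2, 3], "dog": [4, 5]})
print(sd["cat"])
```

With this layout the object holds three arrays rather than one Python object per entry, which is the property the list serialization relies on. To move the data into torch tensors, the numeric arrays (`_buf`, `_ends`, and `_keys` when the keys are integers) could be wrapped with `torch.from_numpy`. String keys cannot be stored in a tensor, so they would need to be mapped to integer ids first or serialized in the same way as the values.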