huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning
MIT License

Running out of memory (125gb) when building my dataset #62

Closed ricsinaruto closed 4 years ago

ricsinaruto commented 4 years ago

First of all thank you for sharing this code!

I tried running the train script on my own dataset, and it successfully generates a tokenized cached version, which is about 6 GB on disk.

What I don't understand is that it runs out of 125 GB of RAM during the 'Building inputs and labels' phase.

I have not modified the code; I don't use personas, my number of candidates is 2, and my history size is 4.

Let me know if you have any ideas about what the problem could be, or whether this is perfectly normal.

ricsinaruto commented 4 years ago

I guess there's no easy way to solve this; here's what I tried that worked in lowering my RAM usage:

In the end I had to lower the number of candidates and the history size, but the biggest improvement came from implementing the Dataset class and doing the padding inside the `__getitem__` function. I think this is the only way for this code to work on larger datasets: the Dataset class has to be implemented from the beginning, instead of building the whole padded dataset in memory.
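For what it's worth, a minimal sketch of the pattern described above: instead of padding every example up front and holding the fully padded dataset in RAM, keep only the raw tokenized sequences and pad lazily, one item at a time, inside `__getitem__`. The class and field names here are illustrative, not the repo's actual code, and plain Python lists stand in for torch tensors so the sketch stays self-contained; the class follows the same `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` expects.

```python
PAD_ID = 0  # assumed padding token id, not taken from the repo


class LazyPaddedDataset:
    """Pads each example on access instead of materializing the padded dataset.

    Compatible with the __len__/__getitem__ protocol used by
    torch.utils.data.Dataset and DataLoader.
    """

    def __init__(self, tokenized_dialogs, max_len):
        # tokenized_dialogs: list of variable-length token-id lists,
        # e.g. the 6 GB tokenized cache loaded from disk.
        self.dialogs = tokenized_dialogs
        self.max_len = max_len

    def __len__(self):
        return len(self.dialogs)

    def __getitem__(self, idx):
        ids = self.dialogs[idx][: self.max_len]
        pad = self.max_len - len(ids)
        # Pad only this one example, at access time, so the whole padded
        # dataset never exists in memory at once.
        input_ids = ids + [PAD_ID] * pad
        # -100 is the label value ignored by PyTorch's cross-entropy loss,
        # so padded positions don't contribute to the LM loss.
        lm_labels = ids + [-100] * pad
        return input_ids, lm_labels


dataset = LazyPaddedDataset([[5, 6, 7], [8, 9]], max_len=4)
input_ids, lm_labels = dataset[1]
```

In a real setup you would wrap this in a `DataLoader`, which only ever holds one batch of padded items at a time.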