Open thak123 opened 4 years ago
Here is the function (link) that I used to load the BookCorpus dataset (we don't distribute BookCorpus because there is no publicly available link). I believe you can just load the number of files (extracted_files) that fits in your memory. And if you use DDP, you can check out a different set of files for each machine/dataloader and train the model.
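A minimal sketch of that pattern with a PyTorch `IterableDataset`, assuming one plain-text file per extracted book; the directory layout, the `ShardedSentenceDataset` name, and the round-robin sharding are illustrative assumptions, not the linked function itself:

```python
import glob

import torch.distributed as dist
from torch.utils.data import DataLoader, IterableDataset


class ShardedSentenceDataset(IterableDataset):
    """Streams sentences from the subset of extracted files owned by
    this rank, so memory is bounded by one open file at a time."""

    def __init__(self, file_paths, rank, world_size):
        # Round-robin shard: rank r reads files r, r + world_size, ...
        self.files = file_paths[rank::world_size]

    def __iter__(self):
        for path in self.files:
            with open(path, encoding="utf-8") as f:
                for line in f:  # read lazily, one sentence per line
                    line = line.strip()
                    if line:
                        yield line


if __name__ == "__main__":
    # Hypothetical layout: one .txt file per extracted book.
    extracted_files = sorted(glob.glob("bookcorpus/extracted/*.txt"))

    # Under DDP these come from the process group; default to a single
    # process when the script is launched without torchrun.
    rank = dist.get_rank() if dist.is_initialized() else 0
    world_size = dist.get_world_size() if dist.is_initialized() else 1

    dataset = ShardedSentenceDataset(extracted_files, rank, world_size)
    loader = DataLoader(dataset, batch_size=128)
    for batch in loader:
        pass  # batch is a list of raw sentence strings; tokenize here
```

Note that with an `IterableDataset`, multiple `DataLoader` workers would each replay the rank's full shard, so keep `num_workers=0` or additionally split `self.files` by `torch.utils.data.get_worker_info()` inside `__iter__`.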
Thanks for the link.
❓ Questions and Help
Description
Hi, I am training a quick-thoughts task with dot(sent1, sent2) and dot(sent2, sent2), but my dataset is 12 GB and training throws a memory error once usage crosses 575 GB. I am training the system on a cluster, but it doesn't allow more than that to be utilized.
Is there an example of a dataset that can be moved into memory in slices rather than all at once?
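One common way to get slice-at-a-time loading is a memory-mapped array. Below is a minimal sketch assuming a hypothetical offline preprocessing step that wrote fixed-length token-id rows to a `pairs.npy` file of shape (num_pairs, 2, seq_len); the file name and layout are assumptions, not part of the original setup:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class MemmapPairDataset(Dataset):
    """Serves pre-tokenized sentence pairs from disk without loading
    the whole file into RAM."""

    def __init__(self, path):
        self.path = path
        self._data = None  # opened lazily, once per DataLoader worker

    @property
    def data(self):
        if self._data is None:
            # mmap_mode="r" maps the file into virtual memory; the OS
            # pages slices in on access, so resident memory stays small.
            self._data = np.load(self.path, mmap_mode="r")
        return self._data

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        pair = np.array(self.data[idx])  # copies only this slice into RAM
        return torch.from_numpy(pair[0]), torch.from_numpy(pair[1])


if __name__ == "__main__":
    dataset = MemmapPairDataset("pairs.npy")  # hypothetical preprocessed file
    loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=2)
    for sent1, sent2 in loader:
        pass  # encode both sides and take dot products for the loss
```

Opening the memmap lazily in a property (rather than in `__init__`) keeps each `DataLoader` worker on its own file handle instead of pickling the array into every worker process.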