timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Training PET on data which is too large to fit in RAM #39

Closed ghost closed 3 years ago

ghost commented 3 years ago

I am training a PET model on 500GB of text. I have properly processed the data, but I can't load it all into a variable since I don't have nearly enough RAM to do that.
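(For reference: one way around this is to stream examples from disk instead of loading everything into memory. Below is a minimal sketch using a PyTorch `IterableDataset`; the file path and one-example-per-line format are assumptions, and PET's own data loading builds in-memory `InputExample` lists, so it would need to be adapted to consume such a stream.)

```python
# Minimal sketch: stream training text from disk so that only one line
# needs to be in memory at a time. Assumes one example per line in
# "train.txt" (a placeholder path).
from torch.utils.data import IterableDataset, DataLoader

class StreamingTextDataset(IterableDataset):
    """Lazily yields one example at a time instead of loading the file."""

    def __init__(self, path: str):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")

loader = DataLoader(StreamingTextDataset("train.txt"), batch_size=32)
for batch in loader:  # each batch is a list of 32 raw strings
    ...  # tokenize and feed to the model here
```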

chris-aeviator commented 3 years ago

Random side note: I believe other projects have solved this with the DeepSpeed/DeeperSpeed libraries; it might need a lot of code rework before you can use them, though.

ghost commented 3 years ago

Oh. That's sad, because I can't do any code rework on my own :(

> Random side note: I believe other projects have solved this with the DeepSpeed/DeeperSpeed libraries; it might need a lot of code rework before you can use them, though.

So is there no simple way to do it? Could you help me?

ghost commented 3 years ago

In what ways did they use Microsoft's DeepSpeed for this?

timoschick commented 3 years ago

Hi @BleepLogger, the focus of PET is few-shot learning from 0-1000 examples, so I'm not sure this is really the right library for you if you've got 500GB of data to train on. We currently don't plan any modifications to PET to support such large training datasets, so if you really want to use PET, you'll probably have to make those modifications yourself. However, if you just want to use the 500GB of data for pretraining, a better approach would be to pretrain with another library first and then use the resulting model with PET.
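(For reference, a minimal sketch of that two-step workflow, assuming Hugging Face `transformers`/`datasets` for the pretraining step; paths, model choice, and hyperparameters are placeholders, not a recommendation from this thread.)

```python
# Sketch: pretrain on the large corpus with masked language modeling,
# then hand the saved model to PET. `datasets` memory-maps the corpus
# from disk, so it does not need to fit in RAM.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# "large_corpus.txt" is a placeholder for the 500GB of text
dataset = load_dataset("text", data_files={"train": "large_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./pretrained-model",
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer),
)
trainer.train()

# Save model and tokenizer; this directory can then be passed to PET's
# cli.py via --model_name_or_path for the actual few-shot step.
trainer.save_model("./pretrained-model")
tokenizer.save_pretrained("./pretrained-model")
```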

ghost commented 3 years ago

Okay, I'm willing to make those modifications on my own. How do I make them?

ghost commented 3 years ago

Also, I have my data in 80 parts. What if, instead of fine-tuning PET on all of my data at once, I first fine-tune it on part 1, then fine-tune part 1's resulting model on part 2, and so on? Could this work?
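(For reference, a hypothetical sketch of this sequential idea, driving PET's documented `cli.py` once per part; the task name, pattern ids, and paths are placeholders. Note that PET writes its final model into a subdirectory of the output directory, so the checkpoint path below may need adjusting, and sequential fine-tuning can drift toward the most recent parts rather than matching training on everything at once.)

```python
# Sketch: fine-tune on part 1, then continue from that model on part 2,
# and so on for all 80 parts. All paths and task settings are placeholders.
import subprocess

checkpoint = "roberta-base"  # starting model
for part in range(1, 81):
    output_dir = f"./pet-part-{part}"
    subprocess.run([
        "python3", "cli.py",
        "--method", "pet",
        "--pattern_ids", "0",
        "--data_dir", f"./data/part-{part}",
        "--model_type", "roberta",
        "--model_name_or_path", checkpoint,  # continue from previous part
        "--task_name", "my-task",
        "--output_dir", output_dir,
        "--do_train",
    ], check=True)
    checkpoint = output_dir  # the next part starts from this model
```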

ghost commented 3 years ago

@timoschick

timoschick commented 3 years ago

> Okay, I'm willing to make those modifications on my own. How do I make them?

Sorry if I wasn't clear. What I was trying to say is that I don't know what the best way to train on such large datasets would be, so you won't just have to implement the modifications yourself; you'll also have to figure out on your own which modifications are required.