salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

Continuously increasing RAM with Pre-training #77

Open abhisheksgumadi opened 2 years ago

abhisheksgumadi commented 2 years ago

Dear Team,

I am using the pre-training script to pre-train BLIP on a custom dataset (containing around 1M image/text pairs).

I see that the machine's RAM utilization continuously increases until it reaches 100%. The machine has 120 GB of RAM!

Any idea where the problem could be?


woctezuma commented 2 years ago

Do you have custom code which could have a memory leak?

abhisheksgumadi commented 2 years ago

We have a custom dataloader that loads images and text from a Parquet file.

abhisheksgumadi commented 2 years ago

We have 1 million images stored on disk, and we have prepared the JSON file as described in the GitHub README. Our Dataloader loads the JSON file into memory in the __init__ method, and then in the __getitem__ method it loads the image from the corresponding path in the JSON file and also returns the text.

Not sure why the RAM utilization is so high. Any ideas, please? Thanks
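For illustration, the dataset is roughly along these lines (a sketch only; the `image`/`caption` field names, paths, and transform are placeholders, not our exact code):

```python
import json
import os

from PIL import Image
from torch.utils.data import Dataset


class PretrainJsonDataset(Dataset):
    def __init__(self, ann_file, image_root, transform):
        # The full annotation JSON (~1M entries) is loaded into memory here.
        with open(ann_file, "r") as f:
            self.annotations = json.load(f)
        self.image_root = image_root
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        # Only the image for this index is read from disk; the caption comes
        # straight from the in-memory annotation list.
        image = Image.open(os.path.join(self.image_root, ann["image"])).convert("RGB")
        return self.transform(image), ann["caption"]
```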

LiJunnan1992 commented 2 years ago

Hi, it could be related to the dataloader.

abhisheksgumadi commented 2 years ago

We ended up using the pretrain_dataset.py file and formatted the data as a JSON file exactly as described in the README. Even then we see the RAM utilization go to 100%. So now we have simply formatted the dataset as required, with no changes to the code, and we don't even have any custom code of our own.
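For reference, the annotation file is a flat list of image/caption records along the lines of the README description; a minimal sketch (file names and captions are made up):

```python
import json

# Each entry pairs an image path with its caption text.
annotations = [
    {"image": "images/0000001.jpg", "caption": "a dog playing in the park"},
    {"image": "images/0000002.jpg", "caption": "two people walking on a beach"},
    # ... roughly 1M entries in total
]

with open("pretrain_ann.json", "w") as f:
    json.dump(annotations, f)
```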

abhisheksgumadi commented 2 years ago

We are happy to follow any other debugging steps to make this a success. Thanks!

asgsaeid commented 2 years ago

Was wondering if there has been any update on this. We ran pretrain.py and saw the same issue: RAM usage increases while the JSON files are being read, and at some point the RAM explodes. For pre-training, what Python version did you use and how much RAM did the machine have?

LiJunnan1992 commented 2 years ago

@abhisheksgumadi @asgsaeid You may want to try out our new library which supports BLIP and see if the issue still remains: https://github.com/salesforce/LAVIS

abhisheksgumadi commented 2 years ago

Thanks, will take a look

dyashuni commented 1 year ago

Hope this helps: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/
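In short, the post argues that a large Python list of dicts held by the Dataset is the usual culprit: every DataLoader worker touches the objects' refcounts, so copy-on-write pages get duplicated in each worker and RAM creeps up over time. A minimal sketch of the serialization workaround it describes (an assumption about how one could patch the dataset, not BLIP's actual code):

```python
import json
import pickle

import numpy as np
from torch.utils.data import Dataset


class SerializedAnnotations(Dataset):
    """Holds annotations as two numpy arrays (a flat byte buffer plus offsets)
    instead of a list of Python dicts, so DataLoader workers do not gradually
    duplicate the annotation memory via copy-on-write."""

    def __init__(self, ann_file):
        with open(ann_file, "r") as f:
            annotations = json.load(f)  # list of dicts

        # Pickle each record and pack everything into one contiguous buffer.
        serialized = [pickle.dumps(a, protocol=-1) for a in annotations]
        self.offsets = np.cumsum([len(s) for s in serialized])
        self.buffer = np.frombuffer(b"".join(serialized), dtype=np.uint8)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        start = 0 if idx == 0 else self.offsets[idx - 1]
        end = self.offsets[idx]
        return pickle.loads(self.buffer[start:end].tobytes())
```

With `num_workers > 0`, each worker then only reads the shared numpy buffers, so per-worker RAM should stay roughly flat instead of climbing.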

aries-young commented 1 year ago

> Was wondering if there has been any update on this. We ran pretrain.py and saw the same issue: RAM usage increases while the JSON files are being read, and at some point the RAM explodes. For pre-training, what Python version did you use and how much RAM did the machine have?

Have you solved this problem? Could you kindly provide some suggestions?

aries-young commented 1 year ago

> Thanks, will take a look

Have you solved this problem? Could you kindly provide some suggestions?