Describe the bug
The COCODataset implementation uses the COCO API to load dataset annotations upon initialization. However, since a COCODataset instance is created once per GPU worker, the annotations are loaded once per worker. This works fine for smaller datasets, but with larger datasets it very quickly exhausts system RAM (NOT GPU RAM) when using multiple GPUs. The BaseDataset class sets serialize_data to True by default, which should result in the dataset being shared across GPU workers, but this does not appear to help COCODataset, since the annotations are loaded before the data ever has a chance to be serialized.
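To make the failure mode concrete, here is a minimal sketch (illustrative only, not code from mmdetection; the annotation path is hypothetical) of what effectively happens when training is launched with one process per GPU:

```python
# Minimal sketch of the failure mode (illustrative only; the annotation path
# is hypothetical). Each GPU training process runs this independently.
from pycocotools.coco import COCO

def init_dataset_per_rank(ann_file: str) -> COCO:
    # COCO.__init__ json-loads the entire annotation file and builds index
    # dicts (anns, imgs, imgToAnns, ...) in this process's private memory.
    return COCO(ann_file)

# With e.g. torchrun --nproc_per_node=8, eight processes each execute:
coco = init_dataset_per_rank('data/custom_coco/annotations/train.json')
```

Because each rank is a separate process, nothing is shared: peak host RAM is roughly num_gpus times the in-memory size of the parsed annotations, which for a dataset of this scale blows past 256GB before serialize_data ever comes into play.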
Reproduction
Did you make any modifications on the code or config? Did you understand what you have modified?
The only modifications I made to my config file were to point it to my dataset location.
What dataset did you use?
A 120GB custom COCO-format instance segmentation dataset with a high volume of instances per image (~100-250).
Environment
Error traceback
There is no applicable traceback. During the "loading annotations" phase before training, system RAM usage (my personal workstation has 256GB of RAM) climbs steadily until it hits 256GB and the worker is killed.
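For anyone trying to reproduce this, a simple way to watch the climb (not part of the original report; assumes psutil is installed, and the process-matching substring is an assumption) is to poll the combined resident memory of the training processes during the loading phase:

```python
# Hypothetical monitoring helper (not from the report; assumes psutil is
# installed): polls the combined resident set size of all processes whose
# command line contains a given substring.
import time
import psutil

def watch_rss(match: str = 'train', interval: float = 1.0) -> None:
    while True:
        total = 0
        for p in psutil.process_iter(['cmdline', 'memory_info']):
            cmdline = p.info['cmdline'] or []
            mem = p.info['memory_info']
            if mem and any(match in part for part in cmdline):
                total += mem.rss
        print(f'total RSS of matching processes: {total / 2**30:.1f} GiB')
        time.sleep(interval)
```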
Bug fix
See above. This occurs because the pycocotools API loads the full annotation set once for every GPU.
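One possible direction for a fix, sketched here only as an illustration (the helper name and the data actually broadcast are assumptions, and this is not mmdetection's API), is to parse the annotations on rank 0 only and broadcast the result so pycocotools runs once instead of once per GPU process:

```python
# A sketch of one possible mitigation, not mmdetection's actual API: parse
# annotations on rank 0 only and broadcast the parsed records, so that
# pycocotools runs once instead of once per GPU process.
import torch.distributed as dist

def load_annotations_once(ann_file: str):
    obj = [None]
    if not dist.is_initialized() or dist.get_rank() == 0:
        from pycocotools.coco import COCO
        coco = COCO(ann_file)  # heavy: parses the full JSON in this process
        # Keep only the lightweight per-image records for broadcasting.
        obj[0] = [coco.loadImgs(img_id)[0] for img_id in coco.getImgIds()]
    if dist.is_initialized():
        # Pickles obj[0] on rank 0 and sends it to every other rank.
        dist.broadcast_object_list(obj, src=0)
    return obj[0]
```

Broadcasting still leaves one deserialized copy per rank, so a more complete fix would likely need to place the serialized data in shared memory, which appears to be what BaseDataset's serialize_data is intended to accomplish.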
Has anyone else looked into this or run into this issue? There is no reason this shouldn't be fixable, but it seems like it might be a fairly significant undertaking.