uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Apache License 2.0

Memory usage grows with each epoch. How can I solve this problem? #652

Open Byronnar opened 3 years ago

Byronnar commented 3 years ago

When I use the Petastorm PyTorch DataLoader, memory usage grows with each epoch. My dataset is only 62 MB, but memory usage reaches 12 GB (epoch 1: 6.8 GB, epoch 2: 8.1 GB, epoch 3: 10.1 GB, ...).

This is my code:

    from petastorm import make_reader
    from petastorm.pytorch import DataLoader

    for epoch in range(1, num_epochs + 1):
        # Train for one epoch, then evaluate; a fresh reader is created for each pass.
        with DataLoader(make_reader(data_path, num_epochs=1, workers_count=1, results_queue_size=1,
                                    transform_spec=transform, schema_fields=['image', 'label']),
                        batch_size=batch_size, shuffling_queue_capacity=10) as train_loader:  # shuffling_queue_capacity=6
            train_model(model, criterion, optimizer, scheduler, train_loader)

        with DataLoader(make_reader(data_path, num_epochs=1, workers_count=1, results_queue_size=1,
                                    transform_spec=transform, schema_fields=['image', 'label']),
                        batch_size=batch_size) as test_loader:
            test_model(model, criterion, test_loader, model_write_path)

ENV:

Ubuntu: 18.04
petastorm: 0.9.4

The batch size is 1, and the memory figures refer to CPU RAM. How can I solve this problem? Looking forward to your reply!
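
One way to quantify the per-epoch growth (not part of the original report; psutil and the log_rss helper below are illustrative additions) is to log the process resident set size after each epoch:

    import os

    import psutil  # assumed available; used only to measure process memory

    def log_rss(tag):
        # Print the current resident set size (RSS) of this process in MB.
        rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
        print(f'{tag}: RSS = {rss_mb:.1f} MB')

Calling log_rss(f'after epoch {epoch}') at the end of each loop iteration makes it easy to see whether RSS keeps climbing or plateaus.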

@selitvin

selitvin commented 3 years ago
  1. In order to confirm that the memory growth does indeed originate from within the petastorm reader, can you confirm that if the train_model/test_model calls are replaced with just a loop that drains the samples, the memory footprint still increases over epochs? (See the sketch after this list.)
  2. If we take DataLoader out of the setup as well and iterate with for sample in reader: directly, do you still observe the memory leak? Example:

     with make_reader(data_path, num_epochs=1, workers_count=1, results_queue_size=1,
                      transform_spec=transform, schema_fields=['image', 'label']) as reader:
         for s in reader:
             pass

  3. Does removing transform_spec= from the setup shown in (2) help?
  4. Does removing the shuffling_queue_capacity= argument help?
  5. Are you reading from HDFS (are you using the libhdfs or libhdfs3 driver)?
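
A minimal sketch of the drain-only loop from point (1), reusing the reader arguments from the original snippet; only the training and evaluation calls are replaced by a loop that discards the batches:

    # Drain-only variant of the training loop: same DataLoader setup,
    # but no model work, so any remaining growth must come from the data path itself.
    for epoch in range(1, num_epochs + 1):
        with DataLoader(make_reader(data_path, num_epochs=1, workers_count=1, results_queue_size=1,
                                    transform_spec=transform, schema_fields=['image', 'label']),
                        batch_size=batch_size) as train_loader:
            for batch in train_loader:
                pass  # drain samples only; no forward/backward pass
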
Byronnar commented 3 years ago

Thanks for your reply!!!

I have tested this code:

    with make_reader(data_path, num_epochs=1, workers_count=1, results_queue_size=1,
                     transform_spec=transform, schema_fields=['image', 'label']) as reader:
        for s in reader:
            pass

But I encountered the same problem: the memory keeps growing. I removed the shuffling_queue_capacity parameter, but that did not help. I am reading from HDFS (libhdfs3).

But when I removed transform_spec, it worked. This is my transform code:

    from PIL import Image
    from torchvision import transforms

    def _transform_row(data_row):
        # The torchvision pipeline is rebuilt on every row.
        transform = transforms.Compose([
            transforms.Resize((300, 300)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])

        # data_row['image'] holds the raw image array read from HDFS.
        PIL_image = Image.fromarray(data_row['image'])

        result_row = {
            'image': transform(PIL_image),
            'label': data_row['label']
        }

        return result_row

    # data_row is read from HDFS. The columns are ['name', 'image', 'label'];
    # the 'image' column stores the image array.

Could you please tell me how I should modify _transform_row? Thank you again.
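
For context, the transform object passed as transform_spec= in the snippets above is presumably wired up roughly like this (a sketch; the actual construction is not shown in the thread):

    from petastorm import TransformSpec

    # Assumed wiring of _transform_row into make_reader(..., transform_spec=transform).
    transform = TransformSpec(_transform_row)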

selitvin commented 3 years ago

Would try the following:

  1. As an experiment, insert an import gc; gc.collect() statement into your for loop to make sure we are not just observing the "breathing" of the garbage collector. (See the sketch after this list.)
  2. Use a local copy of the dataset to make sure the problem is not in the libhdfs3 driver (that driver is no longer supported by anyone; not sure whether there are issues in that code).
  3. Try disabling pieces of _transform_row and see if you still observe the memory growth. Would you still see the growth with a no-op implementation of _transform_row?
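
Minimal sketches of experiments (1) and (3), assuming the reader arguments from earlier in the thread; the _noop_transform_row helper is illustrative, not part of the original code:

    import gc

    # Experiment (1): force a garbage collection on every sample to rule out
    # normal "breathing" of the Python garbage collector.
    with make_reader(data_path, num_epochs=1, workers_count=1, results_queue_size=1,
                     transform_spec=transform, schema_fields=['image', 'label']) as reader:
        for s in reader:
            gc.collect()

    # Experiment (3): a no-op row transform that returns the selected fields unchanged.
    # If memory still grows with this version, the torchvision transforms are not the cause.
    def _noop_transform_row(data_row):
        return {'image': data_row['image'], 'label': data_row['label']}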