learnables / learn2learn

A PyTorch Library for Meta-learning Research
http://learn2learn.net
MIT License

Fix memory leak #307

Status: Closed (kzhang2 closed 2 years ago)

kzhang2 commented 2 years ago

Description

Fixes #284

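Fixes the memory leak in maml.py and meta-sgd.py, and adds a test (test_memory_consumption) to both maml_test.py and metasgd_test.py to catch future memory leaks. One existing test, test_module_update_shared_params in utils_test.py, fails in the run below with "clone and original do not have same number of parameters", but as far as I can tell my changes have nothing to do with it.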
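For readers who want a feel for what a leak regression test can look like, here is a minimal sketch using only the Python standard library. It is not the code from this PR: `Buffer`, `LeakyLearner`, `FixedLearner`, and `count_live_updates` are hypothetical names made up for illustration, and the "leak" shown (keeping every per-step update alive on the learner) is only an assumed stand-in for whatever the real cause was in learn2learn.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a large intermediate object (e.g. a gradient tensor)."""

    def __init__(self, size):
        self.data = bytearray(size)


class LeakyLearner:
    """Hypothetical learner that keeps every update it ever computed."""

    def __init__(self):
        self.updates = []

    def adapt(self):
        update = Buffer(1024)
        self.updates.append(update)  # old updates are never released
        return weakref.ref(update)


class FixedLearner:
    """Hypothetical learner that only keeps the most recent update."""

    def __init__(self):
        self.current = None

    def adapt(self):
        update = Buffer(1024)
        self.current = update  # the previous update loses its last reference
        return weakref.ref(update)


def count_live_updates(learner, steps=10):
    """Run `steps` adaptation steps and count the updates still alive afterwards."""
    refs = [learner.adapt() for _ in range(steps)]
    gc.collect()  # make sure anything unreachable has been reclaimed
    return sum(ref() is not None for ref in refs)


if __name__ == "__main__":
    print("leaky:", count_live_updates(LeakyLearner()))
    print("fixed:", count_live_updates(FixedLearner()))
```

A test built this way asserts that the number of live intermediates stays constant as the number of steps grows, instead of comparing raw memory figures, which tend to be noisy across machines.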

Optional

If you make major changes to the core library, please run make alltests and copy-paste the content of alltests.txt below.

make[1]: Entering directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'
OMP_NUM_THREADS=1 \
MKL_NUM_THREADS=1 \
python -W ignore -m unittest discover -s 'tests' -p '*_test.py' -v
9464832it [00:01, 4735385.84it/s]                             otIntegrationTests) ... 
6463488it [00:01, 4715230.93it/s]                             
ok
test_adaptation (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_allow_nograd (unit.algorithms.gbml_test.TestGBMLgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/optim/parameter_update.py", line 119, in forward
    gradients = torch.autograd.grad(
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_clone_module (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_graph_connection (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_allow_nograd (unit.algorithms.maml_test.TestMAMLAlgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/algorithms/maml.py", line 159, in adapt
    gradients = grad(loss,
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_clone_module (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_first_order_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_graph_connection (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_memory_consumption (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_module_shared_params (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_adaptation (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_clone_module (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_graph_connection (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_memory_consumption (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_meta_lr (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
9464832it [00:02, 3925553.32it/s]                             
6463488it [00:01, 4160425.13it/s]                             
test_data_labels_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_labels_values (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_fails_with_non_torch_dataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_filtered_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_get_item (unit.data.metadataset_test.TestMetaDataset) ... ok
test_labels_to_indices (unit.data.metadataset_test.TestMetaDataset) ... ok
test_union_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_dataloader (unit.data.task_dataset_test.TestTaskDataset) ... Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to ./data/omniglot-py/images_background.zip
Extracting ./data/omniglot-py/images_background.zip to ./data/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to ./data/omniglot-py/images_evaluation.zip
Extracting ./data/omniglot-py/images_evaluation.zip to ./data/omniglot-py
0 Meta Train Accuracy 0.42500000912696123
1 Meta Train Accuracy 0.5062500112690032
2 Meta Train Accuracy 0.537500012665987
3 Meta Train Accuracy 0.43125001015141606
4 Meta Train Accuracy 0.5187500142492354
learn2learn: Maybe try with allow_nograd=True and/orallow_unused=True ?
learn2learn: Maybe try with allow_nograd=True and/or allow_unused=True ?
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to /tmp/datasets/omniglot-py/images_background.zip
Extracting /tmp/datasets/omniglot-py/images_background.zip to /tmp/datasets/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to /tmp/datasets/omniglot-py/images_evaluation.zip
Extracting /tmp/datasets/omniglot-py/images_evaluation.zip to /tmp/datasets/omniglot-py
Downloading FC100. (160Mb)
Downloading CIFARFS to  /home/kevin/data
Creating CIFARFS splits
ok
test_infinite_tasks (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_instanciation (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_caching (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_transforms (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_filter_labels (unit.data.transforms_test.TestTransforms) ... ok
test_k_shots (unit.data.transforms_test.TestTransforms) ... ok
test_load_data (unit.data.transforms_test.TestTransforms) ... ok
test_n_ways (unit.data.transforms_test.TestTransforms) ... ok
test_remap_labels (unit.data.transforms_test.TestTransforms) ... ok
test_infinite_iterator (unit.data.utils_test.DataUtilsTests) ... ok
test_partition_task (unit.data.utils_test.DataUtilsTests) ... ok
test_illegal_dimensions (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_illegal_dimensions_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_cosine_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_euclidean_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_simple (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_clone_module_basics (unit.utils_test.UtilTests) ... ok
test_clone_module_models (unit.utils_test.UtilTests) ... ok
test_clone_module_nomodule (unit.utils_test.UtilTests) ... ok
test_distribution_clone (unit.utils_test.UtilTests) ... ok
test_distribution_detach (unit.utils_test.UtilTests) ... ok
test_module_clone_shared_params (unit.utils_test.UtilTests) ... ok
test_module_detach (unit.utils_test.UtilTests) ... ok
test_module_detach_keep_requires_grad (unit.utils_test.UtilTests) ... ok
test_module_update_shared_params (unit.utils_test.UtilTests) ... FAIL
test_rnn_clone (unit.utils_test.UtilTests) ... ok

======================================================================
FAIL: test_module_update_shared_params (unit.utils_test.UtilTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/tests/unit/utils_test.py", line 268, in test_module_update_shared_params
    self.assertTrue(
AssertionError: False is not true : clone and original do not have same number of parameters.

----------------------------------------------------------------------
Ran 62 tests in 128.143s

FAILED (failures=1)
make[1]: *** [Makefile:31: tests] Error 1
make[1]: Leaving directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'

seba-1511 commented 2 years ago

Thanks a lot @kzhang2 -- this looks great (incl. Meta-SGD!). I'll merge and cut a new release as soon as it passes the tests.

seba-1511 commented 2 years ago

OK, it took a bit of elbow grease but it seems to work now (I also took the opportunity to rewrite some flaky tests). Thanks for contributing this.