brando90 closed this issue 3 years ago.
Yes, I mean DataParallel, but given your answer I don't think it'll help. learn2learn provides datasets and algorithms, each independent of the other. For an example of MAML + torchmeta, check this. Our implementation of MAML is orthogonal to higher (same result, different paths). If you're using vanilla MAML, this high-level interface is well-tested and fairly flexible (we use it for both SL and RL). Yes, PyTorch Lightning.

I saw you've been looking for a distributed MAML implementation for a while. Given how tricky MAML can be, I would try to average the meta-gradients with torch.distributed directly instead of using DDP. The distributed optimizer in cherry might help with that: https://github.com/learnables/cherry/blob/master/cherry/optim.py
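If it helps, here is a minimal sketch of what "average the meta-gradients with torch.distributed directly" could look like, not the library's official recipe. It assumes a learn2learn-style MAML wrapper with `clone()`/`adapt()`, that `dist.init_process_group()` was already called (e.g. via torchrun), and a hypothetical `compute_loss(learner, data)` helper standing in for your own loss code:

```python
# Minimal sketch: average meta-gradients across ranks with torch.distributed
# instead of wrapping the model in DDP.
# Assumptions: dist.init_process_group() already called; `maml` is a
# learn2learn-style wrapper with clone()/adapt(); compute_loss() is a
# hypothetical placeholder for your own loss computation.
import torch
import torch.distributed as dist


def meta_step(maml, opt, task_batch, adapt_steps=1):
    opt.zero_grad()
    meta_loss = 0.0
    for support, query in task_batch:        # this rank's share of the meta-batch
        learner = maml.clone()               # differentiable copy of the meta-learner
        for _ in range(adapt_steps):
            learner.adapt(compute_loss(learner, support))   # inner-loop update
        meta_loss = meta_loss + compute_loss(learner, query)
    meta_loss = meta_loss / len(task_batch)
    meta_loss.backward()                     # meta-gradients accumulate in maml.parameters()

    # Average the meta-gradients across all ranks before the meta-update.
    world_size = dist.get_world_size()
    for p in maml.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad.div_(world_size)
    opt.step()
```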
Duplicate of #197.
Related: https://github.com/learnables/learn2learn/issues/263#issuecomment-963372085 (implementation in cherry)
seba said:
Hi, I've just added an example of using cherry's Distributed optimizer to train MAML on multiple GPUs with torch.distributed. Please check: https://github.com/learnables/learn2learn/blob/master/examples/vision/distributed_maml.py
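For anyone skimming, my understanding of the pattern in that example (going from memory of cherry's optim.py, so double-check the exact names like `Distributed` and `sync_parameters` against the linked files): wrap your meta-optimizer so gradients are averaged across ranks on `step()`. `MyModel`, `compute_meta_loss`, and `loader` below are hypothetical placeholders, not part of the linked code.

```python
# Rough usage sketch of cherry's Distributed optimizer (cherry/optim.py);
# the linked distributed_maml.py is the authoritative version.
# MyModel, compute_meta_loss, and loader are hypothetical placeholders.
import torch
import torch.distributed as dist
from cherry.optim import Distributed

dist.init_process_group('gloo')                      # or 'nccl' on GPUs

model = MyModel()                                    # hypothetical model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt = Distributed(model.parameters(), opt, sync=1)   # average grads across ranks every step
opt.sync_parameters()                                # broadcast initial weights from rank 0

for task_batch in loader:                            # each rank samples its own tasks
    opt.zero_grad()
    loss = compute_meta_loss(model, task_batch)      # outer (meta) loss over this rank's tasks
    loss.backward()
    opt.step()                                       # gradients are averaged across ranks here
```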
USE THAT ISSUE!
Does anyone have a DDP example with MAML (one that distributes over the meta-batches)?
Related (cherry): https://github.com/learnables/learn2learn/issues/197
Related (torchmeta): https://github.com/tristandeleu/pytorch-meta/issues/116