pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

torch.hub.set_dir() is ignored for downloaded models #31944

Closed jheinecke closed 4 years ago

jheinecke commented 4 years ago

🐛 Bug

I use torch.hub.set_dir(localdir) to avoid having the model files hidden in .cache/torch/pytorch_fairseq/3f864e15bb396f062dd37494309dbc4238416edd1f8e..., but only pytorch_fairseq_master goes there; the model files themselves (model.pt, dict.txt) still end up in .cache/torch/pytorch_fairseq/3f864e1...

To Reproduce

Steps to reproduce the behavior:

import torch
torch.hub.set_dir("/opt/models/xlmr.large")
model = torch.hub.load('pytorch/fairseq', 'xlmr.large')
$> ls /opt/models/xlmr.large
pytorch_fairseq_master

Expected behaviour

According to the documentation of torch.hub.set_dir() I expected the following in /opt/models/xlmr.large:

But the latter three reside in .cache/torch/pytorch_fairseq/3f864e15bb396f...

Environment

cc @ezyang @gchanan @zou3519 @ailzhang

jheinecke commented 4 years ago

I've just noticed that export TORCH_HOME=/opt/models works correctly. After the call to torch.hub.load('pytorch/fairseq', 'xlmr.large') (and no use of set_dir()), /opt/models contains the downloaded files as expected.
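
The same workaround can presumably be applied from inside Python instead of the shell, as long as the variable is set before the cache location is resolved. A minimal sketch, assuming that setting TORCH_HOME in os.environ early enough has the same effect as the shell export; use_torch_home and load_xlmr are hypothetical helper names, not part of torch:

```python
import os
from pathlib import Path


def use_torch_home(directory: str) -> str:
    """Point TORCH_HOME at `directory` (hypothetical helper, not a torch API).

    Call this before any code resolves the torch cache location.
    """
    path = Path(directory).expanduser()
    os.environ["TORCH_HOME"] = str(path)
    return os.environ["TORCH_HOME"]


def load_xlmr(directory: str):
    """Load the model from this report with TORCH_HOME set first.

    Assumption: the environment variable is honoured the same way as
    `export TORCH_HOME=...` in the shell.
    """
    use_torch_home(directory)
    import torch  # imported lazily so the variable is already set

    return torch.hub.load("pytorch/fairseq", "xlmr.large")
```

Calling load_xlmr("/opt/models") would then be the in-process counterpart of the export above; set_dir() is deliberately not used here.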

zou3519 commented 4 years ago

I think this is hi-pri because it looks like a correctness issue (torch.hub.set_dir doesn't work) in a common use case for torch.hub.

jheinecke commented 4 years ago

Thanks for taking care of this. The same problem occurs with:

import torch
torch.hub.set_dir("/opt/models/bert-mlg-cased")
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-multilingual-uncased')
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-multilingual-uncased') 

I had a look at pytorch-transformers and pytorch_fairseq_master, which are downloaded and compiled the first time you run torch.hub.load(). Both import from torch.hub:

from torch.hub import _get_torch_home

but they ignore the global hub_dir set by set_dir() (in torch/hub.py), whereas the environment variable TORCH_HOME is read in _get_torch_home().
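
To make the mismatch concrete, here is a simplified, self-contained model of the behaviour described above. It is not the actual torch/hub.py source: the function names, the default cache path, and the directory layout are stand-ins chosen to mirror the report. One code path honours the value written by set_dir(), while the other only ever consults TORCH_HOME.

```python
import os

# Stand-in for the module-level value that set_dir() writes (assumed layout).
_hub_dir = None


def set_dir(directory):
    """Record the hub directory in a module-level global."""
    global _hub_dir
    _hub_dir = directory


def get_torch_home():
    """Resolve the cache root from TORCH_HOME only; _hub_dir is never read."""
    default = os.path.join("~", ".cache", "torch")
    return os.path.expanduser(os.getenv("TORCH_HOME", default))


def repo_checkout_dir():
    """Path used for the repository checkout: honours set_dir()."""
    if _hub_dir is not None:
        return _hub_dir
    return os.path.join(get_torch_home(), "hub")


def model_file_dir():
    """Path used by the downloaded package for model files: env var only."""
    return os.path.join(get_torch_home(), "pytorch_fairseq")
```

In this model, set_dir("/opt/models/xlmr.large") changes repo_checkout_dir() but leaves model_file_dir() under the default cache, while setting TORCH_HOME moves model_file_dir() as well, which matches what I observed.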