torch.hub.set_dir() is ignored for downloaded models

jheinecke commented 4 years ago

🐛 Bug

I use torch.hub.set_dir(localdir) to avoid having the model files hidden in .cache/torch/pytorch_fairseq/3f864e15bb396f062dd37494309dbc4238416edd1f8e..., but only pytorch_fairseq_master goes there; the model files themselves (model.pt, dict.txt) still end up in .cache/torch/pytorch_fairseq/3f864e1...

To Reproduce

Steps to reproduce the behavior:

import torch
torch.hub.set_dir("/opt/models/xlmr.large")
model = torch.hub.load('pytorch/fairseq', 'xlmr.large')

$> ls /opt/models/xlmr.large
pytorch_fairseq_master

Expected behaviour

According to the documentation of torch.hub.set_dir() I expected the following in /opt/models/xlmr.large:

pytorch_fairseq_master/
dict.txt
model.pt
sentencepiece.bpe.model

But the latter three reside in .cache/torch/pytorch_fairseq/3f864e15bb396f...
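One way to confirm where the files actually ended up is to walk both candidate directories and report which of them contain the expected file names. The following is a minimal sketch using only the standard library; the helper name find_files is hypothetical and not part of torch.hub.

```python
import os


def find_files(root, names):
    """Return a dict mapping each file name to the directories under root that contain it.

    Hypothetical helper for illustration only; it does not depend on torch.
    """
    found = {name: [] for name in names}
    # os.walk yields nothing if root does not exist, so a missing directory is harmless.
    for dirpath, _dirnames, filenames in os.walk(os.path.expanduser(root)):
        for name in names:
            if name in filenames:
                found[name].append(dirpath)
    return found


if __name__ == "__main__":
    wanted = ["dict.txt", "model.pt", "sentencepiece.bpe.model"]
    # The two locations discussed in this report.
    for root in ("/opt/models/xlmr.large", "~/.cache/torch"):
        print(root, find_files(root, wanted))
```

Running it after torch.hub.load() should show the three files only under the second root if the behaviour described here is reproduced.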

Environment

PyTorch Version (e.g., 1.0): 1.3.1
OS (e.g., Linux): Linux (Ubuntu 18.04.3 LTS)
GCC: 7.4.0
CMake: version 3.10.2
How you installed PyTorch (conda, pip, source): conda
Python version: 3.7
GPU
- GPU 0: GeForce GTX 1080 Ti
- GPU 1: GeForce GTX 1080 Ti

cc @ezyang @gchanan @zou3519 @ailzhang

jheinecke commented 4 years ago

I've just noticed that export TORCH_HOME=/opt/models works correctly. After the call to torch.hub.load('pytorch/fairseq', 'xlmr.large') (without any use of set_dir()), /opt/models contains:

hub/pytorch_fairseq_master/...
pytorch_fairseq/3f864e15bb396f... (still that hashed name I'd like to get rid of, but at least in a directory I have chosen :-)
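The same workaround can probably be applied from inside Python by setting the environment variable before torch.hub.load() is called. This is a sketch under that assumption; the helper name use_torch_home is hypothetical, and the "hub" subdirectory is taken from the listing above rather than from the documentation.

```python
import os


def use_torch_home(path):
    """Point TORCH_HOME at path for the current process.

    Hypothetical helper. Returns the directory where, per the listing above,
    the repository checkout is expected to appear (<path>/hub).
    """
    os.environ["TORCH_HOME"] = path
    return os.path.join(path, "hub")


# Set the variable first; doing this before importing torch is the cautious choice.
hub_checkout_dir = use_torch_home("/opt/models")

# Then load as usual (left commented out so the sketch runs without torch installed):
# import torch
# model = torch.hub.load('pytorch/fairseq', 'xlmr.large')
```

Unlike export in the shell, this only affects the running process and any child processes it starts.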

zou3519 commented 4 years ago

I think this is hi-pri because it looks like a correctness issue (torch.hub.set_dir doesn't work) that affects a common use case of torch.hub.

jheinecke commented 4 years ago

Thanks for taking care of this. The same problem occurs with:

import torch
torch.hub.set_dir("/opt/models/bert-mlg-cased")
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-multilingual-uncased')
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-multilingual-uncased')

I had a look at pytorch-transformers and pytorch_fairseq_master, which are downloaded and compiled the first time you run torch.hub.load(). Both import from torch.hub

from torch.hub import _get_torch_home

but they ignore the global hub_dir set by set_dir() (in torch/hub.py), whereas the environment variable TORCH_HOME is read in _get_torch_home().
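The mismatch can be illustrated with a simplified stand-in that does not import torch. This is not the actual code from torch/hub.py or from the downloaded repositories: every function below is a hypothetical model of the behaviour described in this thread, and the directory names "hub" and "pytorch_fairseq" are taken from the listings above.

```python
import os

# Stand-in for the module-level hub_dir global in torch/hub.py.
_hub_dir = None


def set_dir(d):
    """Model of torch.hub.set_dir(): only records the directory in a global."""
    global _hub_dir
    _hub_dir = d


def get_torch_home(environ=None):
    """Model of _get_torch_home(): consults TORCH_HOME, never the hub_dir global."""
    environ = os.environ if environ is None else environ
    return os.path.expanduser(environ.get("TORCH_HOME", "~/.cache/torch"))


def repo_checkout_dir(environ=None):
    """Where the hub loader puts the repository checkout: honours set_dir()."""
    if _hub_dir is not None:
        return _hub_dir
    return os.path.join(get_torch_home(environ), "hub")


def third_party_model_dir(environ=None):
    """Where the downloaded repo puts model files if it relies only on the torch home."""
    return os.path.join(get_torch_home(environ), "pytorch_fairseq")


set_dir("/opt/models/xlmr.large")
print(repo_checkout_dir({}))      # follows set_dir()
print(third_party_model_dir({}))  # still under the default cache directory
print(third_party_model_dir({"TORCH_HOME": "/opt/models"}))  # follows TORCH_HOME
```

In this model, set_dir() changes only the first path, which matches the observation that pytorch_fairseq_master moves while model.pt and dict.txt do not.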
