salesforce / decaNLP

The Natural Language Decathlon: A Multitask Challenge for NLP
BSD 3-Clause "New" or "Revised" License

connection error for /research.metamind.io/cove/wmtlstm-8f474287.pth #46

Closed. smsalaken closed this issue 5 years ago.

smsalaken commented 5 years ago

Docker fails to download /research.metamind.io/cove/wmtlstm-8f474287.pth and keeps returning a "Temporary failure in name resolution" error. When I paste research.metamind.io into a browser, it fails to resolve to an IP address. When I paste metamind.io, it redirects to https://einstein.ai/. How do I solve this?

Other information:

I am inside the /home/documents/some_other_folders/decaNLP directory, which was created by git clone.

Command yielding error:

 sudo docker run -it --rm -v `pwd`:/decaNLP/ -u $(id -u):$(id -g) bmccann/decanlp:torch041 bash -c "python decaNLP/predict.py --evaluate validation --path decaNLP/mqan_decanlp_better_sampling_cove_cpu/ --checkpoint_name iteration_560000.pth --device -1 --silent"

Error:

Arguments:
{'best_checkpoint': 'decaNLP/mqan_decanlp_better_sampling_cove_cpu/iteration_560000.pth',
 'bleu': False,
 'checkpoint_name': 'iteration_560000.pth',
 'cove': True,
 'data': '/decaNLP/.data/',
 'devices': [-1],
 'dimension': 200,
 'dropout_ratio': 0.0,
 'elmo': [-1],
 'embeddings': '/decaNLP/.embeddings',
 'evaluate': 'validation',
 'glove_and_char': True,
 'intermediate_cove': False,
 'load': None,
 'lower': True,
 'max_generative_vocab': 50000,
 'max_output_length': 100,
 'max_val_context_length': 400,
 'model': 'MultitaskQuestionAnsweringNetwork',
 'overwrite': False,
 'path': 'decaNLP/mqan_decanlp_better_sampling_cove_cpu/',
 'rnn_layers': 1,
 'rouge': False,
 'seed': 123,
 'silent': True,
 'task_to_metric': {'cnn_dailymail': 'avg_rouge',
                    'iwslt.en.de': 'bleu',
                    'multinli.in.out': 'em',
                    'schema': 'em',
                    'squad': 'nf1',
                    'srl': 'nf1',
                    'sst': 'em',
                    'wikisql': 'lfem',
                    'woz.en': 'joint_goal_em',
                    'zre': 'corpus_f1'},
 'tasks': ['squad',
           'iwslt.en.de',
           'cnn_dailymail',
           'multinli.in.out',
           'sst',
           'srl',
           'zre',
           'woz.en',
           'wikisql',
           'schema'],
 'transformer_heads': 3,
 'transformer_hidden': 150,
 'transformer_layers': 2,
 'val_batch_size': [256, 256, 256, 256, 256, 256, 256, 256, 256, 256]}
Loading from decaNLP/mqan_decanlp_better_sampling_cove_cpu/iteration_560000.pth
Initializing Model
Downloading: "https://s3.amazonaws.com/research.metamind.io/cove/wmtlstm-8f474287.pth" to /decaNLP/.embeddings/wmtlstm-8f474287.pth
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 56, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/opt/conda/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in _validate_conn
    conn.connect()
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
    conn = self._new_conn()
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f0396c401d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
    timeout=timeout
  File "/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/opt/conda/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /research.metamind.io/cove/wmtlstm-8f474287.pth (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0396c401d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "decaNLP/predict.py", line 297, in <module>
    model = Model(field, args)
  File "/decaNLP/models/multitask_question_answering_network.py", line 33, in __init__
    self.cove = MTLSTM(model_cache=args.embeddings, layer0=args.intermediate_cove, layer1=args.cove)
  File "/src/cove/cove/encoder.py", line 45, in __init__
    state_dict = model_zoo.load_url(model_urls['wmt-lstm'], model_dir=model_cache)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 65, in load_url
    _download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 71, in _download_url_to_file
    u = urlopen(url, stream=True)
  File "/opt/conda/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /research.metamind.io/cove/wmtlstm-8f474287.pth (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0396c401d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
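The traceback above is a single failure re-raised three times. Python prints chained exceptions innermost first, so the socket.gaierror from getaddrinfo (the container could not resolve s3.amazonaws.com) is the root cause. urllib3's NewConnectionError and MaxRetryError and requests' ConnectionError only wrap it. Here is a minimal sketch of pulling that root cause out programmatically. exception_chain and root_cause are hypothetical helper names, not part of decaNLP, requests, or PyTorch.

```python
def exception_chain(exc):
    """Yield exc, then each exception it is chained to, outermost first.

    Follows the same links the traceback printer uses: __cause__ for
    "raise ... from ...", otherwise __context__ ("During handling of the
    above exception, another exception occurred") unless it is suppressed.
    """
    seen = set()  # guard against cycles in the chain
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        if exc.__cause__ is not None:
            exc = exc.__cause__
        elif not exc.__suppress_context__:
            exc = exc.__context__
        else:
            exc = None


def root_cause(exc):
    """Return the innermost exception, i.e. the one a traceback prints first."""
    last = exc
    for last in exception_chain(exc):
        pass
    return last
```

As a usage example, you could wrap the Model(field, args) call in predict.py with try/except Exception as exc and print repr(root_cause(exc)). For the run logged above, this would report the socket.gaierror rather than the outer requests error.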

System details:

>> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:    16.04
Codename:   xenial

>> uname -r
4.15.0-43-generic

bmccann commented 5 years ago

Maybe check the connection on your machine. I just ran the training command with the --cove flag, and the weights seem to download fine. I was also able to download them with wget from https://s3.amazonaws.com/research.metamind.io/cove/wmtlstm-8f474287.pth. I tested this on both a local and a remote machine, and neither had any issues.
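The file downloads fine from a machine with working network access, so one workaround is to fetch it outside the container and place it where the container expects it. The log shows PyTorch caching the weights at /decaNLP/.embeddings/wmtlstm-8f474287.pth, and /decaNLP/ is the host directory mounted with -v `pwd`:/decaNLP/. In the torch 0.4.x releases, model_zoo.load_url only downloads when that cached file is missing. This sketch assumes that behavior. It also assumes the hex suffix in the filename is a SHA-256 prefix, following model_zoo's naming convention. fetch_to_cache and the other helpers are hypothetical names.

```python
import hashlib
import os
import re
import shutil
import tempfile
import urllib.parse
import urllib.request

# Assumption: the same pattern torch.utils.model_zoo uses to read the expected
# SHA-256 prefix out of a filename such as "wmtlstm-8f474287.pth".
HASH_REGEX = re.compile(r"-([a-f0-9]*)\.")


def expected_hash_prefix(filename):
    """Return the hex prefix embedded in the filename, or None if absent."""
    match = HASH_REGEX.search(filename)
    return match.group(1) if match else None


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_to_cache(url, cache_dir, timeout=60):
    """Download url into cache_dir unless a copy with a matching hash exists.

    Returns the path of the cached file.
    """
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    dest = os.path.join(cache_dir, filename)
    prefix = expected_hash_prefix(filename)

    def verified(path):
        return not prefix or sha256_of(path).startswith(prefix)

    if os.path.exists(dest) and verified(dest):
        return dest  # already cached; no network access needed

    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file first so an interrupted transfer never
    # leaves a truncated file under the final name.
    fd, tmp = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url, timeout=timeout) as resp:
            shutil.copyfileobj(resp, out)
        if not verified(tmp):
            raise ValueError("SHA-256 of downloaded file does not start with " + prefix)
        os.replace(tmp, dest)  # atomic rename within the same directory
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)
    return dest
```

For example, run fetch_to_cache("https://s3.amazonaws.com/research.metamind.io/cove/wmtlstm-8f474287.pth", ".embeddings") from the repository root on the host. It would create ./.embeddings/wmtlstm-8f474287.pth, which the container sees at /decaNLP/.embeddings/wmtlstm-8f474287.pth. The download goes to a temporary .part file that is renamed only after the hash check, so an interrupted download leaves nothing under the final name.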

smsalaken commented 5 years ago

You're probably right: Docker probably cannot connect to the internet. Both docker run busybox ping -c 1 192.203.230.10 and docker run busybox nslookup google.com fail, and I cannot ping google.com from my terminal either. I assume all ICMP traffic and DNS lookups are blocked on my current network. I will report back after trying a different network.
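ping tests ICMP and nslookup tests DNS, but the download itself needs two things: a DNS lookup and an outbound TCP connection to port 443. It helps to test that path directly and to tell the two failure modes apart. Here is a minimal sketch using only the standard library. tcp_reachable is a hypothetical helper, and the 5-second default timeout is an arbitrary choice.

```python
import socket


def tcp_reachable(host, port=443, timeout=5.0):
    """Try to open a TCP connection to host:port and report why it failed.

    Returns (ok, reason). socket.gaierror is a subclass of OSError, so it is
    caught first to separate name-resolution failures from connection ones.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        return False, "DNS failure: {}".format(exc)
    except OSError as exc:  # refused, timed out, network unreachable, ...
        return False, "TCP failure: {}".format(exc)
```

Inside the container, a "DNS failure" reason from tcp_reachable("s3.amazonaws.com") would match the traceback in this issue. A "TCP failure" would instead point at blocked outbound connections rather than DNS.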

smsalaken commented 5 years ago

This was indeed a network issue: some type of traffic was blocked. However, running the command over my cellphone network and then losing the connection partway through caused a different problem, which may need a separate issue. I will try other approaches first. Thanks for your help.