singularityhub / singularityhub.github.io

Container tools for scientific computing! Docs at https://singularityhub.github.io/singularityhub-docs

Timeout when downloading singularity image to some servers #159

Closed rmcolq closed 5 years ago

rmcolq commented 5 years ago

Link to Container Collection Log, Build, or Collection (in that order)

https://www.singularity-hub.org/collections/1285 https://www.singularity-hub.org/collections/1297

Behavior when Building Locally

Traceback (most recent call last):
  File "/usr/lib64/python3.4/urllib/request.py", line 1183, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib64/python3.4/http/client.py", line 1137, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python3.4/http/client.py", line 1182, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python3.4/http/client.py", line 1133, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python3.4/http/client.py", line 963, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.4/http/client.py", line 898, in send
    self.connect()
  File "/usr/lib64/python3.4/http/client.py", line 1279, in connect
    super().connect()
  File "/usr/lib64/python3.4/http/client.py", line 871, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python3.4/socket.py", line 516, in create_connection
    raise err
  File "/usr/lib64/python3.4/socket.py", line 507, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/libexec/singularity/python/pull.py", line 82, in <module>
    main()
  File "/usr/libexec/singularity/python/pull.py", line 74, in main
    layerfile=LAYERFILE)
  File "/usr/libexec/singularity/python/shub/main.py", line 77, in PULL
    manifest = client.get_manifest()
  File "/usr/libexec/singularity/python/shub/api.py", line 121, in get_manifest
    response = self.get(base, return_response=True)
  File "/usr/libexec/singularity/python/base.py", line 291, in get
    updating_token=updating_token)
  File "/usr/libexec/singularity/python/base.py", line 310, in submit_request
    response = safe_urlopen(request)
  File "/usr/libexec/singularity/python/base.py", line 152, in safe_urlopen
    return opener.open(url, data=data)
  File "/usr/lib64/python3.4/urllib/request.py", line 464, in open
    response = self._open(req, data)
  File "/usr/lib64/python3.4/urllib/request.py", line 482, in _open
    '_open', req)
  File "/usr/lib64/python3.4/urllib/request.py", line 442, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.4/urllib/request.py", line 1226, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib64/python3.4/urllib/request.py", line 1185, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

Error on Singularity Hub

Write here.

What do you think is going on?

I have been running nextflow pipelines with some processes running in singularity images. These images have been downloaded each time I have created a clean nextflow work directory. This has been working fine, but over the last day, I have been struggling to download these images. I have had some success in a local VM after I changed my IP address, but none on the main LSF cluster that I use. Do you block IP addresses if too many images have been downloaded?
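A quick way to narrow this down (a sketch added for illustration, not something from the original report) is to run the DNS lookup and the HTTPS connection separately, so you can see whether the machine resolves the hostname at all and, if so, which step fails. The hostname is taken from the collection links above; everything else uses only the standard library:

# Sketch: separate the DNS lookup from the HTTPS connection to see which
# step produces the error. Not part of the original report.
import socket
import urllib.request

host = "www.singularity-hub.org"

try:
    print("resolves to:", socket.gethostbyname(host))        # DNS step
    with urllib.request.urlopen("https://" + host + "/", timeout=10):
        print("connection ok")                                # TCP/TLS step
except socket.gaierror as err:
    print("name resolution failed:", err)
except OSError as err:
    # a connect timeout like the Errno 110 in the traceback above lands here
    print("connection failed:", err)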

vsoch commented 5 years ago

hey @rmcolq I've been up all night debugging this, and it comes down to an issue with the Google nameservers. Specifically, I had one computer that worked and one that didn't (on the same network!). The fix came down to adding the Google nameservers to my /etc/resolv.conf so it looks something like this (with others at the end):

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8
nameserver 8.8.4.4
...

Could you please try this? I just tested pulling your container and it seems ok when the DNS finds the server:

$ singularity pull shub://rmcolq/pandora:pandora
Progress |===================================| 100.0% 
Done. Container is at: /home/vanessa/rmcolq-pandora-dev-pandora.simg

rmcolq commented 5 years ago

I don't have write access to the /etc/resolv.conf on the LSF cluster, but when I tried it in my local VM using the internet connection/IP address that failed last time, I got:

Traceback (most recent call last):
  File "/usr/local/libexec/singularity/python/pull.py", line 74, in <module>
    main()
  File "/usr/local/libexec/singularity/python/pull.py", line 66, in main
    layerfile=LAYERFILE)
  File "/usr/local/libexec/singularity/python/shub/main.py", line 77, in PULL
    manifest = client.get_manifest()
  File "/usr/local/libexec/singularity/python/shub/api.py", line 113, in get_manifest
    response = self.get(base, return_response=True)
  File "/usr/local/libexec/singularity/python/base.py", line 288, in get
    return_response=return_response)
  File "/usr/local/libexec/singularity/python/base.py", line 304, in submit_request
    response = safe_urlopen(request)
  File "/usr/local/libexec/singularity/python/base.py", line 152, in safe_urlopen
    return opener.open(url, data=data)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1241, in https_open
    context=self._context)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

which is the same timeout message I had from the VM last time.

I can't even view singularity-hub from my own wifi network at the moment (and on Saturday I could), only over my neighbour's wifi connection.

vsoch commented 5 years ago

What is the content of your (local) /etc/resolv.conf? I had the exact same issue as you, and the fix was changing that file to include Google's nameservers. To answer your question, I'm not sure why it changed from its previously working state, but I have a ticket open with Google DNS and hopefully they have some insight. It could also be that it takes up to 24 hours for DNS changes to trickle down, so maybe if we are lucky it will start working again.
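One way to check whether this is just propagation is to compare what the system resolver returns with what Google's public DNS returns for the same name. A rough sketch, assuming the third-party dnspython package is available (an assumption; it is not used anywhere in this thread):

# Sketch: compare the system resolver with Google public DNS for the
# Singularity Hub hostname. Requires dnspython (pip install dnspython).
import socket
import dns.resolver

host = "www.singularity-hub.org"

print("system resolver  :", socket.gethostbyname(host))

google = dns.resolver.Resolver(configure=False)
google.nameservers = ["8.8.8.8", "8.8.4.4"]
answers = google.resolve(host, "A")   # older dnspython releases use .query()
print("google public dns:", [rr.address for rr in answers])

# If the two answers differ, the system resolver is still serving a stale
# record, and adding 8.8.8.8 / 8.8.4.4 to /etc/resolv.conf should help
# until the change propagates.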

vsoch commented 5 years ago

In the meantime, if you give me the full URI of the container that you need to pull, I can try finding a direct link from Google Storage for you, if that helps. Rest assured your containers are ok! :)

vsoch commented 5 years ago

Here is the pandora image, and here are the Singularity_Recipes containers (direct links):

rmcolq commented 5 years ago

Thank you, that's very helpful. For my VM, that file contains:

nameserver 127.0.0.53
search Belkin

And then I added your nameserver entries at the top of that list.

Fingers crossed you are right and it all just fixes itself with the Google DNS changes. Also thank you for the direct links!!

vsoch commented 5 years ago

Yes fingers crossed! Hopefully it's just propagation that is happening.

bouthilx commented 5 years ago

For the past hour or so I have been getting an SSLError when trying to pull containers from Singularity Hub. @vsoch Do you think this is related to the issue here, or should I open a new issue?

vsoch commented 5 years ago

It's the same issue. Please use Google nameservers 8.8.8.8 and 8.8.4.4 as a temporary fix, and see this thread for more details and updates.

vsoch commented 5 years ago

hey @rmcolq and @bouthilx! I'm (for the first time) getting the correct server (35.197.63.182) now that I've removed the Google nameservers. Would you be able to test this on your end and let me know if you see the same?
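For reference, a minimal check (a sketch, not something from the thread) that the default resolver, without the Google nameserver workaround, now returns the server quoted above:

# Sketch: confirm the system resolver returns the expected server
# (35.197.63.182, as quoted in the comment above).
import socket

expected = "35.197.63.182"
resolved = socket.gethostbyname("www.singularity-hub.org")
print("resolved:", resolved, "| matches expected:", resolved == expected)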

bouthilx commented 5 years ago

I can confirm that all the problems disappeared yesterday. I can log in on the website, and I can pull containers from the hub. It is so nice to have it back! I never imagined how painful losing it would be.

vsoch commented 5 years ago

Woohoo! Glad to hear. It had to do (I think) with an A record using old authoritative servers (cached somewhere?). @rmcolq let us know if things look ok from your side of the world, and then I'll close the issue and send the good news to the list.

rmcolq commented 5 years ago

Also fixed for me!!

vsoch commented 5 years ago

Woohoo! Closing issue.