Closed: rmcolq closed this issue 5 years ago
hey @rmcolq I've been up all night debugging this, and it comes down to an issue with the Google nameservers. Specifically, I had one computer that worked and one that didn't (on the same network!). The fix came down to adding the Google nameservers to my /etc/resolv.conf
so it looks something like this (with others at the end):
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8
nameserver 8.8.4.4
...
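After editing the file, a quick way to confirm the resolver can actually find a host (before retrying a pull) is a small check like the one below. This is a minimal Python 3 sketch, not part of Singularity; the hostname is just the one discussed in this thread.

```python
import socket

def can_resolve(hostname):
    """Return True if the current resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # "Temporary failure in name resolution" surfaces here
        return False

# Check the registry host before attempting a pull
print(can_resolve("singularity-hub.org"))
```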
Could you please try this? I just tested pulling your container and it seems ok when the DNS finds the server:
$ singularity pull shub://rmcolq/pandora:pandora
Progress |===================================| 100.0%
Done. Container is at: /home/vanessa/rmcolq-pandora-dev-pandora.simg
I don't have write access to /etc/resolv.conf on the LSF cluster, but when I tried it in my local VM using the internet connection/IP address that failed last time, I got:
Traceback (most recent call last):
File "/usr/local/libexec/singularity/python/pull.py", line 74, in <module>
main()
File "/usr/local/libexec/singularity/python/pull.py", line 66, in main
layerfile=LAYERFILE)
File "/usr/local/libexec/singularity/python/shub/main.py", line 77, in PULL
manifest = client.get_manifest()
File "/usr/local/libexec/singularity/python/shub/api.py", line 113, in get_manifest
response = self.get(base, return_response=True)
File "/usr/local/libexec/singularity/python/base.py", line 288, in get
return_response=return_response)
File "/usr/local/libexec/singularity/python/base.py", line 304, in submit_request
response = safe_urlopen(request)
File "/usr/local/libexec/singularity/python/base.py", line 152, in safe_urlopen
return opener.open(url, data=data)
File "/usr/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1241, in https_open
context=self._context)
File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
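For what it's worth, the URLError in that traceback is wrapping a lower-level socket.gaierror, which is how DNS failures show up in urllib. A small Python 3 sketch (the traceback above is Python 2, but the pattern is the same) that separates "the hostname didn't resolve" from other fetch errors:

```python
import socket
import urllib.error
import urllib.request

def fetch(url):
    """Fetch a URL, reporting DNS failures separately from other errors."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except urllib.error.URLError as err:
        # URLError carries the underlying cause in .reason; a gaierror
        # there means the hostname itself could not be resolved.
        if isinstance(err.reason, socket.gaierror):
            raise RuntimeError(
                "DNS resolution failed -- check /etc/resolv.conf") from err
        raise
```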
which is the same timeout message I got from the VM last time.
I can't even view singularity-hub from my own wifi network at the moment (and on Saturday I could), only over my neighbour's wifi connection.
What is the content of your (local) /etc/resolv.conf? I had the exact same issue as you, and the fix was changing that file to include Google's nameservers. To answer your question, I'm not sure why it changed from its previously working state, but I have a ticket open with Google DNS and hopefully they have some insight. It could also be that it takes up to 24 hours for DNS changes to trickle down, so maybe if we are lucky it will start working again.
In the meantime, if you give me the full URI of the container that you need to pull, I can try finding a direct link from Google Storage for you, if that helps. Rest assured your containers are ok! :)
Thank you, that's very helpful. For my VM, that file contains:
nameserver 127.0.0.53
search Belkin
And then I added your changes at the top of that list.
Fingers crossed you are right and it all just fixes itself with the Google DNS changes. Also thank you for the direct links!!
Yes fingers crossed! Hopefully it's just propagation that is happening.
For the past hour or so I have been getting an SSLError when trying to pull containers from Singularity Hub. @vsoch Do you think this is related to the issue here, or should I open a new issue?
It's the same issue. Please use Google nameservers 8.8.8.8 and 8.8.4.4 as a temporary fix, and see this thread for more details and updates.
hey @rmcolq and @bouthilx ! I'm (for the first time) getting the correct server (35.197.63.182) when I've removed the Google nameservers, would you be able to test this on your end and let me know if you do as well?
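If it helps to compare notes, here is a minimal Python 3 sketch for checking which addresses your resolver returns for a host, so you can see whether you get the same IP. The hostname and the 35.197.63.182 address are from this thread, not an official check:

```python
import socket

def resolved_ips(hostname):
    """Return the set of IPv4 addresses the current resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}

# Compare against the address mentioned in this thread (35.197.63.182)
print(resolved_ips("singularity-hub.org"))
```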
I can confirm that all problems disappeared yesterday. I can log in on the website, and I can pull containers from the hub. It is so nice to have it back! I never imagined how painful losing it would be.
Woohoo! Glad to hear. It had to do (I think) with an A record using old authoritative servers (cached somewhere?). @rmcolq let us know if things look ok from your side of the world, and then I'll close the issue and send the good news to the list.
Also fixed for me!!
Woohoo! Closing issue.
Link to Container Collection Log, Build, or Collection (in that order)
https://www.singularity-hub.org/collections/1285 https://www.singularity-hub.org/collections/1297
Behavior when Building Locally
Error on Singularity Hub
What do you think is going on?
I have been running nextflow pipelines with some processes running in singularity images. These images have been downloaded each time I have created a clean nextflow work directory. This has been working fine, but over the last day, I have been struggling to download these images. I have had some success in a local VM after I changed my IP address, but none on the main LSF cluster that I use. Do you block IP addresses if too many images have been downloaded?