spack / spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
https://spack.io

Periodic failure of downloads from ftpmirrors.gnu.org #43032

Closed G-Ragghianti closed 8 months ago

G-Ragghianti commented 8 months ago

Steps to reproduce

I'm getting failures downloading source archives from ftpmirror.gnu.org:

==> Installing autoconf-2.72-d3mf3xfr5o6mfzj7rbr723luxrqclujg [41/56]
==> No binary for autoconf-2.72-d3mf3xfr5o6mfzj7rbr723luxrqclujg found: installing from source
==> Fetching https://ftpmirror.gnu.org/autoconf/autoconf-2.72.tar.gz
==> Warning: The contents of /tmp/gragghia/spack-stage/spack-stage-autoconf-2.72-d3mf3xfr5o6mfzj7rbr723luxrqclujg/autoconf-2.72.tar.gz look like HTML.  Either the URL you are trying to use does not exist or you have an internet gateway issue.  You can remove the bad archive using 'spack clean <package>', then try again using the correct URL.
==> Error: ChecksumError: sha256 checksum failed for /tmp/gragghia/spack-stage/spack-stage-autoconf-2.72-d3mf3xfr5o6mfzj7rbr723luxrqclujg/autoconf-2.72.tar.gz
    Expected afb181a76e1ee72832f6581c0eddf8df032b83e2e0239ef79ebedc4467d92d6e but got 31010aaa42bff1b455423dca60ebcaa6eefd26a241011566f580472394df6ec6. File size = 5332 bytes. Contents = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x03\xed\\ys\xd3\xc8...'\xa2\x16T>\xe7\xff\x05\x0f/\x04\xfe\xe0Y\x00\x00"

The downloaded file appears to be a gzip'ed HTML document from "Hoobly". I'm guessing that this is one of the mirrors in the DNS round-robin for ftpmirror.gnu.org. There is no indication as to why it is sending HTML instead of the requested source files. I realize that this isn't really Spack's fault, but since gnu.org hosts so many packages, this is causing a large problem for installing anything with Spack.
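
A minimal sketch of how the staged file could be inspected to confirm this, using only the Python standard library. The names `peek_payload` and `inspect_download` and the HTML heuristic are illustrative assumptions, not part of Spack. The sketch relies on the fact that the reported contents start with `\x1f\x8b`, the gzip magic bytes.

```python
import gzip
import hashlib
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def peek_payload(data: bytes, limit: int = 512) -> bytes:
    """Return the first `limit` bytes of the payload, gunzipping if needed."""
    if data.startswith(GZIP_MAGIC):
        try:
            with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
                return stream.read(limit)
        except (OSError, EOFError, zlib.error):
            # Corrupt or truncated gzip data: nothing useful to peek at.
            return b""
    return data[:limit]


def inspect_download(data: bytes) -> dict:
    """Summarize a download: size, sha256, and whether it looks like HTML."""
    head = peek_payload(data).lstrip().lower()
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        # Heuristic (assumption): HTML pages start with a doctype or <html> tag.
        "looks_like_html": head.startswith((b"<!doctype html", b"<html")),
    }
```

Reading the staged `autoconf-2.72.tar.gz` in binary mode and passing the bytes to `inspect_download` would report the same sha256 that Spack printed, plus a flag showing whether the decompressed payload is an HTML page rather than a tar archive.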

Is there anything that we (as Spack maintainers/users) can do about this?

G-Ragghianti commented 8 months ago

I have sent an inquiry to webmasters@gnu.org.

dvickernasa commented 8 months ago

I'm running into a similar issue with a patch for llvm (a dependency of julia). This one is coming from GitHub, not gnu.org.

==> Adding package llvm@13.0.1 to mirror
==> Fetching https://github.com/JuliaLang/llvm-project/compare/75e33f71c2dae584b13a7d1186ae0a038ba98838...2f4460bd46aa80d4fe0d80c3dabcb10379e8d61b.patch
######################################################################################################## 100.0%
==> Warning: Error while fetching llvm@13.0.1
  sha256 checksum failed for /local/dvicker-spackalicious/tmp-spack/spack-stage-patch-d9e7f0befeddddcba40eaed3895c4f4734980432b156c39d7a251bc44abb13ca/75e33f71c2dae584b13a7d1186ae0a038ba98838...2f4460bd46aa80d4fe0d80c3dabcb10379e8d61b.patch

G-Ragghianti commented 8 months ago

Have you checked the nature of the returned file? I think your problem is different from mine since it involves a different server.

dvickernasa commented 8 months ago

The file doesn't even exist. The directory it's supposed to be in does, but the file is just missing.

[dvicker@twgregoi Linux-x86_64]$ ls /local/dvicker-spackalicious/tmp-spack/spack-stage-patch-d9e7f0befeddddcba40eaed3895c4f4734980432b156c39d7a251bc44abb13ca
total 0
[dvicker@twgregoi Linux-x86_64]$ 

So, yes, a very different problem from yours. Sorry for the noise on this issue.

haampie commented 8 months ago

We're hosting the (larger) julia patches now at https://github.com/spack/patches. Sometimes their hashes change because the contents of those patches are generated on request, and depend on the git version on the server etc.

G-Ragghianti commented 8 months ago

I have made some progress: If I had to guess, I would say that the hoobly.com gnu mirror may be rate limiting and returns the wrong content when you reach a threshold. I don't think the HTML file makes any mention of why it is returning this though.

I have also created a test script that may allow you to reproduce the error. In the script, I have a number of gnu mirror URLs that explicitly try to use hoobly.com. It also sets the user-agent to the spack agent string. I think that this may be triggering the problem due to the user-agent string containing "bot". Once I saw that my requests were being corrupted, I changed the user-agent to just "spack" and hoobly.com began returning the correct files.
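
A minimal sketch of such a comparison, for readers who want to try the same experiment. It is not the original test script. The helper names `http_fetch` and `compare_user_agents` are hypothetical, and the agent string `Spackbot/0.0` is only a placeholder that contains "bot", not necessarily the exact string Spack sends.

```python
import hashlib
import urllib.request
from typing import Callable, Dict, Iterable


def http_fetch(url: str, user_agent: str, timeout: float = 30.0) -> bytes:
    """Download `url` while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def compare_user_agents(
    url: str,
    agents: Iterable[str],
    expected_sha256: str,
    fetch: Callable[[str, str], bytes] = http_fetch,
) -> Dict[str, bool]:
    """Map each user-agent to True if the download matched the checksum."""
    results = {}
    for agent in agents:
        try:
            body = fetch(url, agent)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            results[agent] = False
            continue
        results[agent] = hashlib.sha256(body).hexdigest() == expected_sha256
    return results
```

Calling `compare_user_agents(url, ["Spackbot/0.0", "spack"], "afb181a76e1ee72832f6581c0eddf8df032b83e2e0239ef79ebedc4467d92d6e")` with an explicit hoobly.com URL for `autoconf-2.72.tar.gz` would show at a glance which agent strings get the real tarball. The `fetch` parameter exists so the comparison logic can be exercised without network access.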

G-Ragghianti commented 8 months ago

To be clear: I've determined that the GNU mirror at hoobly.com is rate limiting the Spack client because its user-agent string contains "bot".

haampie commented 8 months ago

From hoobly dot com / robots.txt it does not look like it intends to block us.

Actually the website looks like a scam. (removed the link so people don't click on it)

haampie commented 8 months ago

Can confirm:

curl -A 'bot' -LfsS 'https://gnu.mirrors.hoobly.com/autoconf/autoconf-2.72.tar.gz' # html filled with ads

vs

curl -A 'foo' -LfsS 'https://gnu.mirrors.hoobly.com/autoconf/autoconf-2.72.tar.gz' #  the file

I've also emailed the GNU "webmasters" that hoobly dot com seems broken.

haampie commented 8 months ago

impact-low because in principle Spack also mirrors the tarball on mirrors.spack.io.
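
The general idea of falling back to another source when a checksum fails can be sketched as follows. This is an illustration only, not Spack's actual fetch logic. The function names `fetch_url` and `fetch_verified` are hypothetical, and the candidate URLs are supplied by the caller.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable, Optional, Tuple


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Plain HTTP(S) download using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_verified(
    urls: Iterable[str],
    expected_sha256: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> Optional[Tuple[str, bytes]]:
    """Try each URL in order; return the first (url, data) whose sha256 matches."""
    for url in urls:
        try:
            data = fetch(url)
        except OSError:
            continue  # unreachable source: move on to the next candidate
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return url, data
        # Checksum mismatch (for example an HTML page): try the next source.
    return None
```

With a checksum check after every download, one misbehaving upstream mirror costs an extra request instead of a failed install, as long as at least one other candidate serves the correct file.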

haampie commented 8 months ago

They have removed the mirror.

G-Ragghianti commented 8 months ago

Any idea why the default spack source mirror wasn't providing these packages for me? They were very common packages that I would expect to have been in the mirror.