skyfielders / python-skyfield

Elegant astronomy for Python
MIT License

Request URL change in hipparcos.py #788

Closed: aendie closed this issue 2 years ago

aendie commented 2 years ago

Downloading hip_main.dat fails as follows in my sfalmanac code:

Traceback (most recent call last):
  File "/home/hello/Experiment 9.1.2022/SFalmanac-Py3-master/sfalmanac.py", line 577, in <module>
    ts = init_sf(spad)      # in alma_skyfield (almanac-based)
  File "/home/hello/Experiment 9.1.2022/SFalmanac-Py3-master/alma_skyfield.py", line 277, in init_sf
    with load.open(hipparcos.URL) as f:
  File "/home/hello/.local/lib/python3.9/site-packages/skyfield/iokit.py", line 329, in open
    path = self._assure(url, filename, reload, backup)
  File "/home/hello/.local/lib/python3.9/site-packages/skyfield/iokit.py", line 214, in _assure
    download(url, path, self.verbose, backup=backup)
  File "/home/hello/.local/lib/python3.9/site-packages/skyfield/iokit.py", line 525, in download
    raise e2
OSError: cannot download https://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat because <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1123)>

A user informed me that the domain has changed. He wrote:

I found the error! the domain "astro.unistra.fr" as well as the domain "u-strasbg.fr" are obsoleted by "cds.unistra.fr". Please can you try the URL https://cdsarc.cds.unistra.fr/... ?

I confirm that the following link shows data - presumably what we want: https://cdsarc.cds.unistra.fr/ftp/cats/I/239/hip_main.dat

I request that Skyfield be updated to use this new URL. (My guess is that the URL is set on line 8 of hipparcos.py.) Kind regards!
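As an illustration of the domain change described above, here is a minimal sketch that rewrites an obsolete CDS hostname to the new one. It assumes that only the domain suffix changed and that paths are unchanged, which is what the quoted message suggests but does not guarantee. The names `OBSOLETE_SUFFIXES`, `CURRENT_SUFFIX` and `modernize_cds_url` are hypothetical and not part of Skyfield.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the two old suffixes map one-to-one onto the new suffix.
OBSOLETE_SUFFIXES = ("u-strasbg.fr", "astro.unistra.fr")
CURRENT_SUFFIX = "cds.unistra.fr"


def modernize_cds_url(url):
    """Return url with an obsolete CDS domain suffix replaced by the new one.

    URLs on other hosts are returned unchanged. Any port or user info in
    the original URL is dropped, which is fine for plain catalog URLs.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    for old in OBSOLETE_SUFFIXES:
        if host == old or host.endswith("." + old):
            new_host = host[: len(host) - len(old)] + CURRENT_SUFFIX
            return urlunsplit(parts._replace(netloc=new_host))
    return url


print(modernize_cds_url("https://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat"))
```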

Bernmeister commented 2 years ago

@brandon-rhodes Perhaps it's time to revisit "Skyfield downloads all or nothing".

aendie commented 2 years ago

Hmm... I'm not sure what the best long-term solution is (your arguments are valid). For now, however, I would appreciate a 'quick fix' that switches line 8 in hipparcos.py to:

URL = 'https://cdsarc.cds.unistra.fr/ftp/cats/I/239/hip_main.dat'

I did this as a quick patch to C:\Python310\Lib\site-packages\skyfield\data\hipparcos.py and Skyfield downloaded a new hip_main.dat that had identical contents to the old one (checking every byte using the excellent 'Compare' plugin for Notepad++).
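The same byte-for-byte check can be scripted instead of done in an editor plugin. This is only a sketch using the standard library; the function name `files_identical` and the file names in the usage comment are hypothetical.

```python
import filecmp


def files_identical(path_a, path_b):
    """Return True if the two files have exactly the same bytes.

    shallow=False makes filecmp compare the file contents rather than
    trusting matching size and modification time.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)


# Hypothetical usage, assuming both copies were saved under these names:
# files_identical("hip_main_old.dat", "hip_main_new.dat")
```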

Failing this I could add my own download-bypass code in sfalmanac, just as I have done to bypass downloading finals2000A.all, which tries the following locations in sequence:

ftp://ftp.iers.org/products/eop/rapid/standard/
https://maia.usno.navy.mil/ser7/
https://datacenter.iers.org/data/9/

The first is the location that is stored within Skyfield (and should be the official server once it's fixed LOL); the second is the official source as currently documented by IERS; the third is for the countries that cannot access the usno.navy.mil domain.
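A download-bypass of this kind could look roughly like the sketch below. It is not the actual sfalmanac code: the names `MIRRORS`, `fetch` and `download_with_fallback` are hypothetical, and it assumes the file sits directly under each directory listed in this comment with the same file name.

```python
import urllib.request

# The three directories named in this comment, in the order they are tried.
MIRRORS = [
    "ftp://ftp.iers.org/products/eop/rapid/standard/",
    "https://maia.usno.navy.mil/ser7/",
    "https://datacenter.iers.org/data/9/",
]


def fetch(url, timeout=30):
    """Download url and return its contents as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(filename, mirrors=MIRRORS, fetch=fetch):
    """Try each mirror in turn; return (url, data) from the first that works.

    The fetch parameter is injectable so the logic can be tested offline.
    """
    errors = []
    for base in mirrors:
        url = base + filename
        try:
            return url, fetch(url)
        except OSError as e:  # urllib's URLError and socket errors are OSErrors
            errors.append(f"{url}: {e}")
    raise OSError("all mirrors failed:\n" + "\n".join(errors))


# Hypothetical usage (needs network access):
# url, data = download_with_fallback("finals2000A.all")
```

Ordering the list from most to least preferred keeps the official source first while still giving users behind a blocked domain a working path.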

I am hoping for a 'quick fix' like the one I remember being done long ago (see https://github.com/skyfielders/python-skyfield/issues/301). Let me know if I should opt for my own download-bypass logic. Thanks for considering this request :-)

brandon-rhodes commented 2 years ago

I think that the quick fix does make sense. Can we find an official announcement of the change anywhere? It would be sad to release a new Skyfield that uses cdsarc.cds.unistra.fr and then have it soon break because that was a temporary hostname that we had mistaken for a more permanent one.

It's interesting that cdsarc.u-strasbg.fr does still work, but its certificate is no longer recognized as valid. I can fetch the file with wget from the old URL only if I add the flag --no-check-certificate, because otherwise I get the error:

$ wget https://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat
--2022-09-04 15:16:17--  https://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat
Resolving cdsarc.u-strasbg.fr (cdsarc.u-strasbg.fr)... 130.79.128.5
Connecting to cdsarc.u-strasbg.fr (cdsarc.u-strasbg.fr)|130.79.128.5|:443... connected.
ERROR: cannot verify cdsarc.u-strasbg.fr's certificate, issued by `CN=cdsarc.u-strasbg.fr,OU=CDS,O=ObAS,L=Strasbourg,ST=France,C=Fr':
  Self-signed certificate encountered.
To connect to cdsarc.u-strasbg.fr insecurely, use `--no-check-certificate'.

The site comes up just fine in my browser if I remove the s from https:, so apparently they still serve plain HTTP.

brandon-rhodes commented 2 years ago

Oh, and: if you are distributing a tool that needs certain data files, I definitely recommend distributing with it all the data files it needs. Tools that need to pause and start downloading files the first time they run create all kinds of problems (like, what if the person is offline). If for whatever reason you can't distribute the data files with the tool, then fallback arrangements will indeed increase the number of months or years that pass before the tool breaks (which will happen when all the fallback URLs have finally gone away).
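One way to follow this advice is to look for a bundled copy first and only download when it is missing. The sketch below shows that pattern with the standard library only; `open_data_file` and its `download` callback are hypothetical names, not Skyfield API.

```python
from pathlib import Path


def open_data_file(filename, bundled_dir, download=None):
    """Open a data file shipped with the tool, downloading only as a fallback.

    download, if given, is a callable taking the file name and returning
    the file contents as bytes; its result is cached in bundled_dir.
    """
    path = Path(bundled_dir) / filename
    if path.exists():
        return path.open("rb")  # offline-friendly: no network needed
    if download is None:
        raise FileNotFoundError(f"{filename} is not bundled and no downloader was given")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(download(filename))
    return path.open("rb")


# Hypothetical usage, assuming hip_main.dat ships in a "data" directory:
# with open_data_file("hip_main.dat", "data") as f:
#     first_line = f.readline()
```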

I don't suppose that the Hipparcos data set is hosted somewhere that is good at keeping URLs working, like archive.org?

brandon-rhodes commented 2 years ago

An update:

The situation has been clarified through private correspondence. The statement quoted above:

I found the error! the domain "astro.unistra.fr" as well as the domain "u-strasbg.fr" are obsoleted by "cds.unistra.fr"

—was not, in fact, written by an SFalmanac/Skyalmanac user themselves, but was written in an email reply to that user from a Research Engineer at the Strasbourg Astronomical Data Center—someone whose email address in fact ends in @astro.unistra.fr and thus ought to be qualified to pronounce upon which domains are still supported.

Now that we know that someone from that organization was indeed consulted on the question of the URL, I'll go make the change in the source.