spedas / pyspedas

Python-based Space Physics Environment Data Analysis Software
https://pyspedas.readthedocs.io/
MIT License

akebono data has possibly moved? #987

Closed jameswilburlewis closed 1 month ago

jameswilburlewis commented 2 months ago

Our akebono tests are failing because they can no longer download data. We were getting it from http://darts.isas.jaxa.jp/stp/data/exosd/ , but that URL now attempts (unsuccessfully!) to redirect to the home page.

I can see some akebono data here:

https://data.darts.isas.jaxa.jp/pub/akebono/

but it only appears to have data for the pws instrument, and not rdm or orb which we previously had access to.

There is a notice on their front page https://darts.isas.jaxa.jp/ :

August 2024

Due to changes to the website configuration and maintenance of the data publishing path, paths will change and some apps will be unavailable. We apologize for the inconvenience. [Maintenance] [Period] 2024-08-20 12:00 -- 2024-08-23 12:00 (JST)

So perhaps things are still being moved. The akebono tests are disabled for now.

If this situation persists, it would be good to get in touch and see if rdm and orb data will still be available.

jameswilburlewis commented 2 months ago

There is Akebono PWS data available at https://data.darts.isas.jaxa.jp/pub/akebono/pws , but no orbit or RDM data yet that I can see.

The DARTS admins pointed me at a search interface here: https://www.darts.isas.jaxa.jp/stp/akebono/data.html , but I don't see a way to access individual files, only zip archives of data meeting the search criteria. I guess we can work with it if we have to, but hopefully there are still URLs to the individual files.

jameswilburlewis commented 1 month ago

I think this might be where the original data products have moved to:

https://darts.isas.jaxa.jp/app/stp/data/exosd/

RDM data at this location appears to be working as before. Orbit data is producing errors:

  File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "parsers.pyx", line 579, in pandas._libs.parsers.TextReader.__cinit__
  File "parsers.pyx", line 668, in pandas._libs.parsers.TextReader._get_header
  File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2050, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

The issue seems to be that the server is delivering text files with a .txt suffix, but they are gzip-compressed and need a gunzip step before they can be loaded. (In a browser, this seems to happen automatically. I wonder if there's an option we can pass to the requests library via spd_download() to take care of it?)
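A rough sketch of the gunzip step at download time, assuming the server labels these responses with a `Content-Encoding: gzip` header (which would explain why a browser decodes them transparently). The helper name `decode_body` is hypothetical and not part of pyspedas. As far as I know, requests decodes gzip automatically when the body is read through `response.content`, but not when it is streamed from `response.raw` unless `decode_content=True` is set. Whether spd_download() is affected depends on how it reads the response, which isn't shown here.

```python
import gzip


def decode_body(body: bytes, headers: dict) -> bytes:
    """Return the payload, gunzipped if the headers say it is gzip-encoded.

    Hypothetical helper: `headers` is any mapping of HTTP response headers.
    Header names are matched case-insensitively.
    """
    encoding = ""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encoding = value.strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # No (or a different) content encoding: pass the bytes through untouched.
    return body
```

If the header turns out not to be set, this approach won't help, and sniffing the file contents would be the fallback.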

jameswilburlewis commented 1 month ago

Kludgey solution for orbit and rdm processing:

Add a gz flag to load_csv_file; if True, specify compression='gzip' when reading the CSV with pandas.

In orb_postprocessing and rdm_postprocessing:

    try:
        data = load_csv_file(files, cols=cols)
    except UnicodeDecodeError:
        data = load_csv_file(files, cols=cols, gz=True)

I suppose this could fail if there were a mixture of old (uncompressed) and new (gzip compressed) files in the data directory.
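One way around the mixed-directory problem would be to apply the fallback per file rather than to the whole list. A minimal sketch using only the standard library; the function names are hypothetical, and this reads raw text rather than going through load_csv_file. It relies on the fact that a gzip file starts with bytes 0x1f 0x8b, and 0x8b is never valid at that position in UTF-8, so the plain-text read fails immediately (the same error as in the traceback above).

```python
import gzip


def read_text_with_gzip_fallback(path, encoding="utf-8"):
    """Read one file as text, retrying as gzip if it is not valid text."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        # Not decodable as plain text: assume it is gzip-compressed.
        with gzip.open(path, "rt", encoding=encoding) as f:
            return f.read()


def read_all(paths):
    """Read each file independently, so old and new files can coexist."""
    return [read_text_with_gzip_fallback(p) for p in paths]
```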

A non-kludgey solution might be to detect whether a file is gzip-compressed before passing it to pandas, to keep track of the encoding/compression in spd_download and uncompress if necessary, or to negotiate with the server to deliver uncompressed data...?
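For the detection option, a small sketch that sniffs the two-byte gzip magic number (0x1f 0x8b) instead of trusting the .txt suffix. `is_gzip_file` and `open_text` are hypothetical names, not existing pyspedas functions.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_file(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_text(path, encoding="utf-8"):
    """Open a file for text reading, decompressing it if it is gzipped."""
    if is_gzip_file(path):
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

The handle returned by `open_text` could be passed straight to pandas.read_csv, which accepts file-like objects. Alternatively, the result of `is_gzip_file` could be used to choose between compression='gzip' and compression=None per file.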