spedas / pyspedas

Python-based Space Physics Environment Data Analysis Software
https://pyspedas.readthedocs.io/
MIT License

MAVEN l2_regex fails to match for some data types #891

Open jameswilburlewis opened 3 months ago

jameswilburlewis commented 3 months ago

The failure occurs in get_year_month_day_from_sci_file, in maven/download_file_utilities.py, for the rse, iuv, and ngi datatypes.

21-Jun-24 15:31:45: l2_regex match failed for filename mvn_rse_l2_w60_20160101T000000_v01_r00.tab
21-Jun-24 15:32:06: l2_regex match failed for filename mvn_iuv_l2_corona-orbit02450-fuv_20160102T201901.xml
21-Jun-24 15:32:35: l2_regex match failed for filename mvn_ngi_l2_ion-abund-18402_20160102T222549_v08_r01.csv

Here's an example of a successfully matched filename: mvn_mag_l2_2016002ss1s_20160102_v01_r01.xml

Apparently the rse, iuv, and ngi filenames don't follow the expected pattern:

    l2_pattern = (
        r"^mvn_(?P<{0}>[a-zA-Z0-9]+)_"
        r"(?P<{1}>l[a-zA-Z0-9]+)"
        r"(?P<{2}>|_[a-zA-Z0-9\-]+)_"
        r"(?P<{3}>[0-9]{{4}})"
        r"(?P<{4}>[0-9]{{2}})"
        r"(?P<{5}>[0-9]{{2}})"
        r"(?P<{6}>|T[0-9]{{6}}|t[0-9]{{6}})_"
        r"v(?P<{7}>[0-9]+)_"
        r"r(?P<{8}>[0-9]+)\."
        r"(?P<{9}>cdf|xml|sts|md5)"
        r"(?P<{10}>\.gz)*"
    ).format(
        "instrument",
        "level",
        "description",
        "year",
        "month",
        "day",
        "time",
        "version",
        "revision",
        "extension",
        "gz",
    )

jameswilburlewis commented 3 months ago

I see that the rse datatype is using a ".tab" file extension that's not supported by the l2_regex -- I've seen that extension in the kp files, though, so maybe this is really kp data?

And ngi seems to have a ".csv" extension, which doesn't appear in the l2_regex.

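iuv might be missing a version number? The example filename ends right after the timestamp, with no _vNN_rNN suffix, and l2_regex requires one.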
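
To check these three guesses, here is a minimal diagnostic sketch. It rebuilds the l2_pattern quoted above with the group names written inline, then compares it against a relaxed variant. The relaxed variant (RELAXED_REGEX, with tab and csv added as extensions and the version/revision suffix made optional) is purely hypothetical and only there to confirm which part of each filename breaks the match; it is not a proposed patch, and matching a filename says nothing about whether the file can be loaded afterwards.

```python
import re

# Shared prefix of the pattern: instrument, level, description, date, time.
PREFIX = (
    r"^mvn_(?P<instrument>[a-zA-Z0-9]+)_"
    r"(?P<level>l[a-zA-Z0-9]+)"
    r"(?P<description>|_[a-zA-Z0-9\-]+)_"
    r"(?P<year>[0-9]{4})"
    r"(?P<month>[0-9]{2})"
    r"(?P<day>[0-9]{2})"
    r"(?P<time>|T[0-9]{6}|t[0-9]{6})"
)

# Same tail as the l2_pattern quoted in this issue.
L2_REGEX = re.compile(
    PREFIX
    + r"_v(?P<version>[0-9]+)_"
    + r"r(?P<revision>[0-9]+)\."
    + r"(?P<extension>cdf|xml|sts|md5)"
    + r"(?P<gz>\.gz)*"
)

# Hypothetical relaxed tail (an assumption, not pyspedas code):
# version/revision are optional, and tab/csv are accepted as extensions.
RELAXED_REGEX = re.compile(
    PREFIX
    + r"(?:_v(?P<version>[0-9]+)_r(?P<revision>[0-9]+))?\."
    + r"(?P<extension>cdf|xml|sts|md5|tab|csv)"
    + r"(?P<gz>\.gz)*"
)

# Filenames taken from the log lines and the working example above.
FILENAMES = [
    "mvn_mag_l2_2016002ss1s_20160102_v01_r01.xml",
    "mvn_rse_l2_w60_20160101T000000_v01_r00.tab",
    "mvn_iuv_l2_corona-orbit02450-fuv_20160102T201901.xml",
    "mvn_ngi_l2_ion-abund-18402_20160102T222549_v08_r01.csv",
]

for name in FILENAMES:
    strict = L2_REGEX.match(name) is not None
    relaxed = RELAXED_REGEX.match(name)
    print(f"{name}: l2_regex={strict}, relaxed={relaxed is not None}")
    if relaxed is not None:
        # version is None when the filename has no _vNN_rNN suffix
        print("   ", relaxed.group("extension"), relaxed.group("version"))
```

If the relaxed pattern matches all four names while l2_regex matches only the mag file, the mismatches come down to the file extension (rse, ngi) and the missing version/revision suffix (iuv), rather than to the description or timestamp fields.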

nickssl commented 3 months ago

There are more issues with the rse files, not just the regex. These are .tab files (tab-delimited text) that the rest of the code currently cannot handle, so they cannot be loaded into tplot. I have attached an example of such a file below. Since they cannot be loaded, I added code to skip these files.

An additional problem is that the rest of the maven code assumes that any .tab files are kp files, which is not true in this case. For loading .tab files, the code assumes a particular structure inside the .tab file, which is not valid for rse files. So, if we want to load these rse files in the future, a more extensive fix for the existing code will be needed.
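
As a rough illustration of the kind of check a more extensive fix might need, the sketch below decides whether a .tab file is kp data from the instrument token in its name instead of from the extension alone. The helper classify_tab_file is hypothetical and not part of pyspedas, and it assumes that kp filenames carry "kp" as the second underscore-separated token (mvn_kp_...); that naming is an assumption here and should be checked against the real kp files.

```python
import os


def classify_tab_file(filename):
    """Return 'kp', 'other_tab', or 'not_tab' based on the file name alone.

    Hypothetical helper for illustration only; not pyspedas code.
    """
    name = os.path.basename(filename).lower()
    # Ignore an optional .gz suffix before looking at the real extension.
    if name.endswith(".gz"):
        name = name[:-3]
    if not name.endswith(".tab"):
        return "not_tab"
    # Assumption: kp files are named mvn_kp_..., so the second token is "kp".
    parts = name.split("_")
    if len(parts) > 1 and parts[0] == "mvn" and parts[1] == "kp":
        return "kp"
    return "other_tab"


# The rse file from this issue is a .tab file but not kp data.
print(classify_tab_file("mvn_rse_l2_w60_20160101T000000_v01_r00.tab"))
# Made-up kp-style name, used only to exercise the assumed naming.
print(classify_tab_file("mvn_kp_insitu_20160101_v01_r01.tab"))
```

A check along these lines would only route files to the right branch; the rse files would still need their own reader before they could be loaded.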

mvn_rse_l2_w40_20160101T000000_v01_r00.tab.zip