ESA/ESAC/CSA catalog and info metadata

rweigel commented 4 years ago

Create a very basic pass-through server for data served through CAIO. For now, only build the code to convert their metadata to HAPI metadata and don't write the code to convert their data response to a HAPI data response.

The /catalog response can be built using the first and last columns (dataset ID and title) of the response to

curl 'https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET.DATASET_ID,DATASET.START_DATE,DATASET.END_DATE,DATASET.TITLE&RESOURCE_CLASS=DATASET&RETURN_TYPE=CSV'

For each row, an /info JSON response file must be created. Each file will have startDate, stopDate, and id taken from DATASET.START_DATE, DATASET.END_DATE, and DATASET.DATASET_ID in the CSV response to the above request.
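A minimal sketch of that CSV step, in Python. It assumes the CSV has one header row and that columns arrive in the SELECTED_FIELDS order (DATASET_ID, START_DATE, END_DATE, TITLE). Neither has been checked against the live response. The function name `parse_datasets` and the `has_header` switch are made up for illustration.

```python
import csv
import io


def parse_datasets(csv_text, has_header=True):
    """Return (catalog, info_stubs) built from the CSV metadata response.

    Assumed column order: DATASET_ID, START_DATE, END_DATE, TITLE.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header and rows:
        rows = rows[1:]  # drop the assumed header row

    catalog = []
    info_stubs = {}
    for row in rows:
        if len(row) < 4:
            continue  # skip blank or malformed rows
        dataset_id = row[0].strip()
        # /catalog uses the first and last columns.
        catalog.append({"id": dataset_id, "title": row[-1].strip()})
        # /info starts out with only the time range; the rest comes from XML.
        info_stubs[dataset_id] = {
            "startDate": row[1].strip(),
            "stopDate": row[2].strip(),
        }
    return {"catalog": catalog}, info_stubs
```

The stubs are keyed by dataset ID so that each one can later be merged with the XML-derived fields and written to meta/x.json.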

To finish populating the /info JSON, for each x=DATASET.DATASET_ID in the CSV response, request

curl 'https://csa.esac.esa.int/csa/aio/product-action?RETRIEVALTYPE=HEADER&DATASET_ID=x' > tmp/x.tar.gz

For example, for x=C2_CP_PEA_PITCH_3DXPALARL_DEFlux the file C2_CP_PEA_PITCH_3DXPALARL_DEFlux.tar.gz has files

CSA_Dataset_Metadata_20201010_1940/C2_CP_PEA_PITCH_3DXPALARL_DEFlux.CEF.XML
CSA_Download_20201010_1940/C2_CQ_PEA_CAVEATS/C2_CQ_PEA_CAVEATS__.*.cef

Use the contents of the XML file to populate the rest of each /info JSON response as given below. Save each file as meta/x.json.

Your code should check whether the tmp/C2_CP_PEA_PITCH_3DXPALARL_DEFlux directory already exists. If it does, don't re-download the metadata; just read the existing metadata. The script should have a flag called update to control this behavior.
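One possible shape for the cache check and the update flag, as a Python sketch. The names `needs_download` and `parse_args` and the `--update` spelling are assumptions for illustration; only the tmp/x directory convention and the flag name come from the description above.

```python
import argparse
import os


def needs_download(dataset_id, tmp_dir="tmp", update=False):
    """Return True if the header tarball for dataset_id should be fetched.

    With update=True the metadata is always re-downloaded; otherwise it is
    fetched only when tmp_dir/dataset_id does not exist yet.
    """
    if update:
        return True
    return not os.path.isdir(os.path.join(tmp_dir, dataset_id))


def parse_args(argv=None):
    """Parse command-line options; --update forces a re-download."""
    parser = argparse.ArgumentParser(description="Build HAPI metadata from CAIO")
    parser.add_argument(
        "--update",
        action="store_true",
        help="re-download metadata even if tmp/<dataset id> already exists",
    )
    return parser.parse_args(argv)
```

Defaulting update to False keeps repeated runs from hitting the CSA server for every dataset.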

{
    "startDate": "DATASET.START_DATE from CSV",
    "stopDate": "DATASET.END_DATE from CSV",
    "cadence": "TIME_RESOLUTION from XML",
    "description": "DATASET_DESCRIPTION from XML" + ";" + "DATASET_DESCRIPTION from XML",
    "resourceURL": "https://csa.esac.esa.int/csa/aio/product-action?RETRIEVALTYPE=HEADER&DATASET_ID=x",
    "contact": "CONTACT_COORDINATES from XML",
    "x_original_metadata": "Use XML2JSON on XML file and insert resulting JSON here",
    "parameters": [
        {
            "name": "PARAMETER_ID from XML",
            "description": "FIELDNAME from XML",
            "units": "UNITS from XML",
            "fill": "FILLVAL from XML",
            "type": "VALUE_TYPE from XML"
        },...
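A rough Python sketch of the parameter mapping in the template above. The XML layout is an assumption: it supposes each parameter is a `PARAMETER` element whose children are named PARAMETER_ID, FIELDNAME, UNITS, FILLVAL, and VALUE_TYPE. The real CEF.XML file may be structured differently, so treat `parameter_tag` and the child lookups as placeholders to adjust.

```python
import xml.etree.ElementTree as ET

# HAPI key -> XML field name, as listed in the template above.
FIELD_MAP = {
    "name": "PARAMETER_ID",
    "description": "FIELDNAME",
    "units": "UNITS",
    "fill": "FILLVAL",
    "type": "VALUE_TYPE",
}


def parameters_from_xml(xml_text, parameter_tag="PARAMETER"):
    """Build the "parameters" list from the dataset XML.

    Assumes (unverified) that every parameter is a <PARAMETER> element
    with the FIELD_MAP names as direct child elements.
    """
    root = ET.fromstring(xml_text)
    parameters = []
    for node in root.iter(parameter_tag):
        entry = {}
        for hapi_key, xml_tag in FIELD_MAP.items():
            value = node.findtext(xml_tag)
            if value is not None:  # omit keys whose XML field is absent
                entry[hapi_key] = value.strip()
        parameters.append(entry)
    return parameters
```

The VALUE_TYPE values are copied through unchanged. HAPI only accepts string, double, integer, and isotime for "type", so a translation step would still be needed; it is not attempted here.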

rweigel commented 3 years ago

Notes from telecon

wget --content-disposition 'https://csa.esac.esa.int/csa/aio/streaming-action?DATASET_ID=C2_CP_PEA_3DXPH_DEFlux&START_DATE=2006-02-18T00:00:00Z&END_DATE=2006-02-20T23:00:00Z&NON_BROWSER&CSACOOKIE=...'

wget --content-disposition 'https://csa.esac.esa.int/csa/aio/streaming-action?DATASET_ID=C1_CP_FGM_5VPS&START_DATE=2004-06-18T00:00:00Z&END_DATE=2004-06-19T00:00:00Z&NON_BROWSER&CSACOOKIE=...'

https://csa.esac.esa.int/csa/aio/html/streamingrequests.shtml

curl "https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET.DATASET_ID,DATASET.START_DATE,DATASET.END_DATE,DATASET.TITLE&RESOURCE_CLASS=DATASET&RETURN_TYPE=JSON&QUERY=(DATASET.IS_CEF='true')"

rweigel commented 3 years ago

Wait to hear from Beatriz on when metadata server is ready
They will provide an option to download headerless CSV
Provide explanation of difference between their holdings and CDAWeb (CDAWeb may have key params earlier, but CAIO has everything, including 'non-key' datasets, so CAIO's holdings are complete.) Give link to lists from each site.
Fix Python client to allow a missing length in the time parameter (issue created)
Replace \n w/ <br> in HTML representation of description (server-ui) (done)
Will need to write code that figures out the actual length of the time parameter. Metadata has SIGNIFICANT_FIGURES for time parameters, but what is emitted by server does not always match. For example, the TS05 dataset output has a length of 27, but SIGNIFICANT_FIGURES=24.
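For the time-length item, one simple approach is to measure the first field of an actual data record rather than trust SIGNIFICANT_FIGURES. A Python sketch, assuming records are comma-separated with the time string in the first field (not confirmed for every dataset); `observed_time_length` is a hypothetical helper name.

```python
def observed_time_length(first_record, delimiter=","):
    """Return the length of the time string in the first field of a record.

    Assumes the time value is the first delimiter-separated field.
    """
    time_field = first_record.split(delimiter, 1)[0].strip()
    return len(time_field)
```

A time string with microseconds, such as 2004-06-18T00:00:00.000000Z, is 27 characters long, while one with milliseconds is 24, which would be consistent with the TS05 mismatch noted above.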

Caveats are datasets. To get them, use

curl 'https://csa.esac.esa.int/csa/aio/metadata-action?RESOURCE_CLASS=REFERENCED_DATASET&SELECTED_FIELDS=REFERENCED_DATASET&RETURN_TYPE=CSV'

The 4th column is the parent; the 2nd is the caveat dataset ID.
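A small Python sketch that groups caveat datasets by parent using those two columns. It assumes the CSV has a single header row and at least four columns; `caveats_by_parent` is a made-up name.

```python
import csv
import io


def caveats_by_parent(csv_text, has_header=True):
    """Map each parent dataset to the list of its caveat dataset IDs.

    Uses the 4th column as the parent and the 2nd as the caveat dataset ID.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header and rows:
        rows = rows[1:]  # drop the assumed header row

    mapping = {}
    for row in rows:
        if len(row) < 4:
            continue  # skip blank or malformed rows
        caveat_id = row[1].strip()
        parent_id = row[3].strip()
        mapping.setdefault(parent_id, []).append(caveat_id)
    return mapping
```

Keeping this mapping separate means the caveat datasets can be left out of /catalog for now and attached to their parents later.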

For now, just serve science datasets:

curl 'https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET.DATASET_ID,DATASET.START_DATE,DATASET.END_DATE,DATASET.TITLE&RESOURCE_CLASS=DATASET&QUERY=(DATASET.IS_CEF=%27true%27%20and%20DATASET.MAIN_GROUP%20=%20%27Science%27)&RETURN_TYPE=JSON'
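If the request is built in code rather than pasted, the QUERY value needs the same percent-encoding as in the curl command above. A Python sketch using the standard library's `urllib.parse.quote`; the `metadata_url` helper and its defaults are illustrative only.

```python
from urllib.parse import quote

BASE_URL = "https://csa.esac.esa.int/csa/aio/metadata-action"
FIELDS = [
    "DATASET.DATASET_ID",
    "DATASET.START_DATE",
    "DATASET.END_DATE",
    "DATASET.TITLE",
]


def metadata_url(query, return_type="JSON"):
    """Build a metadata-action URL with a percent-encoded QUERY value."""
    # Keep parentheses and '=' literal; quotes and spaces are encoded
    # as %27 and %20, matching the curl command above.
    encoded = quote(query, safe="()=")
    return (
        f"{BASE_URL}?SELECTED_FIELDS={','.join(FIELDS)}"
        f"&RESOURCE_CLASS=DATASET&QUERY={encoded}&RETURN_TYPE={return_type}"
    )
```

Calling `metadata_url("(DATASET.IS_CEF='true' and DATASET.MAIN_GROUP = 'Science')")` should reproduce the URL in the curl command.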

Dev HAPI server at http://hapi-server.org/servers-dev/#server=CAIO

rweigel commented 1 year ago

Jeremy created a server for this data and so this code is no longer needed.

hapi-server / server-nodejs

ESA/ESAC/CSA catalog and info metadata #16