hapi-server / server-nodejs

General-use HAPI server front-end implemented in node.js.
MIT License

ESA/ESAC/CSA catalog and info metadata #16

Closed rweigel closed 1 year ago

rweigel commented 3 years ago

Create a very basic pass-through server for data served through CAIO. For now, only build the code to convert their metadata to HAPI metadata and don't write the code to convert their data response to a HAPI data response.

The /catalog response can be built using the first and last columns (DATASET.DATASET_ID and DATASET.TITLE) of the response to

curl 'https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET.DATASET_ID,DATASET.START_DATE,DATASET.END_DATE,DATASET.TITLE&RESOURCE_CLASS=DATASET&RETURN_TYPE=CSV'

For each row, an /info JSON response file must be created. Each file will have startDate, stopDate, and id taken from DATASET.START_DATE, DATASET.END_DATE, and DATASET.DATASET_ID in the CSV response to the above request.
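The two steps above could be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, and it assumes that the CSV has a header row, that columns arrive in SELECTED_FIELDS order, and that fields containing commas are double-quoted on a single line. HAPI catalog entries carry an id and a title, which is why only the first and last columns are kept there.

```javascript
// Sketch only: the exact CSV layout returned by CSA (header row, quoting)
// is an assumption, and all function names here are hypothetical.

// Split one CSV line into fields, honoring double-quoted fields with "" escapes.
function splitCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (inQuotes) {
      if (c === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (c === '"') {
        inQuotes = false;
      } else {
        field += c;
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      fields.push(field);
      field = "";
    } else {
      field += c;
    }
  }
  fields.push(field);
  return fields;
}

// Convert the CSV text into one record per dataset.
// Assumed column order: DATASET_ID, START_DATE, END_DATE, TITLE.
function parseDatasets(csvText) {
  return csvText
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .slice(1) // drop the assumed header row
    .map(splitCsvLine)
    .map(([id, startDate, stopDate, title]) => ({ id, startDate, stopDate, title }));
}

// Build the "catalog" array of a /catalog response from the id and title columns.
function buildCatalog(datasets) {
  return datasets.map(({ id, title }) => ({ id, title }));
}
```

Each record returned by parseDatasets also carries the startDate and stopDate needed to start the corresponding /info file.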

To finish populating the /info JSON, for each x=DATASET.DATASET_ID in the CSV response, request

curl 'https://csa.esac.esa.int/csa/aio/product-action?RETRIEVALTYPE=HEADER&DATASET_ID=x' > tmp/x.tar.gz

For example, for x=C2_CP_PEA_PITCH_3DXPALARL_DEFlux, the archive C2_CP_PEA_PITCH_3DXPALARL_DEFlux.tar.gz contains the files

CSA_Dataset_Metadata_20201010_1940/C2_CP_PEA_PITCH_3DXPALARL_DEFlux.CEF.XML
CSA_Download_20201010_1940/C2_CQ_PEA_CAVEATS/C2_CQ_PEA_CAVEATS__.*.cef

Use the contents of the XML file to populate the rest of each /info JSON response as given below. Save each file as meta/x.json.

Your code should check whether the directory tmp/C2_CP_PEA_PITCH_3DXPALARL_DEFlux already exists. If it does, don't re-download the metadata; just read the existing metadata. The script should have a flag called update to control this behavior.
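The directory check and the update flag described above might look like the following. The function name and the option names other than update are hypothetical, and the download and extraction steps are deliberately left out.

```javascript
const fs = require("fs");
const path = require("path");

// Decide whether header metadata for a dataset must be (re)downloaded.
// `update` mirrors the flag named in the text; `tmpDir` and the function
// name are assumptions made for this sketch.
function shouldDownload(datasetId, { tmpDir = "tmp", update = false } = {}) {
  if (update) return true; // forced refresh requested
  const dir = path.join(tmpDir, datasetId);
  // Re-use previously extracted metadata when the directory already exists.
  return !(fs.existsSync(dir) && fs.statSync(dir).isDirectory());
}
```

A script could set update from its command line (for example by checking process.argv for an option of its choosing) and call shouldDownload once per dataset ID before issuing the product-action request.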

{
    "startDate": "DATASET.START_DATE from CSV",
    "stopDate": "DATASET.END_DATE from CSV",
    "cadence": "TIME_RESOLUTION from XML",
    "description": "DATASET_DESCRIPTION from XML" + ";" + "DATASET_DESCRIPTION from XML",
    "resourceURL": "https://csa.esac.esa.int/csa/aio/product-action?RETRIEVALTYPE=HEADER&DATASET_ID=x",
    "contact": "CONTACT_COORDINATES from XML",
    "x_original_metadata": "Use XML2JSON on XML file and insert resulting JSON here",
    "parameters": [
        {
            "name": "PARAMETER_ID from XML",
            "description": "FIELDNAME from XML",
            "units": "UNITS from XML",
            "fill": "FILLVAL from XML",
            "type": "VALUE_TYPE from XML"
        },
        ...
    ]
}
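As a rough illustration of the mapping in the template above, the following assembles one /info object from a CSV row and from values already pulled out of the XML. The shape of the xml argument (flat keys plus a parameters array) is an assumption; real XML-to-JSON output will likely need unwrapping first. The sketch uses a single DATASET_DESCRIPTION rather than the concatenation shown in the template, and it copies values as-is, whereas HAPI expects cadence as an ISO 8601 duration and type as one of string, double, integer, or isotime, so a conversion step would probably be needed.

```javascript
// Hypothetical helper: `row` comes from the CSV (id, startDate, stopDate),
// `xml` holds values already extracted from the dataset's XML file, and
// `originalMetadata` is the full XML-to-JSON conversion, if available.
function buildInfo(row, xml, originalMetadata = {}) {
  return {
    startDate: row.startDate,
    stopDate: row.stopDate,
    cadence: xml.TIME_RESOLUTION, // copied as-is; not yet an ISO 8601 duration
    description: xml.DATASET_DESCRIPTION,
    resourceURL:
      "https://csa.esac.esa.int/csa/aio/product-action?RETRIEVALTYPE=HEADER&DATASET_ID=" +
      encodeURIComponent(row.id),
    contact: xml.CONTACT_COORDINATES,
    x_original_metadata: originalMetadata,
    parameters: xml.parameters.map((p) => ({
      name: p.PARAMETER_ID,
      description: p.FIELDNAME,
      units: p.UNITS,
      fill: p.FILLVAL,
      type: p.VALUE_TYPE, // copied as-is; not yet mapped to a HAPI type
    })),
  };
}
```

The result can be serialized with JSON.stringify(info, null, 4) and written to meta/x.json.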
rweigel commented 3 years ago

Notes from telecon

wget --content-disposition 'https://csa.esac.esa.int/csa/aio/streaming-action?DATASET_ID=C2_CP_PEA_3DXPH_DEFlux&START_DATE=2006-02-18T00:00:00Z&END_DATE=2006-02-20T23:00:00Z&NON_BROWSER&CSACOOKIE=...'

wget --content-disposition 'https://csa.esac.esa.int/csa/aio/streaming-action?DATASET_ID=C1_CP_FGM_5VPS&START_DATE=2004-06-18T00:00:00Z&END_DATE=2004-06-19T00:00:00Z&NON_BROWSER&CSACOOKIE=...'

https://csa.esac.esa.int/csa/aio/html/streamingrequests.shtml

curl "https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET.DATASET_ID,DATASET.START_DATE,DATASET.END_DATE,DATASET.TITLE&RESOURCE_CLASS=DATASET&RETURN_TYPE=JSON&QUERY=(DATASET.IS_CEF='true')"

rweigel commented 3 years ago

Caveats are datasets. To get them, use

curl 'https://csa.esac.esa.int/csa/aio/metadata-action?RESOURCE_CLASS=REFERENCED_DATASET&SELECTED_FIELDS=REFERENCED_DATASET&RETURN_TYPE=CSV'

The 4th column is the parent dataset ID; the 2nd column is the caveat dataset ID.
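If the caveat datasets are needed later, the column positions described above could be turned into a lookup from parent dataset to caveat datasets. This is only a sketch: the function name is hypothetical, and it assumes the CSV has already been split into rows of fields with any header row removed.

```javascript
// Group caveat dataset IDs by parent dataset ID.
// `rows` is an array of already-split CSV rows (header removed, an assumption).
// Zero-based index 1 is the caveat dataset ID (2nd column);
// zero-based index 3 is the parent dataset ID (4th column).
function caveatsByParent(rows) {
  const map = new Map();
  for (const row of rows) {
    const caveatId = row[1];
    const parentId = row[3];
    if (!map.has(parentId)) map.set(parentId, []);
    map.get(parentId).push(caveatId);
  }
  return map;
}
```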

For now, just serve science datasets:

curl 'https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET.DATASET_ID,DATASET.START_DATE,DATASET.END_DATE,DATASET.TITLE&RESOURCE_CLASS=DATASET&QUERY=(DATASET.IS_CEF=%27true%27%20and%20DATASET.MAIN_GROUP%20=%20%27Science%27)&RETURN_TYPE=JSON'
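A small helper could generate this URL, which keeps the QUERY expression readable in code. The helper names are hypothetical. It percent-encodes only spaces and single quotes so that the output matches the curl command above character for character; whether the service also accepts a more aggressively encoded query (for example with "=" escaped) is untested here.

```javascript
const BASE = "https://csa.esac.esa.int/csa/aio/metadata-action";

// Percent-encode only spaces and single quotes, matching the curl example.
// This is deliberately narrower than encodeURIComponent (an assumption about
// what the service expects, based solely on that example).
function encodeQuery(query) {
  return query.replace(/ /g, "%20").replace(/'/g, "%27");
}

// Build a metadata-action URL; QUERY is optional.
function metadataUrl({ fields, resourceClass, query, returnType }) {
  const parts = [
    "SELECTED_FIELDS=" + fields.join(","),
    "RESOURCE_CLASS=" + resourceClass,
  ];
  if (query) parts.push("QUERY=" + encodeQuery(query));
  parts.push("RETURN_TYPE=" + returnType);
  return BASE + "?" + parts.join("&");
}

// The science-only request from the text.
const scienceUrl = metadataUrl({
  fields: [
    "DATASET.DATASET_ID",
    "DATASET.START_DATE",
    "DATASET.END_DATE",
    "DATASET.TITLE",
  ],
  resourceClass: "DATASET",
  query: "(DATASET.IS_CEF='true' and DATASET.MAIN_GROUP = 'Science')",
  returnType: "JSON",
});
```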

Dev HAPI server at http://hapi-server.org/servers-dev/#server=CAIO

rweigel commented 1 year ago

Jeremy created a server for this data and so this code is no longer needed.