ioos / ioosngdac

IOOS National Glider Data Assembly Center (V2)
https://ioos.github.io/ioosngdac/

Include NCEI accession numbers for archived data sets #163

Closed kerfoot closed 1 year ago

kerfoot commented 1 year ago

Tasks for adding NCEI accession numbers to archived data sets:

  1. Search NCEI geoportal for data set accession numbers. Not sure if the current API provides a reliable mechanism to retrieve accession numbers.
  2. Harvest accession numbers and map them to Glider DAC data set metadata reports.
  3. Include the accession numbers as an NC_GLOBAL attribute.

leilabbb commented 1 year ago

This is how I search for a data set on NCEI: if you type "glider data assembly center" AND "ru30" AND "Rutgers" AND "2018-07-05" in the NCEI search box, you get a more precise result for the data set you are looking for. (The date comes from the filename.)
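
As a minimal sketch of how a search string like the one above could be assembled (the helper name `build_query` is hypothetical, and this only builds the text typed into the search box, not a geoportal API request):

```python
def build_query(*terms):
    """Join search terms into one search-box string.

    Each term is wrapped in double quotes and the terms are combined with AND.
    """
    return " AND ".join(f'"{term}"' for term in terms)


# Terms taken from the example above: DAC name, glider, institution, date
query = build_query("glider data assembly center", "ru30", "Rutgers", "2018-07-05")
print(query)
```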

Here is the NCEI search page I have used: https://www.ncei.noaa.gov/metadata/geoportal/

kerfoot commented 1 year ago

I followed @leilabbb's example and this approach seems to get us the results we're looking for. I'm decomposing the search URL to write a template that can build the search URL from any number of search parameters. I will update this issue once I've got it figured out.

kerfoot commented 1 year ago

I elaborated on @leilabbb's example for searching for accession numbers and wrote a script that takes one or more ERDDAP data set ids, breaks each into the glider name and the date, creates the search query, and sends the query to geoportal. The results are attached as a csv file. There are a number of columns in the csv, but the ones of interest are:

dataset_id, accession_number, html

If you open the csv in Excel and copy and paste the URL from the 'html' column into a browser, it takes you to the NCEI accession. It seems to work very well. I am now able to map an ERDDAP data set id to an accession number, provided the data set has been archived by NCEI.
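
The same lookup can be done without Excel. This is a rough sketch, assuming the csv has a header row with the three column names listed above and that an empty accession_number means the data set has not been archived. The function name `load_accessions` and every value in the sample rows are made-up placeholders, not real records:

```python
import csv
import io

# Placeholder rows for illustration only: these ids, accession numbers
# and URLs are invented and do not correspond to real NCEI records.
sample = io.StringIO(
    "dataset_id,accession_number,html\n"
    "glider-20000101T0000,0000001,https://example.org/accession/0000001\n"
    "glider-20000102T0000,,\n"
)


def load_accessions(handle):
    """Map each dataset_id to (accession_number, html), skipping empty rows."""
    lookup = {}
    for row in csv.DictReader(handle):
        if row["accession_number"]:  # assumed: blank means not archived
            lookup[row["dataset_id"]] = (row["accession_number"], row["html"])
    return lookup


accessions = load_accessions(sample)
print(accessions.get("glider-20000101T0000"))
```

To run it against the attached file, pass `open("ru28_accession_records.csv", newline="")` instead of `sample`.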

The script lets you search either by one or more data set ids or by glider name and retrieve the results.

I still need to clean the script up but, once I have, I will add it to my glider data utilities repo so that users are able to perform the searches for themselves.

The only potential issue may arise when a data set has been archived as both a real-time and delayed-mode data set. For example, look at these 2 dataset ids:

ru28-20190926T1413
ru28-20190926T1413-delayed

If both data sets have been archived, I don't know if there is a way to search for either/both the real-time and delayed-mode data set. I need to identify a data set for which this is the case and see what happens if I want to find the accession number for the delayed-mode data set. I will let you know once I have an understanding of what happens.
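
To make the concern concrete, here is a sketch of splitting a data set id into glider name, date, and a delayed-mode flag. It is not the actual script. The pattern assumes ids of the form `<glider>-YYYYmmddTHHMM` with an optional `-delayed` suffix, as in the two ids above, and `split_dataset_id` is a hypothetical name:

```python
import re
from datetime import datetime

# Assumed id layout: <glider>-<YYYYmmddTHHMM>[-delayed]
ID_PATTERN = re.compile(
    r"^(?P<glider>.+)-(?P<stamp>\d{8}T\d{4})(?P<delayed>-delayed)?$"
)


def split_dataset_id(dataset_id):
    """Return (glider, 'YYYY-MM-DD', is_delayed) for a data set id."""
    match = ID_PATTERN.match(dataset_id)
    if match is None:
        raise ValueError(f"unrecognized data set id: {dataset_id!r}")
    start = datetime.strptime(match["stamp"], "%Y%m%dT%H%M")
    return match["glider"], start.strftime("%Y-%m-%d"), match["delayed"] is not None


print(split_dataset_id("ru28-20190926T1413"))
print(split_dataset_id("ru28-20190926T1413-delayed"))
```

Both ids reduce to the same glider name and date, so a search built from only those two terms cannot tell the real-time and delayed-mode data sets apart.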

But, good progress! Now that I can find these data sets, I will be able to add the accession number (where it exists) to the data set status for the new status page I'm finishing up.

ru28_accession_records.csv

mdgrossi commented 1 year ago

> If both data sets have been archived, I don't know if there is a way to search for either/both the real-time and delayed-mode data set. I need to identify a data set for which this is the case and see what happens if I want to find the accession number for the delayed-mode data set.

Real-time and delayed-mode data are archived together within the same accession. So, for a given accession number NNNNNNN, the landing page will take you to whatever data NCEI has (real-time, delayed-mode, or both): https://www.ncei.noaa.gov/archive/accession/NNNNNNN

mdgrossi commented 1 year ago

FYI: The pagination on the new IOOS Glider DAC Accession Records pages does not seem to be working properly. The "Show All" buttons do work, however.

kerfoot commented 1 year ago

Related to #172

kerfoot commented 1 year ago

NCEI accession pages were released on October 27, 2023. Closing.