nickmckay / LiPD-utilities

Input/output and manipulation utilities for LiPD files in Matlab, R and Python
http://nickmckay.github.io/LiPD-utilities/
GNU General Public License v2.0

NOAA Updates #54

Closed chrismheiser closed 4 years ago

chrismheiser commented 4 years ago

List of issues to address

1. Study name on the first line is missing. This name should be the same as Study_Name in the Title section (see #5). Action: The Python script has a function called __generate_study_name() that finds or creates a study name. If the study name doesn't exist, it attempts to use ... If those keys don't exist, creating the study name fails. Need test files to check whether they meet the requirements for creating the study name and why this function may be failing.

2. Online_Resource has the last part of the URL repeated.
Action: Find out why the last part of the URL is being added twice.

3. The Online_Resource for LiPD files should be https://www1.ncdc.noaa.gov/pub/data/paleo/reconstructions/climate12k/temperature/version1.0.0/Temp12k_v1.0.0.LiPD. LiPD files will be in their own directory, separate from the NOAA Templates. Action: Correct the online resource link template.

4. Need Contribution_Date. This can be the same as the Modified_Date. Action: Make the contribution date a timestamp of when the file is created.

5. Need Study_Name. If it is possible to programmatically generate a study name, it should generally follow: Where, When, What. We need to create this programmatically, maybe (geo_siteName + paleoData_minYear - paleoData_maxYear + pub1_title)? Might be weird sometimes, but something like that. Action: This already exists. Refer to Issue # 1. Might be a bug or files are missing the necessary data.

6. Investigators are sometimes missing, and other times not consistently formatted (e.g., missing first initial). Maybe just always pull this from pub1_authors? Action: This already partially exists. When investigators is empty, the script creates the investigators field from the FIRST publication that has author data. Generally this is pub0. When the author entry is a list of authors, it creates the investigator string as "LastName; LastName;..." However, if the author data is a single string of multiple author names, it gets trickier. I'm not positive this case is working. Since investigators is sometimes missing completely, there may be a bug in this function.

7. Investigators should be split with semicolons instead of commas. Action: The function mentioned in issue # 6 does this when generating investigators. However, this does not cover existing investigator data. I'll make a function to check existing data and format it as necessary.

8. Descriptions are random (eg, “Ian Walker (he could not send the data)” or “cannot validate elevation”). What do you think about a boilerplate description related to Temperature 12k here instead? WDS-Paleo could draft the description. Action: Nick is handling this.

9. Some publications are missing. This should be fixed. Action: Bug. Find out why.

10. Site_Names are missing. Action: Check the mapping. Data may be getting lost.

11. Location is missing. The NASA GCMD location keywords (provided in Table S1) go in this field. Action: Nick is handling this.

12. Many files are missing variable “what” terms. The shortname could be used for the “what.” Action: Map the paleoData_variableName to "what"

13. Variables seasonality is missing. Action: Possible mapping issue? Nick - "This should come from interpretation1_seasonality"

14. Variables C or N designation is mostly missing. Action: Autofill this based on a sample of the table column data.

15. Column headings in data table should be tab delimited (not space delimited). Action: Fixed. Removed fixed 'spaces' spacing.

16. Shortnames listed in the Variables section do not always match data column headings. This seems to be usually caused by repeated shortnames (e.g., d18O in "893A.Kennet.2007-1.txt"). Action: Need the LiPD file to recreate the issue. Will investigate.

17. Data tables should not have # at the start of their lines. Action: This requirement has changed back and forth. First # was requested, then no #, then # again. Can remove.

18. Many variables that are uncertainties are either missing units or have units designated as “unitless” when they are not unitless (eg, file “Wonderkrater.Scott.2016-2.txt”) Action: Nick is handling this. Data problem.
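For issues 6 and 7, the investigator formatting could look roughly like the sketch below. This is not the code in LiPD-utilities: `format_investigators` is a hypothetical name, and the rules for splitting a single author string (on semicolons or the word "and") are assumptions made for illustration.

```python
import re


def format_investigators(authors):
    """Return an investigators string of the form "LastName; LastName".

    Hypothetical helper. `authors` may be a list of name strings, or a
    single string holding several names separated by ";" or " and ".
    """
    if isinstance(authors, str):
        # Assumption: commas are not used as separators here, because
        # "Last, First" names also contain commas.
        authors = re.split(r";|\band\b", authors)

    names = []
    for author in authors:
        author = author.strip()
        if not author:
            continue  # skip empty fragments left over from splitting
        if "," in author:
            # "Last, First" -> keep the part before the comma
            last = author.split(",", 1)[0].strip()
        else:
            # "First Last" -> keep the final token
            last = author.split()[-1]
        names.append(last)
    return "; ".join(names)
```

A single string of comma-separated names remains ambiguous with this approach, which matches the "trickier" case described in issue 6.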

chrismheiser commented 4 years ago

Related change in lipd.net:

chrismheiser commented 4 years ago

Finished Issues: 1, 5, 6, 7, 10
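For reference, the "Where, When, What" pattern from issues 1 and 5 can be sketched as below. This is only an illustration, not the body of __generate_study_name(): the function name `build_study_name`, the flat metadata dict, and the exact separator format are assumptions. The key names come from the suggestion in issue 5.

```python
def build_study_name(meta):
    """Compose "Where, When, What" from a flat metadata dict.

    Hypothetical helper. Returns None when a required key is missing or
    empty, so the caller can report which file lacks the data.
    """
    required = (
        "geo_siteName",
        "paleoData_minYear",
        "paleoData_maxYear",
        "pub1_title",
    )
    # A year of 0 is valid, so only None and "" count as missing.
    if any(meta.get(key) in (None, "") for key in required):
        return None

    where = meta["geo_siteName"]
    when = f"{meta['paleoData_minYear']} - {meta['paleoData_maxYear']}"
    what = meta["pub1_title"]
    return f"{where}, {when}, {what}"
```

Returning None instead of raising makes it easier to run a batch of files and list the ones that are missing the necessary data.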

chrismheiser commented 4 years ago

@nickmckay I'm not sure about this issue. The online resource URLs are input into the NOAA template directly from the user's arguments when calling the NOAA function. We used to have it hard-coded, but since the links change with different projects, we put it as an argument instead.

  3. The Online_Resource for LiPD files should be https://www1.ncdc.noaa.gov/pub/data/paleo/reconstructions/climate12k/temperature/version1.0.0/Temp12k_v1.0.0.LiPD. LiPD files will be in their own directory, separate from the NOAA Templates.

nickmckay commented 4 years ago

Yes, I think this one was my fault, on the input.

chrismheiser commented 4 years ago

  1. NOAA file output is now stored in the ../noaa_files directory, separate from LiPD files.
  2. If Contribution Date is empty, a current timestamp is inserted in the field.
  3. Variable 'what' is filled with 'shortname' if blank.
  4. The seasonality mapping changed a while back from climateInterpretation["seasonality"] to interpretation[0]["scope"]. Updated the mapping.
  5. The column Character (C) or Numeric (N) designation is set based on column values.
  6. Data column headers are now tab-delimited.
  7. Removed # from the start of data table value lines.
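The C/N designation and the tab-delimited table output described in the list above could be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: `column_type` and `format_table` are hypothetical names, and treating an all-missing column as "C" is an arbitrary choice.

```python
def column_type(values):
    """Return "N" if every non-missing value parses as a number, else "C"."""
    seen = False
    for value in values:
        # Assumption: None, empty strings, and "nan" count as missing.
        if value is None or str(value).strip().lower() in ("", "nan"):
            continue
        seen = True
        try:
            float(value)
        except (TypeError, ValueError):
            return "C"
    # Assumption: a column with no usable values defaults to "C".
    return "N" if seen else "C"


def format_table(headers, rows):
    """Join headers and rows with tabs, with no leading "#" on any line."""
    lines = ["\t".join(headers)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "\n".join(lines)
```

Checking every value is the simplest approach. Sampling a subset of the column, as suggested in issue 14, would be faster on large tables but could misclassify a column whose text values appear late.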

Remaining Issues: 9, 16

chrismheiser commented 4 years ago

Updated to LiPD-0.2.7.6

Fixed issue 16, and made progress on issue 9.

Remaining: 9

chrismheiser commented 4 years ago

All issues are fixed. Files were run in batch, and sent to Nick. No issues so far.