ubsuny / CP1-24-HW5

Homework 5 template for CP1

Data Collection Parameters #31

Open avgagliardo opened 2 hours ago

avgagliardo commented 2 hours ago

Starting a thread here so that we can set some specifics about the data scrape.

1) Monthly Averages or Discrete Measurements

First off, we can use either the monthly averages or the discrete measurements.

I would suggest we go with the monthly averages since that data is more consistent, but if we want higher resolution in the data then the set of discrete measurements would be better-- and of course we could always do both.
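For the monthly-average option, here's a minimal sketch of how the scrape output could be parsed, assuming the usual NOAA GML monthly flask layout (`#`-prefixed header comments followed by whitespace-separated site/year/month/value rows). The sample excerpt and column names are placeholders, not taken from the actual files:

```python
from io import StringIO

import pandas as pd

# Hypothetical excerpt in the NOAA GML monthly flask layout:
# '#' comment headers, then whitespace-separated site/year/month/value rows.
sample = """\
# site: ALT
# units: ppb
ALT 1985  6 1764.37
ALT 1985  7 1755.10
ALT 1985  8 1766.92
"""

def load_monthly(text):
    """Parse a monthly-average flask file into a tidy DataFrame."""
    return pd.read_csv(
        StringIO(text),
        sep=r"\s+",          # columns are whitespace-separated
        comment="#",          # skip the header comment lines
        names=["site", "year", "month", "ch4_ppb"],
    )

df = load_monthly(sample)
```

The same loader would work on the discrete-measurement files too, just with a different `names` list for the extra columns.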

2) Number and Distribution of Sites

Right now I have all of the site data (for methane), so we have a lot of freedom over which sites/datasets we select. We need to follow the task prompt, but beyond that I am open to input on which sites to use.
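If we want a geographically even spread, one simple selection rule would be to bucket sites by latitude band and take one per band. A rough sketch, with a hypothetical site list and arbitrary example bands (nothing here is fixed by the task prompt):

```python
# Hypothetical trimmed site records, mirroring the metadata fields below.
sites = [
    {"Site Code": "ALT", "Latitude": "82.451"},
    {"Site Code": "MLO", "Latitude": "19.536"},
    {"Site Code": "SPO", "Latitude": "-89.980"},
    {"Site Code": "BRW", "Latitude": "71.323"},
]

def pick_by_band(sites, bands):
    """Return one site code per latitude band, for an even north-south spread."""
    chosen = {}
    for s in sites:
        lat = float(s["Latitude"])
        for lo, hi in bands:
            # first site seen in each band wins
            if lo <= lat < hi and (lo, hi) not in chosen:
                chosen[(lo, hi)] = s["Site Code"]
    return chosen

bands = [(-90, -30), (-30, 30), (30, 90)]
selection = pick_by_band(sites, bands)
```

Whatever rule we settle on, it's easy to swap in here, since all the site metadata is already collected.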

3) Metadata

If there is any additional metadata that would be useful downstream, let me know and I'll include it.

Right now each testing site has the following metadata included:

    {
        "Category": "Greenhouse Gases",
        "Frequency": "Monthly Averages",
        "Readme Link": "https://gml.noaa.gov//aftp/data/trace_gases/ch4/flask/surface/README_ch4_surface-flask_ccgg.html",
        "Elevation": "185.0",
        "Site Code": "ALT",
        "Latitude": "82.451",
        "Country": "Canada",
        "Data Link": "https://gml.noaa.gov//aftp/data/trace_gases/ch4/flask/surface/txt/ch4_alt_surface-flask_1_ccgg_month.txt",
        "Description": "Air samples collected in glass flasks.",
        "Site URL": "https://gml.noaa.gov//dv/site/index.php?stacode=ALT",
        "Location": "Alert, Nunavut, Canada",
        "Longitude": "-62.507",
        "Type": "Flask",
        "Name": "Methane(CH4)",
        "Year": "Multiple",
        "Timezone": "-5 hours"
    }

Before I ship the data, I can either merge the metadata into each site's dataframe or provide it in a separate file-- whichever is more convenient for the data prep, just let me know.
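For reference, the merge option could look like the sketch below: each metadata field is broadcast onto the site's dataframe as a constant column. The dataframe contents and the trimmed metadata dict are placeholders:

```python
import pandas as pd

# Hypothetical measurement frame and a trimmed metadata dict like the one above.
df = pd.DataFrame(
    {"year": [1985, 1985], "month": [6, 7], "ch4_ppb": [1764.37, 1755.10]}
)
meta = {"Site Code": "ALT", "Latitude": "82.451", "Longitude": "-62.507"}

# Option A: broadcast each metadata field onto the frame as a constant column,
# normalizing keys like "Site Code" to snake_case column names.
merged = df.assign(**{k.lower().replace(" ", "_"): v for k, v in meta.items()})

# Option B: keep the metadata separate, e.g. one JSON file per site:
# import json
# with open("alt_meta.json", "w") as f:
#     json.dump(meta, f)
```

Option A keeps everything in one file per site at the cost of some repeated values; Option B keeps the data files lean.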

If we don't need any of this, then I can just leave it behind!

avgagliardo commented 2 hours ago

@kylemasc917 We should probably sync up our datasets with respect to 2)