hurlbertlab / core-transient

Data and code for NSF funded research on core vs transient species

dataset 99 source unknown #43

Closed ahhurlbert closed 8 years ago

ahhurlbert commented 9 years ago

The origin of this dataset is unknown. The data cleaning scripts reference http://gcmd.nasa.gov/records/GCMD_cmar_wh.html, which refers to the CSIRO Marine Data Warehouse. A dataset called "csiro_warehouse" is available on OBIS with 106,513 records (its metadata refers to ~106,000 records).

However, the raw dataset in our repo (dataset_99.csv) has only 43,933 records with very different fields, although the extremely verbose SampleID field does reference the Marine Data Warehouse, with records like "Courageous_survey_Cour031(1978)_station_no_10_extracted_from_CMAR_Data_Warehouse_on_12_Oct_2005_42.5_41_40_45". The raw dataset in our repo may be a subset of the full dataset acquired from some other source.
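For anyone wanting to repeat the check described above, here is a minimal Python sketch that counts the records in a CSV and how many SampleID values mention the warehouse. The function name `summarize_sample_ids`, the `Species` column, and the second sample row are hypothetical and purely illustrative; only the `SampleID` field name and the first sample value come from this issue. The in-memory sample stands in for dataset_99.csv, whose other columns are not described here.

```python
import csv
import io


def summarize_sample_ids(csv_file, id_column="SampleID",
                         marker="CMAR_Data_Warehouse"):
    """Return (total rows, rows whose id_column contains marker)."""
    reader = csv.DictReader(csv_file)
    total = 0
    with_marker = 0
    for row in reader:
        total += 1
        # A missing or empty ID column is treated as "no marker".
        if marker in (row.get(id_column) or ""):
            with_marker += 1
    return total, with_marker


# Tiny in-memory stand-in for the real file (second row is made up).
sample = io.StringIO(
    "SampleID,Species\n"
    "Courageous_survey_Cour031(1978)_station_no_10_extracted_from_"
    "CMAR_Data_Warehouse_on_12_Oct_2005_42.5_41_40_45,sp_a\n"
    "some_other_source_station_1,sp_b\n"
)
total, with_marker = summarize_sample_ids(sample)
print(total, with_marker)
```

To run this against the real file, one would pass an open file handle instead, e.g. `open("dataset_99.csv", newline="")`, assuming the file has a header row containing `SampleID`.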

Since I can't track down the original location of our current dataset_99.csv, we should probably use the "csiro_warehouse" version on OBIS and rewrite the data cleaning script.

ahhurlbert commented 8 years ago

As a marine dataset with samples given only by lat-longs, this dataset is inappropriate for our current purposes. Closing issue.