merenlab / public-marine-omics-metadata

Semi-curated metadata for marine metagenomes (and other 'omics data types)

Get environmental metadata for bioGEOTRACES from BODC #1

Open raissameyer opened 3 months ago

raissameyer commented 3 months ago

To get environmental data from https://www.bodc.ac.uk/geotraces/, we need the GEOTRACES section IDs that are noted in Table 1 of the publication https://www.nature.com/articles/sdata2018176/tables/2

Those are GA02, GA03, GA10, and GP13.

First, go to https://geotraces.webodv.awi.de and select the dataset you'd like to use. I chose IDP2021 v2 > seawater.

image

Choosing this takes you to the next page, where you choose between data extraction and data exploration. I chose data extraction, of course. They also have a guide doc: https://geotraces.webodv.awi.de/documentation/webodv-data-extractor-howto.pdf

image

You then get to a page where you can select the Cruise/Domain/Time Range. Initially, all 64 items are selected in the Cruises tab. This is where the GEOTRACES section IDs come in: we select only those four in the Cruises tab (GA02, GA03, GA10, GP13). That way, the map of selected sampling sites goes from this:

image

to this:

image
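Once the table is downloaded, the same cruise selection can be double-checked offline. This is only a minimal sketch: the four section IDs come from the list above, but the `Cruise` column name and the example rows are made up for illustration and may not match the real export.

```python
# Section IDs from this issue; everything else below is a hypothetical stand-in.
SECTION_IDS = {"GA02", "GA03", "GA10", "GP13"}


def filter_by_cruise(rows, cruise_key="Cruise"):
    """Keep only rows whose cruise label is one of the four section IDs."""
    return [row for row in rows if row.get(cruise_key, "").strip() in SECTION_IDS]


# Made-up rows, just to show the filter working.
rows = [
    {"Cruise": "GA02", "Station": "1"},
    {"Cruise": "GP16", "Station": "7"},
    {"Cruise": "GP13", "Station": "3"},
]

selected = filter_by_cruise(rows)
print([row["Cruise"] for row in selected])
```

If the filtered table has fewer rows than the download, something other than the four cruises slipped into the selection.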

We then move to the next tab, where we can select variables. We select in OR mode:

"OR" option: Show all stations, which have data for one OR more of the selected variables available.
"AND" option: Show all stations, which have data for ALL of the selected variables available.
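To make the difference between the two modes concrete, here is a small sketch of the selection logic as I understand it from the description above. It is not how webODV implements it; the station names and variable names are hypothetical.

```python
def stations_matching(availability, selected, mode="OR"):
    """Return stations that pass the variable filter.

    availability: dict mapping station name -> set of variables with data.
    selected: the variables ticked in the (hypothetical) variables tab.
    """
    selected = set(selected)
    if mode == "OR":
        # At least one selected variable has data at the station.
        return sorted(s for s, have in availability.items() if have & selected)
    if mode == "AND":
        # Every selected variable has data at the station.
        return sorted(s for s, have in availability.items() if selected <= have)
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up availability table for three stations.
availability = {
    "St1": {"temperature", "salinity"},
    "St2": {"salinity"},
    "St3": {"oxygen"},
}

or_hits = stations_matching(availability, ["temperature", "salinity"], "OR")
and_hits = stations_matching(availability, ["temperature", "salinity"], "AND")
print(or_hits, and_hits)
```

OR mode keeps more stations, which seems like the safer choice when the goal is to collect whatever metadata exists for each sample.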

I selected

That gives us: Output variables: 54 of 350

image

Next, we can download the data in different formats.

image

It takes a moment, and then you get a zip file that you can unzip. The .txt file contains your table, preceded by a block of info explaining the data variables. You can remove that info block, but keep the column names (starting with Cruise ID). After that, it behaves like a normal table: you can open it in Numbers and import it into R. The infos folder includes metadata about each and every entry (who collected the data, etc.).

image
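Instead of deleting the info block by hand, the header row can be located programmatically. The sketch below assumes the export is tab-delimited and that the column-name row is the first line starting with `Cruise`; both are assumptions about the file layout, and the sample text is invented, so check them against your own download.

```python
import csv
import io


def read_export_table(text, first_column="Cruise"):
    """Skip the leading info block and parse the rest as a tab-delimited table.

    Assumes (not verified against a real export) that the column-name row is
    the first line beginning with `first_column`.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(first_column):
            break
    else:
        raise ValueError("no header line found")
    reader = csv.DictReader(io.StringIO("\n".join(lines[i:])), delimiter="\t")
    return list(reader)


# Invented sample standing in for the downloaded .txt file.
sample = (
    "// descriptive info line (hypothetical)\n"
    "// another info line\n"
    "Cruise\tStation\tDepth\n"
    "GA02\t1\t10\n"
    "GA03\t5\t25\n"
)

table = read_export_table(sample)
print(table[0]["Cruise"], len(table))
```

For a real file, read it with `open(path, encoding="utf-8").read()` and pass the text in; the resulting list of dicts can be written back out as a clean table for R.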