lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

Data underlying figure 2 in doi 10.1128/mSystems.00039-18? #57

Closed: taylorreiter closed this issue 1 year ago

taylorreiter commented 1 year ago

Hello! Would you be willing to provide the data that produced Figure 2 in 10.1128/mSystems.00039-18, if you still have it? I saw Supplemental Table 3 with the accession numbers, but having the Nonpareil values and the 16S data would be super helpful for some pipeline testing I'm doing.

lmrodriguezr commented 1 year ago

Hello @taylorreiter, I've made the full metadata table (metadata.tsv) and the list of excluded samples (exclude.txt) available as a gist: https://gist.github.com/lmrodriguezr/35d62193adeddcc0dbb9fada81c6f452

Please note that there are some NAs you'd need to exclude. The relevant columns are NpDiv (Nd index) and Hprime (H' index).
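As a rough illustration of that filtering step, the sketch below reads metadata.tsv, drops the samples listed in exclude.txt, and skips any row with an NA in either NpDiv or Hprime. It assumes a tab-separated file with a header row, missing values written as "NA" or left empty, and one sample ID per line in exclude.txt. The sample-ID column name `sample` is hypothetical, so check the actual header in the gist before using it. The numbers in the demo are toy values, not data from the paper.

```python
import csv
import io

# Hypothetical: the real sample-ID column header in metadata.tsv may differ.
SAMPLE_COL = "sample"
# Columns named in the thread: NpDiv (Nd index) and Hprime (H' index).
DIVERSITY_COLS = ("NpDiv", "Hprime")


def is_missing(value):
    """Treat absent cells, empty cells, and the literal "NA" as missing."""
    return value is None or value.strip() in ("", "NA")


def load_diversity(metadata_tsv, exclude_txt):
    """Return {sample: (Nd, H')} for non-excluded rows with both indices present.

    Assumes metadata_tsv is tab-separated with a header row and that
    exclude_txt lists one sample ID per line.
    """
    excluded = {line.strip() for line in exclude_txt if line.strip()}
    values = {}
    for row in csv.DictReader(metadata_tsv, delimiter="\t"):
        sample = row[SAMPLE_COL]
        if sample in excluded:
            continue  # sample is on the exclusion list
        raw = [row.get(col) for col in DIVERSITY_COLS]
        if any(is_missing(v) for v in raw):
            continue  # drop rows with an NA in either index
        values[sample] = tuple(float(v) for v in raw)
    return values


if __name__ == "__main__":
    # Toy input for illustration only (not values from the paper).
    metadata = io.StringIO(
        "sample\tNpDiv\tHprime\n"
        "S1\t1.5\t2.5\n"
        "S2\tNA\t3.0\n"
        "S3\t4.0\t5.0\n"
    )
    exclude = io.StringIO("S3\n")
    print(load_diversity(metadata, exclude))  # {'S1': (1.5, 2.5)}
```

With the real files, the same function can be called on open handles, e.g. `open("metadata.tsv", newline="")` and `open("exclude.txt")`.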

I'm closing the issue for now, but feel free to reopen if you need additional info (or data).

Best wishes, Miguel.

taylorreiter commented 1 year ago

Hi @lmrodriguezr! Thank you so much for posting this. I just downloaded it and tried to match it against the samples in Fig. 2, and I think some information might be missing. This type of table is exactly what I'm looking for, but it seems to contain estimates only for the human samples. Do you happen to have the values for the samples from the other biomes (Soil, Marine, Freshwater, Engineered, Animal hosted, Enrichment, and mock)? Thank you so much again for providing the table in your last post.


lmrodriguezr commented 1 year ago

You're right! Sorry, I forgot I had kept those two sets separate in my data. I've now uploaded the H' and Nd values for the environmental samples too: https://gist.github.com/lmrodriguezr/c74684c275aa3e4db57a78e94c4fb7c0

The two files correspond to the diversity values indexed by sample name (TableS3.tsv) and the SRA IDs of each sample (metadata-all.tsv).
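Because the diversity values (TableS3.tsv) and the SRA IDs (metadata-all.tsv) live in separate files, they have to be joined on the sample name. The sketch below does an inner join using only the standard library. The column names `sample` and `sra_id` are hypothetical placeholders, not the confirmed headers of the gist files, and it assumes a sample may map to more than one SRA ID, so the IDs are collected into a list. The demo rows are toy values.

```python
import csv
import io

# Hypothetical column names: check the actual headers of TableS3.tsv and
# metadata-all.tsv before using this.
SAMPLE_COL = "sample"
SRA_COL = "sra_id"


def sra_ids_by_sample(metadata_tsv):
    """Map each sample name to the list of SRA IDs listed for it."""
    ids = {}
    for row in csv.DictReader(metadata_tsv, delimiter="\t"):
        ids.setdefault(row[SAMPLE_COL], []).append(row[SRA_COL])
    return ids


def join_by_sample(diversity_tsv, metadata_tsv):
    """Inner-join diversity rows with their SRA IDs on the sample name.

    Samples present in only one of the two files are dropped.
    Diversity values are kept as the raw strings read from the file.
    """
    ids = sra_ids_by_sample(metadata_tsv)
    joined = {}
    for row in csv.DictReader(diversity_tsv, delimiter="\t"):
        sample = row[SAMPLE_COL]
        if sample in ids:
            joined[sample] = {**row, "sra_ids": ids[sample]}
    return joined


if __name__ == "__main__":
    # Toy input for illustration only (not values from the paper).
    diversity = io.StringIO("sample\tNd\tHprime\nsoil_1\t1.0\t2.0\nsea_1\t3.0\t4.0\n")
    metadata = io.StringIO("sample\tsra_id\nsoil_1\tRUN_A\nsoil_1\tRUN_B\n")
    print(join_by_sample(diversity, metadata))
```

An inner join is used here so that samples lacking either a diversity estimate or an accession are silently dropped; switch to keeping unmatched rows if you need to see what is missing.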

Best wishes, Miguel.