waldronlab / curatedMetagenomicDataCuration

Sample Metadata Curation for curatedMetagenomicData
https://waldronlab.io/curatedMetagenomicDataCuration/

SmitsSA_2017 sample info doesn't match supp.table #61

Closed: luzhang321 closed this issue 2 years ago

luzhang321 commented 2 years ago

Hi :)

The SmitsSA_2017 dataset in cMD has 40 samples recorded, and its sample information matches ENA (https://www.ebi.ac.uk/ena/browser/view/PRJNA392180?show=reads). The data come from the paper "Seasonal cycling in the gut microbiome of the Hadza hunter-gatherers of Tanzania". In their supp. table 1 (filtered metagenomic = yes), only 35 samples are recorded. In summary:

  1. 32 samples overlap with the supp. table. 3 samples from the supp. table are missing in cMD:

TZ_HADZA_3132 (MALE), TZ_HADZA_177 (MALE), TZ_HADZA_3187 (MALE)

  2. 8 extra samples are recorded in cMD but not in the supp. table:

[1] "TZ_HADZA_3202" "TZ_HADZA_3134" "TZ_HADZA_3104" "TZ_HADZA_102" "TZ_HADZA_174" "TZ_HADZA_883" "TZ_HADZA_245" "TZ_HADZA_243"

  3. Information that does not match between cMD and the supp. table:

| Sample_ena/supp.table | gender_supp | gender_cMD | age_supp | age_cMD | sample_id_ena/cMD |
|---|---|---|---|---|---|
| TZ_HADZA_3182 | FEMALE | NA | 48 | NA | TZ_86768 |
| TZ_HADZA_3099 | FEMALE | female | 6 | 40 | TZ_81781 |
| TZ_HADZA_828 | MALE | male | 33 | 30 | TZ_27689 |
| TZ_HADZA_3244 | FEMALE | female | 55 | NA | TZ_47979 |
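For anyone wanting to reproduce this kind of check, the comparison above can be sketched with plain set operations. This is only a minimal illustration in Python, not the code used to produce the lists in this issue. The function names `compare_ids`, `normalise`, and `field_mismatches` are hypothetical. The data is a subset: it contains only the IDs quoted in this issue (the full lists of 40 and 35 samples are not given here), and it assumes the four samples with mismatched metadata are present in both sources.

```python
# Illustrative subset only: the issue does not list all 40 cMD or 35 supp. table IDs.
# Assumption: the four samples with mismatched metadata exist in both sources.
shared = {"TZ_HADZA_3182", "TZ_HADZA_3099", "TZ_HADZA_828", "TZ_HADZA_3244"}
only_supp = {"TZ_HADZA_3132", "TZ_HADZA_177", "TZ_HADZA_3187"}
only_cmd = {
    "TZ_HADZA_3202", "TZ_HADZA_3134", "TZ_HADZA_3104", "TZ_HADZA_102",
    "TZ_HADZA_174", "TZ_HADZA_883", "TZ_HADZA_245", "TZ_HADZA_243",
}
cmd_ids = shared | only_cmd
supp_ids = shared | only_supp


def compare_ids(cmd_ids, supp_ids):
    """Split two ID collections into shared, missing-from-cMD, and extra-in-cMD."""
    cmd_ids, supp_ids = set(cmd_ids), set(supp_ids)
    return {
        "shared": sorted(cmd_ids & supp_ids),
        "missing_from_cmd": sorted(supp_ids - cmd_ids),
        "extra_in_cmd": sorted(cmd_ids - supp_ids),
    }


def normalise(value):
    """Map None, '' and 'NA' to None; otherwise return a lower-cased string."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.upper() == "NA":
        return None
    return text.lower()


def field_mismatches(supp_row, cmd_row, fields):
    """Return the fields whose normalised values differ between the two rows."""
    return [
        f for f in fields
        if normalise(supp_row.get(f)) != normalise(cmd_row.get(f))
    ]


# Metadata values copied from the table in this issue.
supp_meta = {
    "TZ_HADZA_3182": {"gender": "FEMALE", "age": 48},
    "TZ_HADZA_3099": {"gender": "FEMALE", "age": 6},
    "TZ_HADZA_828": {"gender": "MALE", "age": 33},
    "TZ_HADZA_3244": {"gender": "FEMALE", "age": 55},
}
cmd_meta = {
    "TZ_HADZA_3182": {"gender": "NA", "age": "NA"},
    "TZ_HADZA_3099": {"gender": "female", "age": 40},
    "TZ_HADZA_828": {"gender": "male", "age": 30},
    "TZ_HADZA_3244": {"gender": "female", "age": "NA"},
}

result = compare_ids(cmd_ids, supp_ids)
mismatches = {
    sid: field_mismatches(supp_meta[sid], cmd_meta[sid], ["gender", "age"])
    for sid in result["shared"]
}

print("missing from cMD:", result["missing_from_cmd"])
print("extra in cMD:", result["extra_in_cmd"])
for sid, fields in mismatches.items():
    print(sid, "differs in:", fields)
```

Comparing values case-insensitively means FEMALE versus female is not flagged, so TZ_HADZA_3099 is reported only for its age. Treating NA as missing means a value present in one source and absent in the other still counts as a difference.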

Looking forward to your reply and thank you!

lwaldron commented 2 years ago

Very sorry for the delay here - @paolinomanghi will you be able to resolve this for the Bioconductor 3.14 release?

paolinomanghi commented 2 years ago

Thanks for your work on this table. It is normal to have more samples uploaded to NCBI than were actually analysed in the paper. The table I used to map the files against the raw reads was https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=392180.