waldronlab / curatedMetagenomicDataCuration

Sample Metadata Curation for curatedMetagenomicData
https://waldronlab.io/curatedMetagenomicDataCuration/
Add all HMP 1 healthy participants #71

Open lwaldron opened 1 year ago

lwaldron commented 1 year ago

cMD3 currently includes only 103 of the 265 healthy HMP1 participants, so it would be great to add the remaining 162 in a future version. I'm attaching a file manifest and its metadata, created at hmpdacc.org by selecting all FASTQ files from the "Human Microbiome Project (HMP)" project.

suppressPackageStartupMessages({
  library(curatedMetagenomicData)
  library(dplyr)
  library(readr)
})
# Manifest metadata exported from hmpdacc.org (attached to this issue)
hmp_healthy_metadata <- read_delim("~/Downloads/hmp_manifest_metadata_9ef42dffdd.csv", 
                                   delim = ",", escape_double = FALSE, 
                                   col_types = cols(subject_id = col_character(), 
                                                    visit_number = col_integer()), trim_ws = TRUE)
# Unique HMP_2012 subject IDs in cMD, with the "HMP_2012_" prefix removed
incmd <- filter(sampleMetadata, grepl("HMP_2012", study_name)) %>%
  pull(subject_id) %>% 
  sub("HMP_2012_", "", .) %>%
  unique()

# Are all HMP_2012 subjects in cMD present in the manifest?
summary(incmd %in% hmp_healthy_metadata$subject_id)
#>    Mode    TRUE 
#> logical     103
# How many manifest subjects are already in cMD?
summary(unique(hmp_healthy_metadata$subject_id) %in% incmd)
#>    Mode   FALSE    TRUE 
#> logical     162     103

Created on 2022-12-06 with reprex v2.0.2
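
As a possible next step, the subjects still to be added could be listed with `setdiff()`. This is only a sketch in base R on toy data: the `manifest` and `in_cmd` objects and their values are hypothetical stand-ins for `hmp_healthy_metadata` and `incmd` above, and only the `subject_id` and `visit_number` columns are assumed.

```r
# Toy stand-ins (hypothetical values) for hmp_healthy_metadata and incmd
manifest <- data.frame(
  subject_id   = c("A1", "A1", "B2", "C3", "C3", "C3"),
  visit_number = c(1L, 2L, 1L, 1L, 2L, 3L),
  stringsAsFactors = FALSE
)
in_cmd <- c("A1")

# Subjects present in the manifest but absent from cMD
missing_ids <- setdiff(unique(manifest$subject_id), in_cmd)

# Manifest rows that belong to those missing subjects
to_add <- manifest[manifest$subject_id %in% missing_ids, ]

# Number of manifest rows per missing subject
files_per_subject <- table(to_add$subject_id)

print(missing_ids)
print(files_per_subject)
```

With the real objects, `missing_ids` should have 162 entries if the counts above hold, and `to_add` would be the subset of the manifest left to process.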

Attachments: hmp_manifest_metadata_9ef42dffdd.csv, hmp_manifest_36457c738b.csv