seandavi / MungeCuratedMGS

Decide on best approach when one PMID has different study details #13

Open seandavi opened 4 years ago

seandavi commented 4 years ago
# A tibble: 163 x 9
   PMID   `sequencing type` `16S variable r… `sequencing pla… `study design`  `matched on`  `confounders contr… `antibiotics exc… Country
   <chr>  <fct>             <fct>            <fct>            <fct>           <chr>         <chr>               <chr>             <chr>  
 1 28038… NA                NA               DNA-DNA Hybridi… case-control    NA            NA                  3 months          United…
 2 28173… 16S               V4               Roche454         case-control    NA            NA                  Not stated        Denmark
 3 27015… 16S               NA               Roche454         case-control    NA            NA                  within preceding… United…
 4 27625… 16S               NA               RT-qPCR          time series / … NA            NA                  NA                Finland
 5 23071… 16S               V1-V2            Roche454         case-control    NA            NA                  3 months          United…
 6 28467… 16S               V4-V5            Roche454         case-control    gender, age,… NA                  NA                United…
 7 20603… 16S               Tuf gene         Non-quantitativ… case-control    NA            age; Controls subj… less than one mo… France 
 8 20140… 16S               V3               Roche454         case-control    NA            NA                  3 months          China  
 9 29190… 16S               V1-V2            Roche454         cross-sectiona… NA            free from disease,… 6 months          United…
10 29234… 16S               NA               Illumina         case-control    NA            NA                  3 months          Brazil 
# … with 153 more rows
Warning messages:
1: In FUN(X[[i]], ...) :
  PMIDs 23032991, 28018325, 29207565, 29234019, 30497517 with >1 unique study design .
 Chosing the first one for each of them.
2: In FUN(X[[i]], ...) :
  PMIDs 28173873, 28620208, 28875948, 29051531, 29922272, 30535886 with >1 unique matched on .
 Chosing the first one for each of them.
3: In FUN(X[[i]], ...) :
  PMIDs 25763184, 26230509, 28035686, 28390422, 28875948, 29538354, 29922272, 30714640 with >1 unique confounders controlled for .
 Chosing the first one for each of them.
4: In FUN(X[[i]], ...) :
  PMIDs 25710027, 25763184, 26151645, 26600078, 28390422, 28512451, 28875948, 30279332, 30548192 with >1 unique antibiotics exclusion .
 Chosing the first one for each of them.
5: In FUN(X[[i]], ...) : PMIDs 28390422, 28968427 with >1 unique Country .
 Chosing the first one for each of them.
lwaldron commented 4 years ago

I think that splitting these into separate "studies" with a warning would be correct.
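
A minimal sketch of that suggestion in base R, offered as an illustration rather than the package's actual implementation. The helper name `split_conflicting_studies()`, the `study_id` column, and the `<PMID>_<n>` naming scheme are all assumptions, and the example records are made-up toy values, not rows from the curated data. Instead of keeping only the first value per PMID, it keeps every distinct combination of study details and warns about the PMIDs that were split.

```r
# Hypothetical helper, not part of MungeCuratedMGS.
# df          : data frame with one row per curated record
# id_col      : name of the PMID column
# detail_cols : study-level columns expected to agree within a PMID
split_conflicting_studies <- function(df, id_col = "PMID", detail_cols) {
  # Keep one row per distinct (PMID, study details) combination.
  studies <- unique(df[, c(id_col, detail_cols), drop = FALSE])
  rownames(studies) <- NULL

  # A PMID that still appears more than once has conflicting details.
  counts <- table(studies[[id_col]])
  conflicted <- names(counts)[counts > 1]

  if (length(conflicted) > 0) {
    warning(
      "PMIDs ", paste(conflicted, collapse = ", "),
      " have >1 unique combination of study details; ",
      "splitting each into separate studies.",
      call. = FALSE
    )
  }

  # Number the variants within each PMID, then build the study identifier:
  # the bare PMID when details agree, "<PMID>_<n>" when they conflict.
  variant <- ave(seq_len(nrow(studies)), studies[[id_col]], FUN = seq_along)
  is_conflicted <- studies[[id_col]] %in% conflicted
  studies$study_id <- ifelse(
    is_conflicted,
    paste(studies[[id_col]], variant, sep = "_"),
    as.character(studies[[id_col]])
  )
  studies
}

# Toy records (invented for illustration): PMID "1002" has two study designs.
records <- data.frame(
  PMID         = c("1001", "1001", "1002", "1002", "1003"),
  study_design = c("case-control", "case-control", "case-control",
                   "cross-sectional", "case-control"),
  Country      = c("X", "X", "Y", "Y", "Z"),
  stringsAsFactors = FALSE
)

studies <- split_conflicting_studies(
  records,
  detail_cols = c("study_design", "Country")
)
```

Here PMID 1002 becomes two rows with `study_id` values `1002_1` and `1002_2`, while 1001 and 1003 keep their PMID as the identifier. Checking all detail columns jointly gives one warning per call rather than one per column; a per-column check like the warnings above would need a loop over `detail_cols`.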