mc2-center / csbc-pson-dcc

Data coordination resources for the NCI CSBC and PS-ON consortia

missing files from some GEO datasets #71

Closed: bswhite closed this issue 2 years ago

bswhite commented 4 years ago

We are missing files from some GEO datasets. One example is GSE87517/PRJNA345006:

https://staging.csbc-pson.synapse.org/Explore/Datasets/DetailsPage?datasetId=syn12976694

I'm fairly certain this is because these data were populated using get-geo-annotations.R, which has the following offending line:

# Iterate over each of the GSMs associated with this GSE
metadata.tbl <- ldply(sampleNames(phenoData(gse.geo[[1]])),

Note the '[[1]]' above: gse.geo is a list, and since I never understood why, I naively took just the first element. I still don't know why, though I suspect each list element corresponds to a different platform. For example, the dataset above was generated on both the HiSeq 2500 and the NextSeq 500.
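For context, GEOquery's getGEO() with GSEMatrix = TRUE returns a list holding one ExpressionSet per platform-specific series matrix file, which is consistent with the suspicion above. A minimal sketch of collecting GSMs from every list element instead of only '[[1]]' follows. The helper name gsm_platform_table is hypothetical and not part of get-geo-annotations.R. It assumes each element's annotation() slot holds the platform (GPL) accession, as getGEO sets it, and it deliberately does not reproduce the rest of the script's ldply call, which isn't shown in this issue.

```r
library(Biobase)

# Hypothetical helper (not part of get-geo-annotations.R): build one row per GSM
# across *every* element of gse.geo, rather than only gse.geo[[1]].
# Assumes each element is an ExpressionSet whose annotation() holds the
# platform (GPL) accession, as getGEO(..., GSEMatrix = TRUE) sets it.
gsm_platform_table <- function(gse.geo) {
  per.platform <- lapply(gse.geo, function(eset) {
    gsms <- sampleNames(phenoData(eset))
    data.frame(
      gsm = gsms,
      # Repeat the platform ID once per sample on this platform.
      platform = rep(annotation(eset), length(gsms)),
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-platform tables and drop the list-derived row names.
  out <- do.call(rbind, per.platform)
  rownames(out) <- NULL
  out
}

# Example (requires network access and the GEOquery package):
# library(GEOquery)
# gse.geo <- getGEO("GSE87517", GSEMatrix = TRUE)
# metadata.tbl <- gsm_platform_table(gse.geo)
# table(metadata.tbl$platform)  # sample counts per platform
```

If the platform hypothesis holds, the table should contain samples from both sequencers rather than only the first list element; wiring this into the existing ldply-based loop is left open here.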

This issue extends beyond this one dataset, and I'm not going to attempt to fix it now.