Sample ID from read file names

yerkes-gencore / gencoreBulk

An R package to facilitate bulk RNAseq analyses at the ENPRC Genomics Core

I think we need to address how sample IDs are pattern matched against the read file names. Currently, the "_S#{1:n}" part of the read file name is used as part of the sample ID. This is problematic because that part of the name is generated by the extraction process, not by any sample-identifying label from the investigator or the core. The number may differ between sequencing runs or extractions even though the reads belong to the same sample library. I think we need to pattern match against the file name up to, but not including, that part. This is related to the issue of how to combine read files from multiple lanes or sequencing runs for alignment, but I'll open another issue about that.
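As a rough sketch of the matching I have in mind (in Python only for illustration, not the package's actual code), the ID could be taken as everything before the trailing "_S#" block. This assumes the read files follow the usual Illumina naming, `<sample>_S<n>[_L<lane>]_<R1|R2|I1|I2>_001.fastq.gz`. The function name `sample_id` and the file names below are hypothetical.

```python
import re

# Assumed Illumina-style suffix: _S<n>, an optional _L<lane>, the read
# designator (R1/R2/I1/I2), _001, and the FASTQ extension.
SUFFIX = re.compile(r"_S\d+(?:_L\d{3})?_[RI]\d_001\.fastq(?:\.gz)?$")


def sample_id(filename):
    """Return the part of the file name that precedes the _S<n> suffix."""
    match = SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"unexpected read file name: {filename}")
    return filename[:match.start()]


# Hypothetical file names: the same library sequenced on two runs,
# so the _S number and the lane differ.
files = [
    "p21001-s003_RNA-A_S3_L001_R1_001.fastq.gz",
    "p21001-s003_RNA-A_S17_L002_R1_001.fastq.gz",
]
ids = {sample_id(f) for f in files}
```

Because the pattern is anchored at the end of the name, a sample label that itself contains something like "_S3" should be left intact, and both files above collapse to the single ID `p21001-s003_RNA-A`.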

Sample ID from read file names #9