gktharp1 closed 9 months ago
This issue may have been intended for the STAR alignment pipeline. There isn't any pattern matching here for associating files with metadata; the user must associate filenames with metadata manually. There is now a helper script that pre-populates an experimental design table with the sample IDs (s[0-9]+) parsed from the filenames, but that is all it does: the user still has to add the remaining data manually.
I think we need to address how sample IDs are pattern matched against the read file names. Currently, the "_S#{1:n}" part of the read file name is used as part of the sample ID. This is problematic because that part of the name is generated by the extraction process, not taken from any sample-identifying label supplied by the investigator or core. That number may differ between sequencing runs or extractions even though the reads belong to the same sample library. I think we should pattern match against the file name up to, but not including, that part. This is related to the question of how to combine read files from multiple lanes or sequencing runs for alignment, but I'll open a separate issue for that.
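For illustration, here is a rough sketch of matching the file name up to, but not including, the "_S#" field. It assumes the read files follow the usual Illumina/bcl2fastq layout of `<label>_S<number>[_L<lane>]_<read>_001.fastq[.gz]`; the names `READ_FILE`, `sample_label`, and `group_by_label` and the example file names are hypothetical and not part of this pipeline.

```python
import re
from collections import defaultdict

# Assumed file name layout (not confirmed for every run in this project):
#   <label>_S<number>[_L<lane>]_<R1|R2|I1|I2>_001.fastq[.gz]
READ_FILE = re.compile(
    r"^(?P<label>.+)_S(?P<index>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"
    r"_(?P<read>[RI][12])_001\.fastq(?:\.gz)?$"
)


def sample_label(filename):
    """Return the part of the name before the _S<number> field, or None."""
    match = READ_FILE.match(filename)
    return match.group("label") if match else None


def group_by_label(filenames):
    """Group read files by label, ignoring the run-specific _S<number>."""
    groups = defaultdict(list)
    for name in filenames:
        label = sample_label(name)
        if label is not None:  # skip files that do not fit the layout
            groups[label].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical files: same library, different _S numbers across runs.
    files = [
        "LibA_S3_L001_R1_001.fastq.gz",
        "LibA_S7_L002_R1_001.fastq.gz",
        "LibB_S4_L001_R1_001.fastq.gz",
    ]
    for label, members in group_by_label(files).items():
        print(label, members)
```

With this approach, files from the same library that received different "_S#" numbers in different runs end up under one label, which might also be a starting point for the lane/run merging question.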