Enable customization of how sample IDs are interpreted from filenames

mhoban commented 1 year ago

Right now, sample IDs come from fromFilePairs: whatever precedes the R1/R2 in the filename. Figure out how to let the user customize this, probably by giving the option to pass a regex of some sort.
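To make the regex idea concrete, here is a minimal Python sketch of what a user-supplied pattern could do. It is illustration only and not the pipeline's code: the function name `sample_id`, its default pattern, and the example filenames are all hypothetical, and the default only roughly approximates "everything before R1/R2".

```python
import re
from pathlib import Path

# Hypothetical default: capture everything before an optional separator and R1/R2.
DEFAULT_PATTERN = r"^(.+?)[._-]?R[12]"

def sample_id(filename, pattern=DEFAULT_PATTERN):
    """Return the first capture group of `pattern` matched against the file's basename."""
    name = Path(filename).name
    match = re.search(pattern, name)
    if match is None:
        raise ValueError(f"pattern {pattern!r} does not match {name!r}")
    return match.group(1)
```

With the default, `sample_id("run1_sampleA_R1.fastq")` gives `run1_sampleA`. Passing a custom pattern such as `r"^lib\d+-([^_]+)"` would instead pull `sampleB` out of `lib42-sampleB_S7_L001_R1_001.fastq.gz`. Using the first capture group as the ID is just one possible convention.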

mhoban commented 1 year ago

I think this can be done with a CSV file that maybe looks something like this:

sample_id	file_pattern
sample1	file1_nonsensefff{R1,R2}.fastq
sample2	file2_nonsensefff{R1,R2}.fastq
sample3	file3_nonsensefff{R1,R2}.fastq

Then, right after the reads are loaded, we go through and replace the sample ID in each reads tuple using this map. Is it easy to load and use CSV files? Maybe I need an external script. Actually, I can probably do something clever with awk (e.g. building an associative array), and we can just use a two-column tab-separated file with no headers rather than an actual CSV.
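A rough Python sketch of the remapping step, written as an external script rather than awk. Everything here is an assumption for illustration: the helper names (`expand_braces`, `load_sample_map`, `remap_reads`), the shape of the reads tuples as `(id, [files])`, and the choice to match on basenames. It assumes the headerless two-column tab-separated layout described above.

```python
import csv
import re
from pathlib import Path

def expand_braces(pattern):
    """Expand {a,b} groups: 'x{R1,R2}.fastq' -> ['xR1.fastq', 'xR2.fastq']."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # Recurse so that any further brace groups in the tail are expanded too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded

def load_sample_map(handle):
    """Read headerless 'sample_id<TAB>file_pattern' rows into {filename: sample_id}."""
    lookup = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        sample, pattern = row[0], row[1]
        for name in expand_braces(pattern):
            lookup[name] = sample
    return lookup

def remap_reads(reads, lookup):
    """Replace the ID of each (id, [files]) tuple when its files map to exactly one sample."""
    remapped = []
    for old_id, files in reads:
        found = {lookup[Path(f).name] for f in files if Path(f).name in lookup}
        new_id = found.pop() if len(found) == 1 else old_id
        remapped.append((new_id, files))
    return remapped
```

Reads whose files are missing from the map, or that map to more than one sample, keep their original ID in this sketch. Whether that should be an error instead is a design choice left open here.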

mhoban commented 1 year ago

This is now implemented in 5b14cdfd1af5ca87032fc91c28c8e63df94d9c92

mhoban / rainbow_bridge

Enable customization of how sample IDs are interpreted from filenames #9