mhoban / rainbow_bridge

GNU General Public License v3.0

Enable customization of how sample IDs are interpreted from filenames #9

Closed: mhoban closed this issue 1 year ago

mhoban commented 1 year ago

Right now, sample IDs come from fromFilePairs: the ID is whatever precedes the R1/R2 in the filename. Figure out how to let the user customize this, probably by giving the option to pass a regex of some sort.
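
To make the regex idea concrete, here is a minimal Python sketch. It is not the pipeline's Nextflow code; the function name `sample_id` and the default pattern are assumptions made up for illustration. The user-supplied pattern is expected to put the sample ID in its first capture group.

```python
import re
from pathlib import Path

# Hypothetical default: capture everything before an R1/R2 token,
# dropping one trailing separator (_ . -) if there is one.
DEFAULT_PATTERN = r"^(.+?)[_.-]?R[12]"

def sample_id(filename, pattern=DEFAULT_PATTERN):
    """Return the first capture group of `pattern` matched against the file's base name."""
    name = Path(filename).name
    match = re.search(pattern, name)
    if match is None:
        raise ValueError(f"{name!r} does not match {pattern!r}")
    return match.group(1)

# Default behaviour: the part of the name before R1/R2
default_id = sample_id("reads/sampleA_R1.fastq")

# Custom behaviour: only the part before the first underscore
custom_id = sample_id("reads/file1_nonsensefffR1.fastq", r"^([^_]+)_")
```

Requiring a capture group keeps the option simple: the user only has to say which part of the filename is the ID, not how the pairs are grouped.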

mhoban commented 1 year ago

I think this can be done with a CSV file that maybe looks something like this:

sample_id   file_pattern
sample1     file1_nonsensefff{R1,R2}.fastq
sample2     file2_nonsensefff{R1,R2}.fastq
sample3     file3_nonsensefff{R1,R2}.fastq

Then, right after the reads are loaded, we go through and replace the sample ID part of each reads tuple using this CSV map. Is it easy to load and use CSV files? Maybe I need an external script. Actually, I can probably do something clever with awk (viz. build an associative array), and we can just use a two-column tab-separated file (with no headers) rather than an actual CSV.
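
The remapping step could look roughly like the Python sketch below. It stands in for the awk associative array with a dict and is not the code from the eventual commit. The helper names (`expand_braces`, `load_sample_map`, `remap`) and the `(id, [files])` tuple layout are assumptions for illustration, and the brace expansion handles only a single, un-nested `{a,b}` group.

```python
import csv
import io
import re
from pathlib import Path

def expand_braces(pattern):
    """Expand the first {a,b,...} group in a pattern; nesting is not supported."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + alt + tail for alt in match.group(1).split(",")]

def load_sample_map(handle):
    """Read headerless, two-column, tab-separated rows: sample ID, file pattern."""
    lookup = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        sample, pattern = row[0], row[1]
        for filename in expand_braces(pattern):
            lookup[filename] = sample  # file name -> sample ID
    return lookup

def remap(reads, lookup):
    """Replace the ID in each (id, [files]) tuple; keep the old ID if no file is listed."""
    remapped = []
    for old_id, files in reads:
        names = (Path(f).name for f in files)
        new_id = next((lookup[n] for n in names if n in lookup), old_id)
        remapped.append((new_id, files))
    return remapped

# Example using the file names from the table above (file3 is left out of the map on purpose)
sample_map = io.StringIO(
    "sample1\tfile1_nonsensefff{R1,R2}.fastq\n"
    "sample2\tfile2_nonsensefff{R1,R2}.fastq\n"
)
reads = [
    ("file1_nonsensefff", ["data/file1_nonsensefffR1.fastq", "data/file1_nonsensefffR2.fastq"]),
    ("file3_nonsensefff", ["data/file3_nonsensefffR1.fastq", "data/file3_nonsensefffR2.fastq"]),
]
result = remap(reads, load_sample_map(sample_map))
```

Falling back to the original ID for unlisted files is one possible design choice; raising an error instead would surface typos in the map file earlier.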

mhoban commented 1 year ago

This is now implemented in 5b14cdfd1af5ca87032fc91c28c8e63df94d9c92