relipmoc / skewer

MIT License

Support for CASAVA input/output file names #15

Open enricoferrero opened 9 years ago

enricoferrero commented 9 years ago

I have CASAVA input files in the format:

SampleName_CGTGTA_L004_R1_001.fastq.gz
SampleName_CGTGTA_L004_R2_001.fastq.gz

I want to run FastQC in CASAVA mode before and after trimming to check the effect of trimming (i.e., using fastqc --casava).

FastQC only recognises CASAVA input files if they are in the above format. From the FastQC help:

    --casava        Files come from raw casava output. Files in the same sample
                    group (differing only by the group number) will be analysed
                    as a set rather than individually. Sequences with the filter
                    flag set in the header will be excluded from the analysis.
                    Files must have the same names given to them by casava
                    (including being gzipped and ending with .gz) otherwise they
                    won't be grouped together correctly.

So, to be compatible with FastQC in CASAVA mode, skewer would have to write output files whose names follow exactly the same format as the input files (currently it adds a -trimmed-pair1 or -trimmed-pair2 suffix).
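Until such an option exists, one possible workaround is to rename the trimmed files back to the CASAVA pattern in a separate directory, so that they do not overwrite the raw files. The following is only a minimal Python sketch. It assumes that skewer was run with an output prefix and gzip compression, and that the outputs are therefore named `<prefix>-trimmed-pair1.fastq.gz` and `<prefix>-trimmed-pair2.fastq.gz`; the exact names depend on the options used. The helpers `rename_plan` and `apply_plan` are hypothetical names and not part of skewer or FastQC.

```python
from pathlib import Path


def rename_plan(r1: str, r2: str, prefix: str, out_dir: str) -> dict:
    """Map the assumed skewer output names to the original CASAVA file names.

    r1, r2  : paths of the original CASAVA input files (read 1 and read 2)
    prefix  : output prefix that was given to skewer (assumption)
    out_dir : directory that will receive the CASAVA-named trimmed files
    """
    out = Path(out_dir)
    return {
        # Assumed naming of skewer's paired-end outputs; adjust if yours differ.
        Path(f"{prefix}-trimmed-pair1.fastq.gz"): out / Path(r1).name,
        Path(f"{prefix}-trimmed-pair2.fastq.gz"): out / Path(r2).name,
    }


def apply_plan(plan: dict) -> None:
    """Move each trimmed file to its CASAVA-style destination."""
    for src, dst in plan.items():
        dst.parent.mkdir(parents=True, exist_ok=True)
        src.rename(dst)


plan = rename_plan(
    "raw/SampleName_CGTGTA_L004_R1_001.fastq.gz",
    "raw/SampleName_CGTGTA_L004_R2_001.fastq.gz",
    prefix="SampleName",
    out_dir="trimmed",
)
# Call apply_plan(plan) once the trimmed files actually exist on disk.
```

With the files renamed this way, `fastqc --casava` can be pointed at the `trimmed` directory just as it is at the raw files, since the base names are identical.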

Is it possible to implement this feature? I see two options:

Thanks!

relipmoc commented 9 years ago

enrico16, thank you for your suggestions! I'll try to modify the source code in a few days. I'm very busy at the moment.

enricoferrero commented 8 years ago

Not sure why I closed this at the time... Is there any progress on this? Thanks!