nmdp-bioinformatics / pipeline

Consensus assembly and allele interpretation pipeline.
GNU Lesser General Public License v3.0

splitter should handle .fastq and .fq file extensions #78

Closed ckennedy-nmdp closed 9 years ago

ckennedy-nmdp commented 9 years ago

The file-matching patterns in the following splitter code need to be generalized:

if [[ $(find ${RAWDIR} -type f | grep -c fastq) -eq 0 ]]; then
for MYIDENTIFIER in $(find ${RAWDIR} -name '*.fastq' -print -type f | sed -e 's/R[12]/RX/g' | uniq); do
if [[ ${DEBUG}"x" == "1x" ]]; then

ghost commented 9 years ago

Is the problem that the synthetic data don't have R in the file names (e.g. HLA00001_1.fq, HLA00001_2.fq)? I can add that if necessary; I don't think the R1/R2 convention is universal, though.

ckennedy-nmdp commented 9 years ago

No, though that is also a problem: currently we're assembling each file of a paired-end set separately, which is producing bad results. I'll log another issue, because it would be good if the simulation code followed the R convention, even though it's not universal.

The problem is merely that the simulation code writes files with the .fq extension, an abbreviation of .fastq, but the splitter script only looks for .fastq. Both are valid, so the splitter should be changed to accept both.
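For illustration, here is a minimal sketch of how the file search could accept both extensions. It is not the fix that was actually committed. The function names `list_fastq_files` and `list_identifiers` are hypothetical and do not come from the splitter script; only the `find` and `sed` usage mirrors the snippet quoted above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the splitter script): list every regular
# file under the given directory whose name ends in .fastq or .fq.
list_fastq_files() {
    local rawdir="$1"
    find "${rawdir}" -type f \( -name '*.fastq' -o -name '*.fq' \) | sort
}

# Hypothetical helper: collapse R1/R2 file names into a single RX identifier
# per pair, as the quoted loop does. As in the original, the substitution
# applies to the whole path, and names without R1/R2 pass through unchanged.
list_identifiers() {
    list_fastq_files "$1" | sed -e 's/R[12]/RX/g' | sort -u
}
```

Calling `list_identifiers "${RAWDIR}"` would then yield one identifier per R1/R2 pair for both `.fastq` and `.fq` inputs. Files named like HLA00001_1.fq would still come out as separate identifiers, which is the separate pairing problem mentioned above.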

mthorsen22 commented 9 years ago

This is already complete.

ckennedy-nmdp commented 9 years ago

Yep, I didn't pull correctly. Thanks!