Closed by ckennedy-nmdp 9 years ago
Is the problem that the synthetic data don't have R in the file names? e.g. HLA00001_1.fq, HLA00001_2.fq
I can add that if necessary; I don't think the R1/R2 convention is universal though.
No, though that is also a problem: currently we're assembling each paired-end file separately, which is producing bad results. I'll log another issue, because it would be good if the simulation code followed the R convention, even though it's not universal.
The problem here is merely that the simulation code outputs .fq as an abbreviation for .fastq, but the splitter script only looks for .fastq. Both are valid, so the splitter should be changed.
This is already complete.
Yep, I didn't pull correctly. Thanks!
Have to generalize regex for the following:
if [[ $(find ${RAWDIR} -type f | grep -c fastq) -eq 0 ]]; then
for MYIDENTIFIER in $(find ${RAWDIR} -name '*.fastq' -print -type f | sed -e 's/R[12]/RX/g' | uniq); do
if [[ ${DEBUG}"x" == "1x" ]]; then
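For reference, here is a rough sketch of how the matching in the lines above might be generalized so that .fq and .fastq are treated the same, and so that both the `_R1`/`_R2` and the plain `_1`/`_2` mate suffixes collapse to one identifier per pair. The function names (`find_fastq`, `count_fastq`, `pair_identifiers`) are hypothetical and not taken from the splitter script, and the sed pattern assumes the mate marker sits at the end of the file name, optionally followed by a numeric chunk such as `_001`.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not from the splitter script.

# List regular files ending in .fastq or .fq under the given directory.
find_fastq() {
    local rawdir="$1"
    find "${rawdir}" -type f \( -name '*.fastq' -o -name '*.fq' \) | sort
}

# Count the FASTQ files found (tr strips the padding some wc versions add).
count_fastq() {
    find_fastq "$1" | wc -l | tr -d ' '
}

# Collapse mate pairs into one identifier per pair:
#   sample_R1.fastq / sample_R2.fastq -> sample_RX.fastq
#   HLA00001_1.fq   / HLA00001_2.fq   -> HLA00001_X.fq
# Assumes the mate marker is the last "_1"/"_2" (optionally "_R1"/"_R2"),
# possibly followed by a numeric chunk like "_001", before the extension.
pair_identifiers() {
    find_fastq "$1" \
        | sed -E 's/_(R?)[12]((_[0-9]+)?\.(fq|fastq))$/_\1X\2/' \
        | sort -u
}
```

The loop in the script could then iterate over `$(pair_identifiers "${RAWDIR}")` and the emptiness check could compare `$(count_fastq "${RAWDIR}")` against 0. Anchoring the pattern at the end of the name avoids rewriting an `R1` that happens to appear earlier in a sample name, which the unanchored `s/R[12]/RX/g` would do.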