jpuritz / dDocent

a bash pipeline for RAD sequencing
ddocent.com
MIT License

RPE Uniq Seqs Correct? #53

Closed cbird808 closed 4 years ago

cbird808 commented 4 years ago

Lines 895-901 of the main bash script (quoted below) collapse unique sequences from the forward (F) reads only, then paste F and R back together line by line. At that point the two files are no longer paired, and there are more R lines than F lines.
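The concern can be reproduced on a toy input before reading the quoted lines. The sketch below is not dDocent code: the file names (`toy.forward`, `toy.reverse`, and so on) and the four read pairs are made up for illustration, and it leaves out the `mawk`/`sed` reformatting steps. It only mirrors the paste, sort, `cut -f1 | uniq -c`, and `paste -d '-'` sequence, and it assumes standard `paste`, `sort`, `cut`, `uniq`, and `wc`.

```shell
# Toy reproduction in a scratch directory; all file names are hypothetical.
workdir=$(mktemp -d)

# Four read pairs: line N of toy.forward belongs with line N of toy.reverse.
printf 'AAAA\nAAAA\nCCCC\nAAAA\n' > "$workdir/toy.forward"
printf 'GGGG\nTTTT\nGGGG\nGGGG\n' > "$workdir/toy.reverse"

# Pair the reads and sort them, as in the first quoted step.
paste "$workdir/toy.forward" "$workdir/toy.reverse" | sort -k1 > "$workdir/toy.fr"

# Collapse only the forward column; keep the reverse column uncollapsed.
cut -f1 "$workdir/toy.fr" | uniq -c > "$workdir/toy.f.uniq"
cut -f2 "$workdir/toy.fr" > "$workdir/toy.r"

f_lines=$(wc -l < "$workdir/toy.f.uniq" | tr -d ' ')
r_lines=$(wc -l < "$workdir/toy.r" | tr -d ' ')
echo "collapsed forward lines: $f_lines, reverse lines: $r_lines"

# Rejoining by line position: paste pads the shorter file with empty fields,
# so the output has as many lines as the longer (reverse) file.
joined_lines=$(paste -d '-' "$workdir/toy.f.uniq" "$workdir/toy.r" | wc -l | tr -d ' ')
echo "rejoined lines: $joined_lines"

# One possible alternative (an assumption, not the project's fix):
# count each forward/reverse pair as a single unit.
sort "$workdir/toy.fr" | uniq -c > "$workdir/toy.pairs.uniq"
pair_lines=$(wc -l < "$workdir/toy.pairs.uniq" | tr -d ' ')
echo "unique pairs: $pair_lines"
```

With this input the forward column collapses to 2 lines while the reverse column keeps all 4, so the positional rejoin matches forward counts with reverse reads they were never paired with. Counting whole pairs gives 3 distinct F/R combinations.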

        if [ "$ATYPE" = "RPE" ]; then
            cat namelist | parallel --no-notice -j $NUMProc "paste {}.forward {}.reverse | $sort -k1 -S 200M > {}.fr"
            cat namelist | parallel --no-notice -j $NUMProc "cut -f1 {}.fr | uniq -c > {}.f.uniq && cut -f2 {}.fr > {}.r"
            cat namelist | parallel --no-notice -j $NUMProc "mawk '$AWK4' {}.f.uniq > {}.f.uniq.e" 
            cat namelist | parallel --no-notice -j $NUMProc "paste -d '-' {}.f.uniq.e {}.r | mawk '$AWK3'| sed -e 's/-/NNNNNNNNNN/' | sed -e '$SED1' | sed -e '$SED2'> {}.uniq.seqs"
            rm *.f.uniq.e *.f.uniq *.r *.fr
        else