smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe

duplicate-remover error "input not properly sorted" with paired end reads #119

Closed: aimstar2006 closed this issue 6 years ago

aimstar2006 commented 6 years ago

Hi

When I ran duplicate-remover on the sorted .mr files (native WALT output), it failed with the error "input not properly sorted". When I checked the sorted file at the position reported in the error, the reads there are paired-end reads on both the + and - strands. Please see the log below and suggest how we can get past this and proceed. Your timely support would be much appreciated.

Here is the complete error log and the commands I ran.

Sorting:

sort -k 1,1 -k 2,2n -k 3,3n -k 6,6 -o ERR192350.mr.sorted ERR192350.mr

Duplicate removal:

/opt/methpipe-3.4.3/bin/duplicate-remover -S ERR192350_stats_summary.txt -o ERR192350.mr.dup.removed ERR192350.mr.sorted

Error log:

input not properly sorted:
1 3001791 3001886 ERR192350.128056027 2 - GTAGTAAGGGAAAATGGTTAAGTAATATATAAATGTAGGTTTATTAGAATTATATTAGATTTTTTATTAGAGATTATGAAAATTAGAAGATTTTG DABDFFFHGEEBEBFFHHEE@AEFFGCFF9:CFCGCDGFGHBHHCCD<DEEG<DHIIGEHIIIIEHIAAGC4@DHHEFCFEECBEE@CCCCCC
1 3001791 3001886 ERR192350.183006656 3 + TAGGATTTTTTAGTTTTTATAGTTTTTGGTGAGAAGTTGGGTGTAATTTTAATAGGTTTATATTTATATGTTATTTGATTTTTTTTTTTTATTGT FEFHHHHHJIIHHHHHIJIIIIFFGIJJJCGIJIJJBBGDG7D@FHIEGIHGHIHGHIJIGHHHG>HEFFFCFFDFCCEEECEDDDDBB92:A##
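In the log, the two records tie on the first three sort keys (chromosome 1, start 3001791, end 3001886) and differ only in the strand field, with the - record placed before the + record. In byte order (the C locale), + (decimal 43) sorts before - (decimal 45), so a tool that compares the strand byte by byte would expect the + record first. That duplicate-remover compares this way is an assumption, but it is consistent with the fix suggested in this thread. The bash sketch below reproduces the ordering with two hypothetical, trimmed-down tab-separated records (read names, sequences and qualities are placeholders, not data from ERR192350).

```shell
#!/usr/bin/env bash
# Two hypothetical .mr-style records (tab-separated) that tie on
# chromosome, start and end, and differ only in the strand (column 6).
records=$'1\t3001791\t3001886\treadA\t2\t-\tACGT\tIIII\n1\t3001791\t3001886\treadB\t3\t+\tACGT\tIIII'

# Byte values of the two strand characters.
# prints: '+' = 43, '-' = 45
printf "'+' = %d, '-' = %d\n" "'+" "'-"

# In the C locale, sort compares raw bytes, so the '+' record (readB) comes first.
printf '%s\n' "$records" | LC_ALL=C sort -k 1,1 -k 2,2n -k 3,3n -k 6,6
```

Running the same sort without LC_ALL=C may place the - record first, depending on the collation rules of your current locale.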

Thanks in advance.

jqujqu commented 6 years ago

Could you try this?

LC_ALL=C sort -k 1,1 -k 2,2n -k 3,3n -k 6,6 -o ERR192350.mr.sorted ERR192350.mr
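Setting LC_ALL=C makes sort compare raw bytes instead of applying the locale's collation rules, which in locales such as en_US.UTF-8 can order punctuation characters like + and - differently. A minimal bash sketch that sorts with the same keys and then checks the result with sort -c before duplicate-remover is run. Note that sort_mr is a hypothetical helper, not part of methpipe, and the sketch assumes that sort -c under the C locale agrees with duplicate-remover's own ordering check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of methpipe): sort an .mr file with byte-wise
# (C locale) collation, then confirm the order with 'sort -c' using the same keys.
sort_mr() {
  local src="$1" dst="$2"
  # Same keys as in the thread: chromosome, start (numeric), end (numeric), strand.
  LC_ALL=C sort -k 1,1 -k 2,2n -k 3,3n -k 6,6 -o "$dst" "$src" || return 1
  # 'sort -c' reports the first out-of-order line on stderr and exits
  # non-zero if the file is not sorted under these keys and this locale.
  LC_ALL=C sort -c -k 1,1 -k 2,2n -k 3,3n -k 6,6 "$dst"
}
```

For example: sort_mr ERR192350.mr ERR192350.mr.sorted && /opt/methpipe-3.4.3/bin/duplicate-remover -S ERR192350_stats_summary.txt -o ERR192350.mr.dup.removed ERR192350.mr.sorted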

aimstar2006 commented 6 years ago

Thanks for your support. It worked.