smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe

Error with test dataset #166

Closed: hchetia closed this issue 3 years ago

hchetia commented 3 years ago

I'm trying out the methpipe steps with the test dataset. After sorting with `sort -k1,1 -k2,2n -k3,3n -k6,6 snippet.mr -o t && mv t snippet.mr`, the next step, duplicate-remover, fails with the error "Input not properly sorted":

[Screenshot of the duplicate-remover error]
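Below is a minimal Python sketch of the kind of check that can produce "Input not properly sorted". It is not duplicate-remover's actual code. It assumes tab-separated .mr columns with the strand in column 6, as the `-k6,6` key implies. It also assumes the tool compares chromosome names and strands as raw bytes, using the same key order as the sort command. The function names `mr_sort_key` and `first_unsorted_line` are hypothetical.

```python
def mr_sort_key(line: str):
    """Key mirroring `sort -k1,1 -k2,2n -k3,3n -k6,6` in the C locale.

    Assumes tab-separated .mr columns (hypothetical layout: column 6 = strand).
    """
    fields = line.rstrip("\n").split("\t")
    return (
        fields[0].encode("ascii"),  # -k1,1: chromosome compared byte by byte
        int(fields[1]),             # -k2,2n: start position, numeric
        int(fields[2]),             # -k3,3n: end position, numeric
        fields[5].encode("ascii"),  # -k6,6: strand compared byte by byte
    )


def first_unsorted_line(lines):
    """Return the 1-based index of the first line whose key is smaller than
    the previous line's key, or None if the input is sorted."""
    prev = None
    for i, line in enumerate(lines, start=1):
        key = mr_sort_key(line)
        if prev is not None and key < prev:
            return i
        prev = key
    return None
```

Under these assumptions, consider a file sorted in a locale whose collation puts `chr10` after `chr2`, or `-` before `+`. It would fail this check even though `sort` itself reported no error. With GNU sort, a comparable check is `LC_ALL=C sort -c -k1,1 -k2,2n -k3,3n -k6,6 snippet.mr`, which reports the first out-of-order line instead of sorting.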

andrewdavidsmith commented 3 years ago

@hchetia Did you make sure to set the "locale" properly? I think that's in the manual. I can't look it up right now, but I'm pretty sure it's specified.

terencewtli commented 3 years ago

Hi, please put LC_ALL=C in front of the sort command, i.e. `LC_ALL=C sort -k1,1 -k2,2n -k3,3n -k6,6 snippet.mr -o t && mv t snippet.mr`. Let us know if that works.
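Prefixing a command with LC_ALL=C sets the variable only for that one command. A script can do the same thing by passing a modified environment to the child process. Below is a minimal Python sketch of that approach. It assumes a `sort` binary (e.g. GNU coreutils) is on the PATH. The helper name `sort_mr_c_locale` and the `.tmp` file name are hypothetical.

```python
import os
import subprocess


def sort_mr_c_locale(path: str) -> None:
    """Sort an .mr file in place with the issue's sort keys, forcing the C locale."""
    # Copy the current environment and override LC_ALL, like `LC_ALL=C sort ...`.
    env = dict(os.environ, LC_ALL="C")
    tmp = path + ".tmp"  # hypothetical temporary name, plays the role of `t`
    subprocess.run(
        ["sort", "-k1,1", "-k2,2n", "-k3,3n", "-k6,6", "-o", tmp, path],
        env=env,
        check=True,  # raise CalledProcessError if sort exits with an error
    )
    os.replace(tmp, path)  # equivalent of `mv t snippet.mr`
```

Usage: `sort_mr_c_locale("snippet.mr")`. The override is passed only to the child process, so the locale of the calling program and the shell is left unchanged. This matches the per-command prefix rather than `export LC_ALL=C`.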

hchetia commented 3 years ago

Hi, yes, that worked. Could you please explain what LC_ALL=C does?

andrewdavidsmith commented 3 years ago

The locale determines several system behaviors related to language, country, character encoding, and so on. You can find lots of information about this online. Basically, LC_ALL=C sets most aspects of your locale to the one assumed by the C language. Used as a prefix, it applies only to that one command. It can also be exported to apply to the whole shell session. The C locale is usually considered the simplest, and it is most consistent with the default encoding on machines set up in the US. Because I've only ever worked in Canada and the US, I haven't had to worry about this personally. However, many of my students come from other countries, and their personal machines are set up differently, so they need to set this variable.

In your specific case, I suspect it has to do with the order of + and -, but I could be wrong on all of this, so I suggest a quick web search. Whatever your system's locale, setting LC_ALL=C should never hurt if the order will later be checked by a program that compares strings by their ASCII encoding.
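Here is a concrete version of the point about + and -. In the C locale, `sort` compares strings byte by byte, so the order follows ASCII codes. Locale-aware collation can behave differently, for example by comparing letters case-insensitively in a first pass. The Python sketch below contrasts byte order with `str.casefold` order. Note that casefold is only a crude stand-in for locale-aware collation and does not emulate any real locale. The contig names are illustrative.

```python
# Byte order (what LC_ALL=C gives sort) vs. a crude case-insensitive order.
strands = ["-", "+"]
c_strands = sorted(strands, key=lambda s: s.encode("ascii"))
print(c_strands)  # ['+', '-'] because '+' is 0x2B and '-' is 0x2D

names = ["chrX", "chr10", "GL000192.1", "chr2", "chr1"]  # illustrative names
c_order = sorted(names, key=lambda s: s.encode("ascii"))
folded_order = sorted(names, key=str.casefold)
print(c_order)       # ['GL000192.1', 'chr1', 'chr10', 'chr2', 'chrX']
print(folded_order)  # ['chr1', 'chr10', 'chr2', 'chrX', 'GL000192.1']
```

If `sort` and a downstream checker use different orderings, a file that `sort` considers sorted can look unsorted to the checker. Forcing LC_ALL=C makes `sort` use plain byte order.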

hchetia commented 3 years ago

Thanks for the clarification, Andrew. Here is another error, this time from `hmr`:

[Screenshot of the hmr error]

andrewdavidsmith commented 3 years ago

@hchetia Can you create a separate issue for that? I'll close this one.