Closed msierk closed 6 months ago
@msierk I can look into this very soon, but can you confirm your LOCALE
? I personally think I've been sorting using:
sort -k 1,1 -k 2,2g -k 3,3g -k 6,6
for a 6-column BED file with strand in the final position. But I'll check later today to confirm. The LOCALE
has historically been a problem for this, as that's the part that makes the difference for chrom order.
I'm on biowulf.
(base) [sierkml@cn4286 bismark2]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
BTW, I don't know that this is related to the sorting issue, but I just discovered that the output from diff does not include any of the sites from chr10-chr22 - it skips straight from chr9 to chrX. I include the transition below, but the whole file is 127MB (and the .meth files it is derived from are 1.4GB). If this is an entirely separate issue I can create a new issue for it.
chr9 138220957 - CpG 0.652632 7 0 10 1
chr9 138279141 - CpG 0.6 2 0 1 0
chr9 138279153 - CpG 0.6 2 0 1 0
chr9 138279202 - CpG 0.6 2 0 1 0
chrX 10509 + CpGx 0.285714 2 2 1 0
chrX 10510 - CpG 0.166667 2 0 14 0
chrX 10574 + CpG 0.714286 4 0 1 0
chrX 15218 - CpG 0.5 1 0 1 0
@msierk can you confirm where you obtained your reference genome? I think I know what's going on, but it seems like an issue we had previously solved. If you could send me the following by email it would help: the chromosome order in the reference genome file (grep ^> reference.fa
be careful with the >
) and the order in the counts file (like cut -f 1 counts.meth.sym | uniq
) it would be very helpful.
Edit: considering even more, you might be able to tell me the first file in the pipeline that is missing the chromosomes you mention. Based on your comments above I think you'd know the shell commands to get that info.
@msierk I'm closing this due to inactivity. If you have further problems open another issue.
I produced hmr files using the following command:
dnmtools hmr -o ${METHA}_hypo.bed ${METHA}.sym.meth
However when I run dnmtools dmr I get the error:
regions not sorted in file: 231_NO_a_BS_hypo.bed
The hmr output had chrM after chrX, so I tried sorting using:
sort -k1,1V -k2,2n 231_NO_a_oxBS_hypo.bed > 231_NO_a_oxBS_hypo.sorted.bed
but I still get the same error. I've attached the two bed files. There are some random contigs (like chr14_GL000225v1_random), but as far as I can tell those are sorted correctly in the sorted file.Can you provide some explanation as to how these files need to be sorted to get dmr to run? Or can you have dmr print out the line where it encounters something it thinks is out of order?
231_NO_a_BS_hypo.bed.txt 231_NO_a_BS_hypo.sorted.bed.txt