guoweilong / cgmaptools

Toolbox for analysing BS-seq data, with advanced features in SNV, ASM and DMR
https://cgmaptools.github.io

error with cgmaptools mtr - CGmapToRegion #19

Open retrogenomics opened 5 years ago

retrogenomics commented 5 years ago

Hi, I'm trying to use cgmaptools mtr to call methylation in multiple regions, but I get the following error:

cgmaptools mtr -i sample.CGmap.gz  -r regions.bed

Traceback (most recent call last):
  File "/Users/gcristof/Lab/bioinfo/tools/cgmaptools/bin/CGmapToRegion", line 211, in <module>
    main()
  File "/Users/gcristof/Lab/bioinfo/tools/cgmaptools/bin/CGmapToRegion", line 205, in main
    CGmapToRegion(options.CGmapFile, options.regionFile)
  File "/Users/gcristof/Lab/bioinfo/tools/cgmaptools/bin/CGmapToRegion", line 102, in CGmapToRegion
    key_c = Get_key(chr_c)
  File "/Users/gcristof/Lab/bioinfo/tools/cgmaptools/bin/CGmapToRegion", line 38, in Get_key
    match = re.match(r"^chr(\d+)", str, re.I)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 173, in match
    return _compile(pattern, flags).match(string)
TypeError: cannot use a string pattern on a bytes-like object

Curiously, when I first zcat the sample.CGmap.gz file and pipe the result to cgmaptools mtr, it starts to output some lines, but from chr2 onward I get plenty of NA lines, and finally I get another error:

zcat sample.CGmap.gz | cgmaptools mtr -r regions.bed

Traceback (most recent call last):
  File "/Users/gcristof/Lab/bioinfo/tools/cgmaptools/bin/CGmapToRegion", line 211, in <module>
    main()
  File "/Users/gcristof/Lab/bioinfo/tools/cgmaptools/bin/CGmapToRegion", line 205, in main
    CGmapToRegion(options.CGmapFile, options.regionFile)
  File "/Users/gcristof/Lab/bioinfo/tools/cgmaptools/bin/CGmapToRegion", line 104, in CGmapToRegion
    if key_c < key_r :
TypeError: '<' not supported between instances of 'int' and 'str'

I suspect a sorting problem, but I get a similar error with the cgmaptools sort command. The help message says:

Note: The two input CGmap files should be sorted by Sort_chr_pos.py first.

What are the two input files? I only have one. And where can I find the Sort_chr_pos.py script? Is it not in the cgmaptools toolbox?

Thanks for your help!

retrogenomics commented 5 years ago

Actually, I found where the errors come from: it is a Python 2 vs Python 3 compatibility problem. It can be solved by changing the shebang at the top of each .py script from #!/usr/bin/env python to #!/usr/bin/env python2. This ensures that Python 2.7 is actually used when both versions are installed and Python 3 is the default.
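
For readers hitting the first traceback, here is a minimal sketch of the Python 3 behaviour behind it. It assumes the gzipped input is read in binary mode, which is only a guess since the CGmapToRegion source is not shown in this thread. The helper names `first_chrom` and `chrom_number` and the two-line payload are hypothetical; only the regular expression is taken from the traceback.

```python
import gzip
import io
import re

# Hypothetical two-line CGmap-like payload, compressed in memory.
payload = gzip.compress(b"chr1\tC\t100\nchr2\tG\t200\n")

def first_chrom(mode):
    """Return the first field of the first line, read with the given gzip mode."""
    with gzip.open(io.BytesIO(payload), mode) as handle:
        return handle.readline().split()[0]

def chrom_number(name):
    """Apply the pattern from the traceback; return the number or None."""
    match = re.match(r"^chr(\d+)", name, re.I)
    return int(match.group(1)) if match else None

raw = first_chrom("rb")   # bytes under Python 3
text = first_chrom("rt")  # str under Python 3

try:
    chrom_number(raw)
except TypeError as err:
    # Python 3 refuses to match a str pattern against bytes.
    print("binary mode:", err)

print("text mode:", chrom_number(text))
```

Under Python 2 both modes return `str`, which is consistent with the python2 shebang making the error disappear. Opening the file in text mode (or decoding each line) would be the usual Python 3 alternative.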

retrogenomics commented 5 years ago

The errors are solved by changing the shebangs, but not the NA lines starting from chr2. The order of chromosomes in the CGmap.gz file (and in the .mtr file) is as follows:

chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrUn_GL000214v1
chrUn_GL000219v1
chrUn_KI270435v1
chrUn_KI270442v1
chrUn_KI270512v1
chrUn_KI270521v1
chrUn_KI270582v1
chrUn_KI270751v1
chrX
chrY

Any idea of what could be the problem?

retrogenomics commented 5 years ago

I also tried sorting both the .bed and .CGmap.gz files using the cgmaptools sort command. The chromosomes are then in natural order (chr1 chr2 ... chr10 chr11) in both files, but it's even worse: I only get NA... I'm going crazy with this... I really wish I could use CGmap... help, please!

doncarlos999 commented 5 years ago

I had the same problem but I fixed it by sorting my bed file with "sort -k1,1 -V -k2,2n" and my CGmap file with "sort -k1,1 -V -k3,3n" and then piping directly into cgmaptools mtr. This fixed all the NA problems for me. Hope it helps.
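
To illustrate the difference between the lexicographic order listed earlier in the thread and the version-style order that `sort -V` aims for, here is a small Python sketch. The `natural_key` function is hypothetical and only approximates version sort; it is not the key used inside cgmaptools. Every chunk is wrapped in a tuple of the same shape, so an integer is never compared with a string, which is the kind of comparison the second traceback complains about.

```python
import re

def natural_key(chrom):
    """Split a name into digit and non-digit chunks with a uniform tuple shape.

    Digit chunks become (0, int_value, "") and text chunks become (1, 0, text),
    so two keys can always be compared without mixing int and str.
    """
    parts = re.findall(r"[0-9]+|[^0-9]+", chrom)
    return [(0, int(p), "") if "0" <= p[0] <= "9" else (1, 0, p) for p in parts]

names = ["chr1", "chr10", "chr2", "chrX", "chrUn_GL000214v1"]

# Plain string sort reproduces the chr1, chr10, ..., chr2 pattern.
print(sorted(names))

# Natural sort puts chr2 before chr10.
print(sorted(names, key=natural_key))

# Hypothetical (chromosome, position) records, sorted by both fields.
records = [("chr10", 5), ("chr2", 300), ("chr2", 20)]
print(sorted(records, key=lambda r: (natural_key(r[0]), r[1])))
```

Whichever order is chosen, the thread suggests the region file and the CGmap file need to share it before they are passed to cgmaptools mtr.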

guoweilong commented 5 years ago

@retrogenomics Sorry for only just noticing your issue. It seems all the alerts for this issue went to my spam box. Did you fix the issue? If you still have the problem, can you send a test file to my email (guoweilong@126.com)?

Best, Weilong