yezhengSTAT / mHiC

MIT License

Step 4 breaks with underscore in chromosome names. #9

Open aakashsur opened 5 years ago

aakashsur commented 5 years ago

So I have underscores in my chromosome names and the chrList variable is set to "LtaP_01 LtaP_02 LtaP_03 LtaP_04 LtaP_05 LtaP_06 LtaP_07 LtaP_08 LtaP_09 LtaP_10 LtaP_11 LtaP_12 LtaP_13 LtaP_14 LtaP_15 LtaP_16 LtaP_17 LtaP_18 LtaP_19 LtaP_20 LtaP_21 LtaP_22 LtaP_23 LtaP_24 LtaP_25 LtaP_26 LtaP_27 LtaP_28 LtaP_29 LtaP_30 LtaP_31 LtaP_32 LtaP_33 LtaP_34 LtaP_35 LtaP_36 MaxiA"

This leads to the following error message when Step 4 hits the KR normalization:

Traceback (most recent call last):
  File "/home/ec2-user/mHiC/bin/KR_norm_mHiC.py", line 466, in <module>
    writeInteraction(norm_mtx, baseName, args.outdir, revFragsDic, args.chrNum, args.resolution)
  File "/home/ec2-user/mHiC/bin/KR_norm_mHiC.py", line 357, in writeInteraction
    chr1, mid1 = revFragsDic[row[i]].split("_")
ValueError: too many values to unpack

This appears to be caused by the script parsing chromosome names by splitting on underscores. Other than not using underscores in chromosome names, any ideas for a fix? I'll keep looking and see if I come up with something.
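For readers who want to see the failure in isolation, here is a minimal sketch. The label format `<chromosome>_<midpoint>` is an assumption inferred from the traceback above, and the value `LtaP_01_5000` is made up for illustration. It also shows one possible workaround, splitting only at the last underscore, which is not what the repository does and only works if the midpoint itself never contains an underscore.

```python
# Hypothetical fragment label in the "<chromosome>_<midpoint>" form
# implied by the traceback (the value itself is invented).
label = "LtaP_01_5000"

# A plain split("_") produces three pieces for this label, so unpacking
# into two names raises ValueError.
try:
    chrom, mid = label.split("_")
except ValueError as err:
    print("split failed:", err)

# Splitting once from the right keeps the chromosome name intact,
# assuming the midpoint never contains an underscore.
chrom, mid = label.rsplit("_", 1)
print(chrom, mid)
```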

aakashsur commented 5 years ago

I believe I've isolated the issue to these four lines

https://github.com/yezhengSTAT/mHiC/blob/4a75988cf51e0a3f73e16270b08d4234404c89c2/bin/KR_norm_mHiC.py#L197

https://github.com/yezhengSTAT/mHiC/blob/4a75988cf51e0a3f73e16270b08d4234404c89c2/bin/KR_norm_mHiC.py#L357-L358

https://github.com/yezhengSTAT/mHiC/blob/4a75988cf51e0a3f73e16270b08d4234404c89c2/bin/KR_norm_mHiC.py#L374

By changing the join character and the subsequent split characters, I'm able to resolve the issue. However, it's probably better to store this information in a data structure than to encode it in a string value. I haven't studied the code enough to figure out how it's using these names yet.
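As a rough illustration of the data-structure idea, the sketch below keeps the chromosome name and midpoint as a tuple instead of a joined string, so no separator character is ever parsed. The names `build_reverse_index` and `rev_index` and the sample fragments are hypothetical; this is not how KR_norm_mHiC.py is organized, only one way the lookup could be shaped.

```python
from typing import Dict, List, Tuple

# (chromosome name, fragment midpoint); hypothetical type for illustration.
Fragment = Tuple[str, int]


def build_reverse_index(fragments: List[Fragment]) -> Dict[int, Fragment]:
    """Map each matrix row index to its (chromosome, midpoint) pair."""
    return {idx: frag for idx, frag in enumerate(fragments)}


# Invented sample data; chromosome names may contain any character.
fragments = [("LtaP_01", 5000), ("LtaP_01", 15000), ("MaxiA", 5000)]
rev_index = build_reverse_index(fragments)

# Unpacking a tuple never depends on the characters in the name.
chrom, mid = rev_index[1]
print(chrom, mid)
```

Because the pair is never flattened into a string, underscores, hyphens, or equals signs in chromosome names cannot break the lookup.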

yezhengSTAT commented 5 years ago

Yes, we assume that there is no "_" in the chromosome name. I will add it to the manual. Thanks for pointing it out.

bgbrink commented 4 years ago

Is there any way to fix the pipeline or would the better option be to modify the input files?

yezhengSTAT commented 4 years ago

Hello, Yes, you can replace "_" in the scripts cited above with another uncommon symbol such as "-" or "=". Alternatively, you can change the chromosome names (also in the corresponding reference genome files) by replacing the "_" in them with another symbol.

Thanks, Ye
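For the second option, renaming chromosomes in the input files, a minimal sketch of rewriting FASTA header lines might look like this. The function names and the choice of "-" as the replacement are assumptions for illustration, and the same renaming would have to be applied consistently to every other file that mentions the chromosome names, including chrList. Only space-separated header descriptions are handled here.

```python
from typing import Iterable, Iterator


def rename_chrom(name: str, old: str = "_", new: str = "-") -> str:
    """Replace the problematic character in a chromosome name."""
    return name.replace(old, new)


def rename_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with only the sequence name in each header renamed."""
    for line in lines:
        if line.startswith(">"):
            # Rename the first token only; leave any description untouched.
            name, sep, rest = line[1:].partition(" ")
            yield ">" + rename_chrom(name) + sep + rest
        else:
            yield line


# Invented sample records.
sample = [">LtaP_01 some_description\n", "ACGT\n", ">MaxiA\n", "TTGA\n"]
renamed = list(rename_fasta_headers(sample))
print("".join(renamed), end="")
```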

DAljogol commented 4 years ago

Hi Ye,

Can you also please help me resolve a similar issue? I'm trying to run mHiC but I get the following error. (Please note that chrList=($(seq 1 22) X Y M).)

Traceback (most recent call last):
  File "/home/software/mHiC-master/bin/KR_norm_mHiC.py", line 469, in <module>
    writeBias(bias, baseName, args.outdir, revFragsDicAll, args.chrNum, args.resolution)
  File "/home/software/mHiC-master/bin/KR_norm_mHiC.py", line 374, in writeBias
    chr, mid = revFragsDicAll[i].split("_")
ValueError: too many values to unpack (expected 2)

Best, Dina

yezhengSTAT commented 4 years ago

Hello Dina, Thanks for using mHiC! Do you also have "_" in your chromosome name? How did you set the variable "chrList=**"?

(I am now thinking of replacing "_" with another joining symbol in my code. I will let you know once I finish.)

Thanks, Ye

yezhengSTAT commented 4 years ago

Hello, I have changed the joining symbol "_" in KR_norm_mHiC.py to "=" and tested it on the demo data. Let me know if there are any issues or further problems with that.

Thanks again for pointing it out! I really appreciate it!

Thanks, Ye
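The separator change described above can be sketched as follows. This is only an illustration of joining and splitting on "=", not the actual code in KR_norm_mHiC.py; `make_label` and `parse_label` are hypothetical helpers, and the guard against "=" in a chromosome name is an added assumption that simply moves the original restriction to a less common character.

```python
SEP = "="  # joining symbol mentioned in the comment above


def make_label(chrom: str, mid: int, sep: str = SEP) -> str:
    """Join a chromosome name and midpoint into one label."""
    if sep in chrom:
        # The restriction still exists, just for a rarer character.
        raise ValueError(f"chromosome name {chrom!r} contains {sep!r}")
    return f"{chrom}{sep}{mid}"


def parse_label(label: str, sep: str = SEP):
    """Split a label back into (chromosome name, midpoint)."""
    chrom, mid = label.split(sep)
    return chrom, int(mid)


# Invented example: underscores in the name now round-trip cleanly.
label = make_label("LtaP_01", 5000)
print(label, parse_label(label))
```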

DAljogol commented 4 years ago

Thanks Ye for your help!! It works fine now :)

Best, Dina
