r3fang / SnapTools

A module for working with snap files in Python
Apache License 2.0

Error running snap-add-pmat #25

Open drbecavin opened 4 years ago

drbecavin commented 4 years ago

I tried the tutorial on the brain 5k dataset and everything worked after a few modifications. Now I am running it on two pig datasets, and it crashes when counting the peaks in the snap object.

I am running:

snaptools snap-add-pmat --snap-file ${snap_file} --peak-file ${peaks_combined_file} --tmp-folder ${temp_file}

For my two datasets, I got two different errors.

For the first, I got this error:

Traceback (most recent call last):
  File "/home/becavin/.conda/envs/snapAtac/bin/snaptools", line 38, in <module>
    parse_args()    
  File "/home/becavin/.conda/envs/snapAtac/lib/python3.7/site-packages/snaptools/parser.py", line 176, in parse_args
    verbose=args.verbose)
  File "/home/becavin/.conda/envs/snapAtac/lib/python3.7/site-packages/snaptools/add_pmat.py", line 153, in snap_pmat
    for item in frag_bt.intersect(peak_bt, wa=True, wb=True):
  File "pybedtools/cbedtools.pyx", line 792, in pybedtools.cbedtools.IntervalIterator.__next__
  File "pybedtools/cbedtools.pyx", line 701, in pybedtools.cbedtools.create_interval_from_list
pybedtools.cbedtools.MalformedBedLineError: Start is greater than stop

The second error is:

Traceback (most recent call last):
  File "/home/becavin/.conda/envs/snapAtac/bin/snaptools", line 38, in <module>
    parse_args()    
  File "/home/becavin/.conda/envs/snapAtac/lib/python3.7/site-packages/snaptools/parser.py", line 176, in parse_args
    verbose=args.verbose)
  File "/home/becavin/.conda/envs/snapAtac/lib/python3.7/site-packages/snaptools/add_pmat.py", line 153, in snap_pmat
    for item in frag_bt.intersect(peak_bt, wa=True, wb=True):
  File "pybedtools/cbedtools.pyx", line 792, in pybedtools.cbedtools.IntervalIterator.__next__
  File "pybedtools/cbedtools.pyx", line 656, in pybedtools.cbedtools.create_interval_from_list
IndexError: list index out of range

The reason for the crash is stated in the error, but where it happens and how to fix it is not obvious. Is there a workaround to find the line where it crashed so I can remove it from my bed file?

For example, it had crashed earlier while reading the bed files, at this line for the first dataset:

b'1'    111157831    111158923

And at this line for the second dataset:

b'X'    270861350    270861819

I removed these two lines and then it worked. P.S.: b'X' means X and comes from bad formatting in the R export. I tried removing the prefix, but snaptools seems to need it to read the bed file, so in the end I kept it.

Thanks for your help.

(I have also opened an issue in SnapATAC, but maybe it fits better here: https://github.com/r3fang/SnapATAC/issues/156 )

mruffalo commented 4 years ago

@drbecavin This PR may interest you: https://github.com/r3fang/SnapTools/pull/29

The sequence names (b'1' in your example) weren't being decoded to text strings.
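
For anyone trying to locate offending lines before rerunning snap-add-pmat, here is a minimal sketch of a pre-check on the peak bed file. It is not part of snaptools or pybedtools; the function name check_bed_lines and the list of checks are assumptions made for illustration. It flags records with fewer than three fields, non-integer coordinates, start greater than stop, and sequence names that look like a Python bytes repr such as b'1'. If the root cause is the undecoded sequence names described above, the peak file itself may pass the other checks.

```python
import re

# Matches a Python bytes repr such as b'1' or b"chrX" that leaked into a text file.
BYTES_REPR = re.compile(r"""^b(['"])(.*)\1$""")


def check_bed_lines(lines):
    """Return (line_number, problem, raw_line) tuples for suspicious BED records.

    Hypothetical helper for illustration; not a snaptools or pybedtools API.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and the usual BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append((lineno, "fewer than 3 tab-separated fields", line))
            continue
        chrom, start, stop = fields[0], fields[1], fields[2]
        if BYTES_REPR.match(chrom):
            problems.append((lineno, "chromosome name looks like a bytes repr", line))
        try:
            start_i, stop_i = int(start), int(stop)
        except ValueError:
            problems.append((lineno, "non-integer start or stop", line))
            continue
        if start_i > stop_i:
            problems.append((lineno, "start is greater than stop", line))
    return problems


# The decoding issue in isolation: str() on bytes keeps the b'...' wrapper,
# while .decode() gives the plain sequence name.
assert str(b"1") == "b'1'"
assert b"1".decode() == "1"
```

To use it, open the peak file in text mode and pass the file handle to check_bed_lines, then print each returned tuple to see the line number and the reason it was flagged.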