marbl / ModDotPlot

MIT License
101 stars 7 forks source link

Error when query is shorter than the reference #26

Closed hloucks closed 1 month ago

hloucks commented 1 month ago

I was running in static mode and attempting to compare two sequences of different sizes and couldn't figure out why I was only getting an indexing error sometimes. I realized that when the query sequence is longer than the reference sequence everything works just fine, but if the query is shorter I get an indexing error. I was able to replicate it just by truncating one of the example sequences given in the sequences directory, the chr21_segment.tiny.fa file referenced is just the first 500 lines of the chr21_segment.fa

moddotplot static -f sequences/chr21_segment.tiny.fa sequences/chr15_segment.fa --compare-only

  __  __           _   _____        _     _____  _       _   
 |  \/  |         | | |  __ \      | |   |  __ \| |     | |  
 | \  / | ___   __| | | |  | | ___ | |_  | |__) | | ___ | |_ 
 | |\/| |/ _ \ / _` | | |  | |/ _ \| __| |  ___/| |/ _ \| __|
 | |  | | (_) | (_| | | |__| | (_) | |_  | |    | | (_) | |_ 
 |_|  |_|\___/ \__,_| |_____/ \___/ \__| |_|    |_|\___/ \__|

v0.8.2 

Running ModDotPlot in static mode

Retrieving k-mers from Chr21.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr21 k-mers retrieved! 

Retrieving k-mers from Chr15.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr15 k-mers retrieved! 

Computing pairwise identity matrix for Chr21 and Chr15... 

        Sequence length Chr21: 34930

        Sequence length Chr15: 6000000

        Window size w: 35

        Modimizer sketch size: 35

        Plot Resolution r: 1000

Traceback (most recent call last):████████████████-| 99.0% Complete
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/bin/moddotplot", line 5, in <module>
    from moddotplot.__main__ import main
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/__main__.py", line 11, in <module>
    sys.exit(main())
             ^^^^^^
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/moddotplot.py", line 940, in main
    pair_mat = createPairwiseMatrix(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/estimate_identity.py", line 96, in createPairwiseMatrix
    matrix = pairwiseContainmentMatrix(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/groups/migalab/hloucks/CenSat/arraySim/ModDotPlot/venv/lib/python3.11/site-packages/moddotplot/estimate_identity.py", line 385, in pairwiseContainmentMatrix
    containment_matrix[w, q] = (
    ~~~~~~~~~~~~~~~~~~^^^^^^
IndexError: index 998 is out of bounds for axis 0 with size 998

If I switch the order however, it works just fine:

moddotplot static -f sequences/chr15_segment.fa sequences/chr21_segment.tiny.fa --compare-only

  __  __           _   _____        _     _____  _       _   
 |  \/  |         | | |  __ \      | |   |  __ \| |     | |  
 | \  / | ___   __| | | |  | | ___ | |_  | |__) | | ___ | |_ 
 | |\/| |/ _ \ / _` | | |  | |/ _ \| __| |  ___/| |/ _ \| __|
 | |  | | (_) | (_| | | |__| | (_) | |_  | |    | | (_) | |_ 
 |_|  |_|\___/ \__,_| |_____/ \___/ \__| |_|    |_|\___/ \__|

v0.8.2 

Running ModDotPlot in static mode

Retrieving k-mers from Chr15.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr15 k-mers retrieved! 

Retrieving k-mers from Chr21.... 

Progress: |████████████████████████████████████████| 100.0% Completed

Chr21 k-mers retrieved! 

Computing pairwise identity matrix for Chr15 and Chr21... 

        Sequence length Chr15: 6000000

        Sequence length Chr21: 34930

        Window size w: 6000

        Modimizer sketch size: 1500

        Plot Resolution r: 1000

Progress: |████████████████████████████████████████| 100.0% Completed

Saved bed file to Chr15.bed

None
5999
Creating plots and saving to ./Chr15_Chr21...

./Chr15_Chr21.pdf, ./Chr15_Chr21.png, ./Chr15_Chr21_HIST.pdf and ./Chr15_Chr21_HIST.png saved sucessfully. 
alexsweeten commented 1 month ago

Hi @hloucks ,

Thank you for catching this!

This has been patched in a recent version update, so the results should be independent of order of entry.

Cheers, Alex