schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Chromosomes IDs do not match #175

sarah872 closed this issue 1 year ago

sarah872 commented 1 year ago

Hi, I'm getting errors when running syri. I'm comparing a RefSeq assembly against a de novo assembly; based on ANI, they are 99.99% identical over 99.99% of their length.

syri -c A_B.bam -r CP007726.1.dnaA.fasta -q Nelongata.canu-chr.dnaA.fasta -F B --prefix A_B
Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - WARNING - Matching them automatically. For each reference genome, most similar query genome will be selected. Check mapids.txt for mapping used.
Reading Coords - WARNING - Reference chromosome CP007726.1 do not have any directed alignments with its homologous chromosome in the query genome (tig00000001). Filtering out all corresponding alignments.
Traceback (most recent call last):
  File "/scratch/comparison/syri_env/bin/syri", line 6, in <module>
    main(sys.argv[1:])
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/syri/scripts/syri.py", line 326, in main
    syri(args)
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/syri/scripts/syri.py", line 183, in syri
    if len(seq) < bchr_size[chrlink[chrid]]:
KeyError: 'CP007726.1'

And when disabling the automatic matching:

syri -c A_B.bam -r CP007726.1.dnaA.fasta -q Nelongata.canu-chr.dnaA.fasta -F B --prefix A_B --no-chrmatch

Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - WARNING - --no-chrmatch is set. Not matching chromosomes automatically.
Reading Coords - WARNING - CP007726.1, tig00000001 present in only one genome. Removing corresponding alignments
Traceback (most recent call last):
  File "/scratch/comparison/syri_env/bin/syri", line 6, in <module>
    main(sys.argv[1:])
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/syri/scripts/syri.py", line 326, in main
    syri(args)
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/syri/scripts/syri.py", line 214, in syri
    startSyri(args, coords[["aStart", "aEnd", "bStart", "bEnd", "aLen", "bLen", "iden", "aDir", "bDir", "aChr", "bChr"]])
  File "syri/pyxFiles/synsearchFunctions.pyx", line 516, in syri.synsearchFunctions.startSyri
  File "syri/pyxFiles/synsearchFunctions.pyx", line 911, in syri.synsearchFunctions.outSyn
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/pandas/core/generic.py", line 5915, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/pandas/core/generic.py", line 823, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 230, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/scratch/comparison/syri_env/lib/python3.8/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 7 elements

mnshgl0110 commented 1 year ago

Hi, the first warning message:

Reference chromosome CP007726.1 do not have any directed alignments with its homologous chromosome in the query genome (tig00000001). Filtering out all corresponding alignments.

suggests that the homologous chromosomes might be on opposite strands (see https://github.com/schneebergerlab/syri/issues/48). Reverse-complementing one of the chromosomes, so that the same strands are being compared, should fix this issue.

The second error is most probably caused by this as well.

You can try fixchr to get consistent strands for the chromosomes.
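
For readers who want to see what reverse-complementing a chromosome involves, here is a minimal sketch in plain Python. It is not how syri or fixchr do it; the function names (`reverse_complement`, `read_fasta`, `reverse_complement_records`) are hypothetical and exist only for this illustration. It assumes the FASTA is small enough to hold in memory, as a single bacterial chromosome would be.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of syri or fixchr.

# Translation table for IUPAC nucleotide codes (upper and lower case).
_COMPLEMENT = str.maketrans(
    "ACGTURYKMBVDHNacgturykmbvdhn",
    "TGCAAYRMKVBHDNtgcaayrmkvbhdn",
)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement_records(lines, ids_to_flip):
    """Reverse-complement only the records whose ID is in ids_to_flip.

    The ID is taken to be the first whitespace-delimited word of the header.
    """
    records = []
    for header, seq in read_fasta(lines):
        seq_id = header.split()[0]
        if seq_id in ids_to_flip:
            seq = reverse_complement(seq)
        records.append((header, seq))
    return records
```

In the case above, one would presumably pass the lines of the query FASTA with `{"tig00000001"}` as `ids_to_flip`, write the result back out, and then regenerate the alignment before rerunning syri, since the existing BAM refers to the original orientation.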

sarah872 commented 1 year ago

Thank you, that was it!