nadegeguiglielmoni / GraphUnzip

Unzip assembly graphs with Hi-C data and/or long reads.
GNU General Public License v3.0

missing contigs #2

Closed by cmdoret 3 years ago

cmdoret commented 3 years ago

Hello,

I tried running GraphUnzip with a GFA file and a GAF file of ONT reads aligned to it, but I get the following error:

Traceback (most recent call last):
  File "/run/media/varogh/storage_4T/pasteur_data/legionella/output/graphunzip/GraphUnzip/main.py", line 294, in <module>
    main()
  File "/run/media/varogh/storage_4T/pasteur_data/legionella/output/graphunzip/GraphUnzip/main.py", line 274, in main
    segments, cn = solve_ambiguities(
  File "/run/media/varogh/storage_4T/pasteur_data/legionella/output/graphunzip/GraphUnzip/solve_ambiguities.py", line 708, in solve_ambiguities
    listOfSegments, copiesNumber = merge_contigs(listOfSegments, copiesNumber, verbose = verbose)
  File "/run/media/varogh/storage_4T/pasteur_data/legionella/output/graphunzip/GraphUnzip/solve_ambiguities.py", line 401, in merge_contigs
    listOfSegments = merge_adjacent_contigs(listOfSegments)
  File "/run/media/varogh/storage_4T/pasteur_data/legionella/output/graphunzip/GraphUnzip/solve_ambiguities.py", line 367, in merge_adjacent_contigs
    listOfSegments = merge_simply_two_adjacent_contig(segment, endOfSegment, listOfSegments)
  File "/run/media/varogh/storage_4T/pasteur_data/legionella/output/graphunzip/GraphUnzip/solve_ambiguities.py", line 346, in merge_simply_two_adjacent_contig
    listOfSegments.remove(neighbor)
ValueError: list.remove(x): x not in list

The commands I ran were:

GraphAligner -g assembly_graph.gfa -f long_reads_filtered.fa -a ont_to_gfa.gaf -t6 -x dbg
GraphUnzip/main.py -g assembly_graph.gfa -lr ont_to_gfa.gaf -o unzipped

Do you think something could be wrong with my files?

My environment was the following:

Additional notes:

Thanks in advance!

RolandFaure commented 3 years ago

Hello,

Thank you for this report. We will remove the dependency on matplotlib.

Concerning your main problem, it may be an irregularity in your GFA file (for example, a contig referred to in an 'L' line but not defined in any 'S' line). It might also be that GraphUnzip does not handle some exceptional case (for example, we ran into this problem, now corrected, when a link was defined twice in the GFA). In either case, I would like GraphUnzip to handle the exception.
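The two irregularities mentioned above can be checked before running GraphUnzip. The following is only a rough sketch, not part of GraphUnzip: the function name `check_gfa` is hypothetical, it assumes a tab-separated GFA1 file, and it treats a link and its reverse complement (`A + B +` and `B - A -`) as the same link.

```python
from collections import Counter

FLIP = {"+": "-", "-": "+"}


def check_gfa(lines):
    """Return (missing_segments, duplicated_links) for an iterable of GFA1 lines.

    Hypothetical helper for illustration; it is not GraphUnzip code.
    """
    segments = set()
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            # S <name> <sequence> ...
            segments.add(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            link = (fields[1], fields[2], fields[3], fields[4])
            # The same link read from the other strand.
            reverse = (
                fields[3],
                FLIP.get(fields[4], fields[4]),
                fields[1],
                FLIP.get(fields[2], fields[2]),
            )
            links.append(min(link, reverse))

    # Segment names used by links but never declared on an S line.
    referenced = {name for link in links for name in (link[0], link[2])}
    missing = sorted(referenced - segments)
    # Links that appear more than once, in either orientation.
    duplicated = sorted(link for link, n in Counter(links).items() if n > 1)
    return missing, duplicated
```

It can be applied to a file with `with open("assembly_graph.gfa") as f: missing, duplicated = check_gfa(f)`. Two empty lists mean that neither of these two irregularities is present, which does not rule out other causes.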

If you can send me your GFA and GAF files (roland.faure@polytechnique.edu), I will try to see where the problem comes from and whether it is a GraphUnzip bug. To make the GFA file much smaller, you can replace the sequences with * before sending it.
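One possible way to produce such a reduced GFA is sketched below. The function name `strip_sequences` is hypothetical, and it assumes a tab-separated GFA1 file. Adding an `LN:i:` tag to keep the original sequence length is an extra that was not asked for in this thread.

```python
def strip_sequences(lines):
    """Yield GFA1 lines with every segment sequence replaced by '*'.

    Hypothetical helper for illustration; it is not GraphUnzip code.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            sequence = fields[2]
            has_length_tag = any(f.startswith("LN:i:") for f in fields[3:])
            if sequence != "*" and not has_length_tag:
                # Keep the segment length, which is lost once the sequence is removed.
                fields.append(f"LN:i:{len(sequence)}")
            fields[2] = "*"
        # Lines other than S lines are passed through unchanged.
        yield "\t".join(fields)
```

For example, `with open("assembly_graph.gfa") as src, open("assembly_graph.noseq.gfa", "w") as dst: dst.writelines(line + "\n" for line in strip_sequences(src))` writes the reduced file (the output file name is only an example).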

nadegeguiglielmoni commented 3 years ago

Hello,

First, I would suggest you run GraphAligner with -x vg rather than -x dbg.

cmdoret commented 3 years ago

Hi,

Thanks @nadegeguiglielmoni, I'll use that from now on (but it did not fix this issue). Thanks @RolandFaure, I sent you my files (they seemed normal to me, but I don't have much experience with the GFA format).

RolandFaure commented 3 years ago

There was indeed a small bug in GraphUnzip; it is now fixed. Many thanks for your feedback.