medvedevgroup / SibeliaZ

A fast whole-genome aligner based on de Bruijn graphs
http://medvedevgroup.com/

Large genome files #29

Open elissaralam opened 3 years ago

elissaralam commented 3 years ago

Hello,

I have two large genome files and would like to use SibeliaZ. I recently read the following on a Biostars thread: "Yes, by SibeliaZ you can get .gfa file out of a fasta file. They recently updated there old version and now large genome file can also be used for the same." However, I cannot find any confirmation in your version history that SibeliaZ supports large genome files. Can you confirm that it does?

Thanks a lot! Elissa

iminkin commented 3 years ago

Hi @elissaralam, yes, SibeliaZ was developed to handle large datasets.

elissaralam commented 3 years ago

Hello,

Thank you for your quick response! SibeliaZ worked perfectly well in generating a global alignment for my two genomes. However, I am trying to generate a GFA file from the MAF file with the script maf_to_gfa1.py (run as "python maf_to_gfa1.py"), and I keep getting "IndexError: list index out of range". Have you encountered this before?

Thank you for your help, Elissar

iminkin commented 3 years ago

Hi Elissar,

Sorry, but it seems to be a bug in my code. Is it possible for you to share the input and output files you have so that I can try to debug the script?

Ilia

markopetek commented 3 years ago

I'm also trying to align two large genome files (~840 Mbp each) of the same species and got the same error with maf_to_gfa1.py.

This is the full error message:

Traceback (most recent call last):
  File "../scripts/maf_to_gfa1.py", line 202, in <module>
    blocks, sequence = split_maf_blocks(args.maf)
  File "../scripts/maf_to_gfa1.py", line 112, in split_maf_blocks
    while prev_column < len(maf[0].body):
IndexError: list index out of range
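
The only indexing on the failing line is maf[0], so the traceback suggests that the list called maf was empty when line 112 ran. One possible cause, which is an assumption and not something confirmed in this thread, is that the script reached that line without having read any alignment blocks from the MAF file. A minimal sketch for checking the input before running the conversion is shown below. The helper name count_maf_blocks is hypothetical and is not part of SibeliaZ; it relies only on the standard MAF convention that each alignment block opens with an "a" line and lists its sequences on "s" lines.

```python
import io


def count_maf_blocks(handle):
    """Count alignment blocks ('a' lines) and sequence lines ('s' lines).

    Hypothetical diagnostic helper, not part of SibeliaZ. It assumes the
    standard MAF layout and ignores comments, blank lines, and other
    line types.
    """
    blocks = 0
    seq_lines = 0
    for line in handle:
        tokens = line.split()
        if not tokens:
            continue  # blank line separating two blocks
        if tokens[0] == "a":
            blocks += 1  # start of a new alignment block
        elif tokens[0] == "s":
            seq_lines += 1  # one aligned sequence inside a block
    return blocks, seq_lines


# Small in-memory example; the sequence names are made up for illustration.
sample = """##maf version=1
a score=0
s genome1.chr1 0 4 + 100 ACGT
s genome2.chr1 10 4 + 200 ACGT

"""
blocks, seq_lines = count_maf_blocks(io.StringIO(sample))
print(f"blocks={blocks} sequence_lines={seq_lines}")
```

To check a real file, pass an open file handle instead of the in-memory sample. If the reported block count is zero, the problem is more likely in the alignment output than in the conversion script. If it is nonzero, the input and output files requested above would still be needed to find the bug.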