medvedevgroup / SibeliaZ

A fast whole-genome aligner based on de Bruijn graphs
http://medvedevgroup.com/

Visualization of output and maf_to_gfa1.py TabError: inconsistent use of tabs and spaces in indentation #20

Closed devenderarora closed 3 years ago

devenderarora commented 3 years ago

Dear iminkin, I was trying to make a Circos plot using your extremely user-friendly tool SibeliaZ. I have my input ready as a FASTA file and got output in MAF and GFF format. To visualize the result, I ran python maf_to_gfa1.py, but it ended with an error at line 96, prev_profile = profile(maf, 0): TabError: inconsistent use of tabs and spaces in indentation.

I also tried python maf_to_xmfa.py <, but it has been running for the past day.

How can I visualize the plot? Am I missing something?

iminkin commented 3 years ago

Hi, thank you for reporting! I pushed a fix to master; could you please pull and check whether maf_to_gfa1 works? As for the XMFA conversion, it relies on an external library, so there is no easy way to speed it up right away. Has it finished by now?
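For anyone who hits the same TabError in a local copy of the script, here is a minimal sketch of how the offending lines might be located. The helper name classify_indentation is hypothetical and not part of SibeliaZ; it only sorts indented lines by the whitespace they use, so it is a rough aid rather than a full check. Python's standard tabnanny module (python -m tabnanny maf_to_gfa1.py) performs a more thorough analysis.

```python
def classify_indentation(source):
    """Group 1-based line numbers by the kind of leading whitespace they use."""
    kinds = {"tabs": [], "spaces": [], "mixed": []}
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading run of spaces and tabs on this line.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # skip unindented and blank lines
        if " " in indent and "\t" in indent:
            kinds["mixed"].append(number)
        elif "\t" in indent:
            kinds["tabs"].append(number)
        else:
            kinds["spaces"].append(number)
    return kinds


if __name__ == "__main__":
    sample = "def f():\n\tx = 1\n\t    y = 2\n        z = 3\n"
    print(classify_indentation(sample))
```

If both the "tabs" and "spaces" lists are non-empty, or "mixed" is non-empty, the file is a likely candidate for this error under Python 3.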

devenderarora commented 3 years ago

Dear iminkin, thanks for the quick response. Yesterday I terminated the XMFA conversion because it showed nothing in the top command. I tried running maf_to_gfa1.py again, but it ended with an error:

(r_env) arora@info4:/680_info4/project/arora/circo/sibeliaz_out$ python maf_to_gfa1.py alignment.maf 1.fasta 2.fasta
  File "maf_to_gfa1.py", line 174
    print "S\t" + str(b + 1) + "\t" + blocks[b][0].body
    ^
SyntaxError: invalid syntax

####### With my limited understanding of Python, I tried to fix it:

print("S\t" + str(b + 1) + "\t" + blocks[b][0].body), and did the same for all the other print statements, which ended with:

Traceback (most recent call last):
  File "maf_to_gfa1.py", line 7, in <module>
    from Bio.Alphabet import IUPAC
  File "/home/arora/anaconda2/envs/r_env/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

####### Next, I commented out the Bio.Alphabet import at line 7 with # and reran the program:

(r_env) arora@info4:/680_info4/project/arora/circo/sibeliaz_out$ python maf_to_gfa1.py alignment.maf 1.fasta 2.fasta

which ended with this error:

Traceback (most recent call last):
  File "maf_to_gfa1.py", line 202, in <module>
    blocks, sequence = split_maf_blocks(args.maf)
  File "maf_to_gfa1.py", line 110, in split_maf_blocks
    prev_profile = profile(maf, 0)
  File "maf_to_gfa1.py", line 53, in profile
    return [group[i].body[column] == '-' for i in xrange(len(group))]
NameError: name 'xrange' is not defined

####### After pulling the update, I replaced xrange with the range function, and it seems to be working.

Please let me know whether the Bio.Alphabet import needs to be there, or whether this is the right fix. PS: The maf_to_gfa script is still running.
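The three errors above are typical of running a Python 2 script under Python 3.7. As a minimal sketch of the Python 3 equivalents, assuming the two lines shown in the tracebacks are representative: Record is a hypothetical stand-in that models only the body attribute, and segment_line is an illustrative wrapper around the expression printed at line 174, not a function from the script. On the Bio.Alphabet question, the module was removed in Biopython 1.78, and the error message itself says the alphabet can usually be dropped, so commenting out the import looks reasonable as long as no remaining code refers to IUPAC.

```python
from collections import namedtuple

# Hypothetical stand-in for the script's alignment records; only the
# `body` attribute that appears in the tracebacks is modeled here.
Record = namedtuple("Record", ["body"])


def profile(group, column):
    """Gap profile of one alignment column: True where a row holds '-'."""
    # Python 3: range replaces xrange (range is already lazy).
    return [group[i].body[column] == '-' for i in range(len(group))]


def segment_line(index, record):
    """Build the tab-separated "S" line that the script prints at line 174."""
    return "S\t" + str(index + 1) + "\t" + record.body


if __name__ == "__main__":
    group = [Record("AC-T"), Record("A-GT")]
    print(profile(group, 2))
    # Python 3: print is a function, so the argument goes in parentheses.
    print(segment_line(0, group[0]))
```

An alternative to editing the script by hand would be to run it in a Python 2 environment with an older Biopython, which is presumably what it was written for.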

Thanks and regards, Devender Arora

devenderarora commented 3 years ago

Dear iminkin, I was able to create a .gfa file by successfully running maf_to_gfa1.py fasta1 fasta2 > output.gfa. I found no support in vg for opening the .gfa file, so I tried opening it in the Bandage viewer, but that ended with a segmentation fault. Could you please share how the .gfa file can be visualized? I would be grateful.

iminkin commented 3 years ago

@devenderarora, what is the size of your dataset? Honestly, visualizing a MAF alignment as a GFA graph is unlikely to work for any relatively large dataset; even for smaller ones, the resulting graph is quite large and clumsy.
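To get a feel for how large the resulting graph is before loading it into a viewer, a rough sketch like the following could count the records in the file. It assumes standard GFA1 layout (tab-separated lines whose first field is the record type, with "S" for segments and "L" for links); summarize_gfa1 is a hypothetical helper, not part of SibeliaZ, vg, or Bandage.

```python
import io


def summarize_gfa1(lines):
    """Return (segment count, link count, total segment bases) for GFA1 lines."""
    segments = links = bases = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments += 1
            if fields[2] != "*":  # "*" means the sequence is not stored inline
                bases += len(fields[2])
        elif fields[0] == "L":
            links += 1
    return segments, links, bases


if __name__ == "__main__":
    sample = io.StringIO("H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGG\nL\t1\t+\t2\t+\t0M\n")
    print(summarize_gfa1(sample))
```

For a real file, the same function can be fed an open handle, for example with open("output.gfa") as handle: print(summarize_gfa1(handle)), which reads the file line by line without loading it all into memory.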

devenderarora commented 3 years ago

~2.7 GB. A graphical representation is what I am looking for, to better understand and explain the output. Thank you again.

iminkin commented 3 years ago

Sorry, but I suspect that it could be quite challenging to visualize a dataset of such size. Maybe there are solutions out there, but I don't think I can help.

devenderarora commented 3 years ago

No problem, friend. I will look for some way out. Thank you.