error while trying to plot an assembly and a reference

schneebergerlab / plotsr

Tool to plot synteny and structural rearrangements between genomes

MIT License


error while trying to plot an assembly and a reference #24

Closed: josruirod closed this issue 1 year ago.

josruirod commented 2 years ago

Hi, and thanks for the software, it looks great!

I'm trying to compare a de novo polished genome assembly with a reference genome. I think I've gone through all the steps, but plotsr fails at the end with the error "ValueError: not enough values to unpack (expected 2, got 1)".

Do you have any recommendations? I've manually checked that the contigs in the two FASTA files have the same names, so they should be comparable.

I hope you can help, because I'm interested in using this in our pipelines.

Best

mnshgl0110 commented 2 years ago

Hi. Could you please share the complete error message? - Manish

josruirod commented 2 years ago

Hi, thanks for the help.

Sure, sorry I didn't include it in the first place. I'm no longer seeing that error, but I now get the following one:

```
Traceback (most recent call last):
  File "path/plotsr", line 6, in <module>
    main(sys.argv[1:])
  File "path/plotsr/main.py", line 55, in main
    plotsr(args)
  File "path/plotsr/plotsr.py", line 152, in plotsr
    chrlengths, genomes = validalign2fasta(alignments, args.genomes.name)
  File "path/plotsr/func.py", line 901, in validalign2fasta
    raise ImportError(errmess2.format(c, os.path.basename(genf), als[i][0]))
ImportError: For chromosome ID: XXX_API_v3, length in genome fasta: genomes.txt is less than the maximum coordinate in the structural annotation file: plotsr_syri.out. Exiting.
```

Do the contigs/chromosomes being compared need to be the same size? That is not the case with a de novo assembly. I ran:

```
plotsr --sr plotsr_syri.out --genomes genomes.txt -o plotsr_assembly_reference_plot.pdf
```

The files can be found here.
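As worded, the ImportError describes a per-chromosome comparison: the largest coordinate annotated for a chromosome in plotsr_syri.out must not exceed that chromosome's length in the genome FASTA. The following is a minimal Python sketch of that kind of check, not plotsr's actual `validalign2fasta`; `fasta_lengths` and `find_overflows` are hypothetical helpers, and the `max_coords` mapping is assumed to have been collected beforehand from one genome's columns of the annotation file.

```python
def fasta_lengths(path):
    """Return {sequence ID: length}; the ID is the first word of each '>' header."""
    lengths = {}
    current = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                # Sequence lines may be wrapped, so accumulate their lengths.
                lengths[current] += len(line)
    return lengths


def find_overflows(lengths, max_coords):
    """Return sorted IDs whose maximum annotated coordinate exceeds the FASTA length.

    max_coords is a hypothetical {chromosome ID: max coordinate} mapping.
    IDs absent from the FASTA are treated as length 0, so they are reported too.
    """
    return sorted(
        chrom for chrom, coord in max_coords.items()
        if coord > lengths.get(chrom, 0)
    )
```

If the coordinates of one genome are checked against the FASTA of the other, a chromosome that is longer in the first genome is reported even though both files are individually fine. That is one way this error can appear when the genomes are listed in the wrong order.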

Best

mnshgl0110 commented 2 years ago

It seems that you have the genomes in the wrong order. From the README:

It is required that the order of the genomes is the same as the order in which genomes are compared. For example, if the first genome annotation file uses A as a reference and B as query, and the second genome annotation file uses B as a reference and C as query, then the genomes.txt file should list the genomes in the order A, B, C.
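The README rule amounts to chaining the comparisons: the query of each annotation file must be the reference of the next one, and genomes.txt lists the genomes along that chain. A small Python sketch of that logic follows; `genome_order` is a hypothetical helper for illustration, not part of plotsr, and each pair is assumed to be the (reference, query) of one annotation file in the order the files are passed to `--sr`.

```python
def genome_order(pairs):
    """Return genome names in the order implied by consecutive (reference, query) pairs.

    pairs[i] is the (reference, query) of the i-th annotation file; each query
    must be the reference of the next file, otherwise ValueError is raised.
    """
    if not pairs:
        return []
    order = [pairs[0][0]]
    for ref, qry in pairs:
        if ref != order[-1]:
            raise ValueError(f"expected reference {order[-1]!r}, got {ref!r}")
        order.append(qry)
    return order
```

For the README example, `genome_order([("A", "B"), ("B", "C")])` gives `["A", "B", "C"]`. With a single SyRI comparison that used the reference genome as reference and the assembly as query, the reference genome comes first.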

Please retry with the following as genomes.txt:

```
#file	name	tags
ref_reduced_comp_assembly.fasta	ref	lw:1.5
assembly_ILRA_reduced_comp_ref.fasta	novo	lw:1.5
```
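Because the columns must be separated by tabs, copying such a file from a web page or an editor can silently turn the tabs into spaces. One way to avoid that is to generate the file programmatically. A minimal Python sketch, assuming the three-column layout (`#file`, `name`, `tags`) shown above; `write_genomes_file` is a hypothetical helper, not a plotsr function.

```python
def write_genomes_file(path, entries):
    """Write a tab-separated genomes file with a '#file name tags' header.

    entries: (fasta_path, name, tags) tuples, listed in the same order in
    which the genomes were compared.
    """
    header = ("#file", "name", "tags")
    with open(path, "w") as fh:
        for row in (header, *entries):
            # Join with real tab characters, never spaces.
            fh.write("\t".join(row) + "\n")
```

For example, `write_genomes_file("genomes.txt", [("ref_reduced_comp_assembly.fasta", "ref", "lw:1.5"), ("assembly_ILRA_reduced_comp_ref.fasta", "novo", "lw:1.5")])` writes the file above. Passing `os.path.abspath(...)` paths instead gives the same effect as using `$PWD` in a shell.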

josruirod commented 2 years ago

Silly me, you are absolutely right; I missed that. With your suggested genomes.txt it works fine. Thanks for the help! If I may add, at the end I was also getting this error message:

```
raise ImportError("Incomplete genomic information.\nExpected format for the genome file:\npath_to_genome1\tgenome1_id\ttags\npath_to_genome2\tgenome2_id\ttags\n\nMake sure that the columns are separated by tabs (and not spaces).")
```

I suspect the \n at the beginning and end of the format suggested in that error message are not correct? Also, the header line seems to be required? Anyway, this worked for me:

```
echo -e "#file\tname\ttags\n$PWD/ref_reduced_comp_assembly.fasta\treference\tlw:1.5\n$PWD/assembly_ILRA_reduced_comp_ref.fasta\tassembly\tlw:1.5" > genomes.txt
```

Thanks!!
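The "Incomplete genomic information" error points to lines that do not split into the expected tab-separated columns. Before running plotsr, a quick pre-check can catch space-separated lines. Below is a minimal Python sketch; `check_genomes_file` is hypothetical, not plotsr's own parser. It assumes that lines starting with `#` are a header or comments, and that every other non-empty line needs at least a path and a name column. Whether plotsr treats the tags column as optional is not confirmed here.

```python
def check_genomes_file(path):
    """Return (line number, message) pairs for lines that are not tab-separated.

    Assumption: '#' lines are header/comments; other non-empty lines need
    at least two tab-separated columns (FASTA path and genome name).
    """
    problems = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line.strip() or line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) < 2:
                hint = " (spaces instead of tabs?)" if " " in line else ""
                problems.append((lineno, "expected tab-separated columns" + hint))
    return problems
```

An empty result means that every data line has at least two tab-separated columns. It does not check that the FASTA paths exist or that the genome order is correct.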