rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

ValueError: not enough values to unpack (expected 4, got 3) #23

Closed by valery-shap 2 years ago

valery-shap commented 2 years ago

Hello,

Thank you for the great new tool! I had a problem with one cluster and decided to run trycycler dotplot --cluster_dir trycycler/cluster_003, but I got this error:

Traceback (most recent call last):
  File "/home/miniconda3/envs/trycycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/__main__.py", line 45, in main
    dotplot(args)
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py", line 43, in dotplot
    image = create_dotplots(seq_names, seqs, args)
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py", line 90, in create_dotplots
    draw_labels(image, seq_names, start_positions, end_positions, text_gap, outline_width,
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py", line 173, in draw_labels
    font, text_width, text_height, font_size = \
ValueError: not enough values to unpack (expected 4, got 3)

How can I fix it? I'm using Trycycler version 0.5.1.

Valery

rrwick commented 2 years ago

Thanks for catching this one!

I've just pushed a fix to GitHub, so if you install Trycycler from there, it should work. If you want to hack a solution in place without reinstalling, you can replace your dotplot.py file with the one from GitHub:

wget https://github.com/rrwick/Trycycler/raw/main/trycycler/dotplot.py
mv dotplot.py /home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py

However, the fact that you've encountered this bug means Trycycler failed to load a font on your computer and had to fall back to the ugly default one (relevant code in this function).

So while my fix should prevent the crash, the text won't look very nice in your dotplot. I'm curious, what operating system are you using?
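For readers following along, here is a minimal sketch of how this kind of crash can arise. It is not Trycycler's actual code: the function names and the returned values are hypothetical placeholders. It assumes only what the traceback shows, namely that the caller unpacks four values and the fallback path supplied three.

```python
# Hypothetical illustration only; these functions are NOT from Trycycler.

def buggy_get_font(font_found):
    """Return (font, text_width, text_height, font_size), except on the fallback path."""
    if font_found:
        return "truetype", 120, 14, 12
    # Fallback path forgets the fourth value (font_size).
    return "default", 60, 11


def fixed_get_font(font_found):
    """Return a 4-tuple on every path so the caller can always unpack it."""
    if font_found:
        return "truetype", 120, 14, 12
    # The default font has no adjustable size in this sketch, so use None.
    return "default", 60, 11, None


# The caller expects four values, as in the traceback above.
try:
    font, text_width, text_height, font_size = buggy_get_font(font_found=False)
except ValueError as e:
    message = str(e)

# With the fixed version, unpacking works on the fallback path too.
font, text_width, text_height, font_size = fixed_get_font(font_found=False)
```

The general point is that every return path of a function should yield the same number of values, including rarely exercised fallback paths.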

rrwick commented 2 years ago

I've just added a few more fonts (f17018b35603cf0ed766357552e9c49accd687c7) that a Google search suggested are common in Linux distributions, so hopefully that will help avoid falling back to the ugly default font.

valery-shap commented 2 years ago

Hello,

Thank you for such a quick response! I changed the file and everything works.

NAME="CentOS Linux" VERSION="8 (Core)"

Could you please give some advice about the reconcile step and working with multiple copies of a cassette? In your example, it is clear that D_contig_2 should be removed because it differs from all the other contigs. But in my case there are 21 contigs in the cluster:

2 contigs have no cassette (these seem to be trash; the cassette is on the other contig)
2 contigs have 1 cassette
6 contigs have 2 cassettes
10 contigs have 3 cassettes
2 contigs have 4 cassettes

If I run the reconcile step on only the contigs with 2 cassettes, everything is fine. When I run it on only the contigs with 3 cassettes, there are no errors either. But if I add a contig with a different number of cassettes, there is always some error, such as a length difference or an identity difference. So I suppose I need to decide how many copies my sample should have and run this step with contigs from one group. And that's the problem :)

A cassette is (gene1 - gene2 - gene3 - gene4), so 2 cassettes = (gene1 - gene2 - gene3 - gene4) + (gene1 - gene2 - gene3 - gene4).

So far, the only approach I have come up with is searching the raw long reads for this cassette, and I found reads with the cassette x3. Could that be evidence for choosing the contigs from the group with 3 cassettes? Or are there better methods for checking this?

Many thanks, Valery

rrwick commented 2 years ago

Hi Valery,

What you did makes sense to me: look for individual long reads that span the cassettes entirely and see how many cassette copies they indicate. It seems like cassette x3 is the answer, and that is also the most popular version in your cluster (10/21 contigs). So I'd remove all contigs that have any other cassette count and reconcile with just those 10 contigs.

The repetitive nature of the cassettes probably confused the assemblers, leading to misassemblies. However, one other possibility is that the number of cassettes is variable - i.e. sometimes the cassettes actually come in 1, 2, 3 or 4 copies. That's a tougher situation because there isn't really a single answer, so you'd just have to pick one (preferably the most common, e.g. 3x) and run with it.
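The "keep only the contigs with the most common cassette count" step could be sketched roughly as follows. This is an illustrative helper, not part of Trycycler: the function name and the contig names and counts in the example are made up, and it assumes you have already worked out the cassette count for each contig by some other means.

```python
from collections import Counter


def pick_majority_contigs(cassette_counts):
    """Return the most common cassette count and the contigs that have it.

    cassette_counts maps a contig name to its number of cassette copies.
    If two counts are equally common, the one encountered first wins.
    """
    tally = Counter(cassette_counts.values())
    best_count, _ = tally.most_common(1)[0]
    keep = sorted(name for name, n in cassette_counts.items() if n == best_count)
    return best_count, keep


# Made-up example data (not the counts from this thread).
counts = {
    "A_contig_1": 3,
    "B_contig_1": 2,
    "C_contig_1": 3,
    "D_contig_1": 4,
    "E_contig_1": 3,
}
best, keep = pick_majority_contigs(counts)
print(best, keep)
```

The contigs not in the returned list are the ones to remove from the cluster before running the reconcile step. The majority vote is only a starting point: as discussed above, checking long reads that span the whole cassette region is the stronger evidence.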

Ryan