Closed valery-shap closed 2 years ago
Thanks for catching this one!
I've just pushed a fix to GitHub, so if you install Trycycler from there, it should work. If you want to hack a solution in place without reinstalling, you can replace your dotplot.py file with the one from GitHub:
wget https://github.com/rrwick/Trycycler/raw/main/trycycler/dotplot.py
mv dotplot.py /home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py
However, the fact that you've encountered this bug means Trycycler failed to load a font on your computer and had to fall back to the ugly default one (relevant code in this function).
So while my fix should prevent the crash, the text won't look very nice in your dotplot. I'm curious, what operating system are you using?
I've just added a few more fonts (f17018b35603cf0ed766357552e9c49accd687c7) that a Google search suggested are common in Linux distributions, so hopefully that will help avoid falling back to the ugly default font.
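The site-packages path in the mv command above is specific to one conda environment, so it will differ on other machines. As a minimal sketch (not part of Trycycler; the helper name `installed_module_file` is made up for illustration), the location of an installed package's file can be looked up with Python's standard `importlib` before overwriting it:

```python
import importlib.util
from pathlib import Path
from typing import Optional


def installed_module_file(package: str, filename: str) -> Optional[Path]:
    """Return the path of `filename` inside an installed package, or None.

    Hypothetical helper for illustration: it only looks the file up and
    never modifies anything.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    # A missing package gives None; a plain module has no search locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the path to overwrite, or None if Trycycler is not importable
    # from the Python interpreter running this script.
    print(installed_module_file("trycycler", "dotplot.py"))
```

Run it with the Python interpreter from the same environment that Trycycler is installed in; otherwise it will report None or a different copy of the package.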
Hello,
Thank you for such a quick response! I changed the file and everything works.
NAME="CentOS Linux" VERSION="8 (Core)"
Could you please give some advice about the reconcile step and working with multiple copies of a cassette? It is clear that D_contig_2 should be removed in your example because all the other contigs are different from it. But what if there are 21 contigs in the cluster and they differ in the number of cassettes?
2 contigs have no cassette (these seem to be trash; the cassette is on the other contig)
2 have 1 cassette
6 have 2 cassettes
10 have 3 cassettes
2 have 4 cassettes
If I run the reconcile step with only the contigs that have 2 cassettes, everything is fine. When I run it with only the contigs that have 3 cassettes, there are no errors either. But if I try to add a contig with a different number of cassettes, there is always some error, such as a length difference or an identity difference. So I suppose I need to decide how many copies my sample should have and run this step with contigs from one group. And that is the problem.
A cassette is (gene1 - gene2 - gene3 - gene4)
2 cassettes = (gene1 - gene2 - gene3 - gene4) + (gene1 - gene2 - gene3 - gene4)
So far, the only idea I have come up with is to search the raw long reads for this cassette, and I found reads with 3 copies of it (cassette x3). Could this be evidence for choosing the contigs from the group with 3 cassettes? Or are there better methods for checking this?
Many thanks, Valery
Hi Valery,
What you did makes sense to me: look for individual long reads that span the cassettes entirely and see how many cassette copies they indicate. It seems like cassette x3 is the answer, and that is also the most popular version in your cluster (10/21 contigs). So I'd remove all contigs that have any other cassette count and reconcile with just those 10 contigs.
The repetitive nature of the cassettes probably confused the assemblers, leading to misassemblies. However, one other possibility is that the number of cassettes is variable, i.e. sometimes the cassettes actually come in 1, 2, 3 or 4 copies. That's a tougher situation because there isn't really a single answer, so you'd just have to pick one (preferably the most common, e.g. 3x) and run with it.
Ryan
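The selection described above (keep only the contigs that share the most common cassette count) can be sketched in a few lines of Python. This is an illustration, not something Trycycler provides: the function name, the contig names and the counts are all hypothetical, and it assumes you have already counted the cassette copies in each contig by some other means, such as aligning the cassette sequence to the contigs.

```python
from collections import Counter
from typing import Dict, List, Tuple


def most_common_copy_number(counts: Dict[str, int]) -> Tuple[int, List[str]]:
    """Pick the most frequent cassette count and the contigs that have it.

    `counts` maps contig name -> number of cassette copies (assumed to be
    determined beforehand). Ties are broken by taking the smaller copy
    number, which is an arbitrary choice made for this sketch.
    """
    if not counts:
        raise ValueError("no contigs given")
    tally = Counter(counts.values())
    best = max(tally.values())
    copy_number = min(n for n, freq in tally.items() if freq == best)
    keep = sorted(name for name, n in counts.items() if n == copy_number)
    return copy_number, keep


if __name__ == "__main__":
    # Hypothetical cluster: contig name -> cassette copies found in it.
    example = {
        "A_contig_1": 3,
        "B_contig_1": 3,
        "C_contig_1": 2,
        "D_contig_1": 0,
        "E_contig_1": 3,
        "F_contig_1": 4,
    }
    copies, contigs_to_keep = most_common_copy_number(example)
    print(copies, contigs_to_keep)
```

The majority count is only a starting point: as discussed above, long reads that span the whole cassette region are the stronger evidence, so it is worth checking that they agree with the chosen copy number before removing the other contigs.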
Hello,
Thank you for the great new tool! I have a problem with one cluster and decided to run
trycycler dotplot --cluster_dir trycycler/cluster_003
but I got an error:
How could I fix it? Trycycler version: 0.5.1
Valery