Closed by ashleyp1 1 year ago
It looks like you have two groups of contigs:
Group 1: A_Utg620, E_Utg606, H_Utg612, I_utg000001c and J_Utg602
Group 2: B_contig_4, C_utg000001l, D_utg000001l, F_contig_2, G_contig_2 and K_contig_1
I suspect that you have some sort of structural rearrangement going on here, e.g. a large inversion of some sequence. Perhaps there is heterogeneity in your sample, i.e. a mix of two different large-scale structures, and assemblers are settling on either one or the other. Since Trycycler does a global alignment, a big structural difference can lead to very low identities.
You could confirm this by looking at the dotplots, nucmer alignments, or Mauve alignments. If it does look like a structural rearrangement, I would pick one structure (perhaps arbitrarily) and delete the contigs that have the other one. For example, just use the group 2 contigs.
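If you want a quick numeric check alongside the dotplots, something like the following rough sketch might help. It is not part of Trycycler or MUMmer: the function names (`revcomp`, `strand_profile`, `inverted_windows`) and the defaults (`k=15`, `window=1000`) are made up for illustration. It assumes both contigs are plain A/C/G/T strings loaded into memory, and it ignores circularity. For each window of one contig, it counts k-mers that match the other contig on the forward strand versus the reverse strand.

```python
import random

# Translation table for complementing uppercase A/C/G/T (other characters are left unchanged).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_profile(ref, qry, k=15, window=1000):
    """For each window of qry, count k-mers found in ref on the forward
    strand and k-mers found only as their reverse complement.

    Returns a list of (window_start, forward_hits, reverse_hits).
    """
    ref_kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)}
    last_start = len(qry) - k + 1  # one past the last valid k-mer start
    profile = []
    for start in range(0, last_start, window):
        fwd = rev = 0
        for i in range(start, min(start + window, last_start)):
            kmer = qry[i:i + k]
            if kmer in ref_kmers:
                fwd += 1
            elif revcomp(kmer) in ref_kmers:
                rev += 1
        profile.append((start, fwd, rev))
    return profile


def inverted_windows(profile):
    """Return the start positions of windows dominated by reverse-strand hits."""
    return [start for start, fwd, rev in profile if rev > fwd]


if __name__ == "__main__":
    # Synthetic demo: a random 10 kb "contig" and a copy with 4000-7000 inverted.
    random.seed(0)
    ref = "".join(random.choice("ACGT") for _ in range(10000))
    qry = ref[:4000] + revcomp(ref[4000:7000]) + ref[7000:]
    print(inverted_windows(strand_profile(ref, qry)))
```

A contig that is reverse-strand across every window is probably just on the opposite strand as a whole, which is not a structural difference. A mix of forward and reverse windows within one contig, like the block in the synthetic demo, is what a large inversion would look like.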
Cases with heterogeneity are some of the trickier scenarios when doing a Trycycler assembly!
Ryan
I'm trying to reconcile my contigs and ran into a problem at the pairwise identities check, where a large proportion of my contigs have values of 59%. I'm hesitant to fix it by removing them, since that would mean removing at least 5 contigs, including all of my Raven assemblies, but I'm also unsure whether just lowering the min_identity threshold is right either.
I mapped the contigs against each other using nucmer (making graphs similar to the dotplot function, but it runs a bit faster) and there don't appear to be any big differences. Do you have any recommendations on how to move forward with this?