mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Question about circular sequence output #131

Closed russellj7 closed 5 years ago

russellj7 commented 5 years ago

Hello,

First of all, thank you for creating Flye! It has been very useful for bacterial genome assembly.

In the assembly_info.txt output, there is a column for denoting circular contigs under "circ.". Just to confirm, "+" denotes a circular sequence? Also, if the sequence is denoted as circular, is the sequence already circularized in the Flye output assembly.fasta or would circularization need to take place after assembly?

Thanks!

mikolmogorov commented 5 years ago

Hi,

Thank you for the feedback.

Yes, "+" corresponds to a circular sequence. You can also visualize the assembly graph with Bandage / AGB to get an overview of the assembly.

The sequence should already be circularized. Some people have reported that Flye might delete a few (~10) bases at the contig's breakpoint. We are planning to fix this in the future. See Ryan Wick's evaluation for details (https://github.com/rrwick/Long-read-assembler-comparison).
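For anyone who wants to check the "circ." column programmatically rather than by eye, here is a minimal sketch. It assumes assembly_info.txt is tab-separated with a header row that contains a "circ." column and has the contig name in the first column. The function name `circular_contigs` is made up for illustration, and accepting "Y" alongside the "+" discussed in this thread is an assumption about other Flye versions, so check your own file first.

```python
import csv
import io

# Assumption: "+" marks a circular contig (per this thread); "Y" is also
# accepted in case your Flye version writes Y/N instead of +/-.
CIRCULAR_MARKS = {"+", "Y"}


def circular_contigs(info_text):
    """Return the names of contigs flagged circular in assembly_info.txt content."""
    rows = csv.reader(io.StringIO(info_text), delimiter="\t")
    header = next(rows)
    circ_col = header.index("circ.")  # locate the column by its header name
    # Skip blank lines; the contig name is taken from the first column.
    return [row[0] for row in rows if row and row[circ_col] in CIRCULAR_MARKS]
```

Typical use would be `circular_contigs(open("assembly_info.txt").read())`, which returns a list of contig names.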

StevenJRobbins commented 1 year ago

Thank you for this thread! I was hoping I could dig a little deeper. Is circularity determined simply by finding that the path in the assembly graph that starts at the beginning of a contig ends at the end of the same contig? This seems intuitive, but I wanted to confirm.

And when you say that the sequence was "circularized", what does this mean? Is the sequence trimmed to remove overlap?

Thank you!

mikolmogorov commented 1 year ago

Yes to both!
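To make "trimmed to remove overlap" concrete, here is a toy sketch of the idea. It is not Flye's implementation: it only looks for an exact match between the start and end of a sequence, whereas real long-read contigs contain errors and need alignment-based overlap detection. The function name `trim_end_overlap` and the `min_overlap` parameter are hypothetical.

```python
def trim_end_overlap(seq, min_overlap=3):
    """Remove a suffix of seq that exactly repeats its prefix.

    Returns (trimmed_seq, overlap_length). Illustration only: exact string
    matching, longest overlap first, quadratic time in the worst case.
    """
    max_len = len(seq) // 2
    for k in range(max_len, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            # The last k bases duplicate the first k, so drop them once.
            return seq[:-k], k
    return seq, 0
```

For example, `trim_end_overlap("ACGTTGCAACGT")` finds the 4-base overlap "ACGT" and returns `("ACGTTGCA", 4)`, so the last base now joins directly to the first with no duplicated stretch.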

Fatma116 commented 10 months ago

Hello, what does it mean if a chromosomal contig is marked as non-circular? Is it because there is a gap between the last and first base, because a possible overlap at the ends was not trimmed, or simply because the chromosome is linear by nature? Thanks

mikolmogorov commented 10 months ago

@Fatma116 it may mean any of these. If you visualize the assembly graph, you may get a better idea of what is causing this.

Fatma116 commented 10 months ago

@fenderglass Thanks a lot for your reply. I looked at the assembly graph (largest contig), but I am confused about how to determine the reason. I can see that it is not completely closed, as in edges 8, 9, 10, 11, and 12. I am attaching the assembly graph as well as the assembly info (attachments: assembly_graph, assembly_info.txt). I would appreciate it if you could have a look. Thanks a lot in advance

mikolmogorov commented 10 months ago

@Fatma116 it does look like a circular chromosome, and there is one unresolved region (represented by edge_3 and edge_3). This may mean that you have a long tandem duplication in this region. Another possibility is heterogeneity in your population, with some clones having an expanded repeat; in that case, the differing copies create ambiguity in the graph. If you could send me the flye.log file, I may be able to distinguish these two cases.
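Besides viewing the graph in Bandage, the links can be listed from the GFA file directly. The sketch below assumes a GFA1 file (such as Flye's assembly_graph.gfa) with "S" segment lines and "L" link lines. The function name `gfa_link_summary` is hypothetical. An edge that links back to itself may accompany an unresolved tandem repeat, but this listing alone cannot tell a tandem duplication apart from population heterogeneity.

```python
def gfa_link_summary(gfa_lines):
    """Map each segment to the segments it links to, and collect self-loops.

    Reads GFA1 records: "S<TAB>name<TAB>sequence..." and
    "L<TAB>from<TAB>orient<TAB>to<TAB>orient<TAB>overlap". Orientations are ignored.
    """
    neighbors = {}
    self_loops = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # Register the segment even if it has no links.
            neighbors.setdefault(fields[1], set())
        elif fields[0] == "L":
            src, dst = fields[1], fields[3]
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)
            if src == dst:
                self_loops.add(src)
    return neighbors, self_loops
```

Calling it as `gfa_link_summary(open("assembly_graph.gfa"))` gives a quick text view of which edges connect to which, and the result can be compared against what Bandage shows.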

Fatma116 commented 10 months ago

Dear Mikhail,

Attached you can find the flye log file. Thanks in advance.

Best,
Fatma Mahmoud
PhD Student, Research Unit of Comparative Microbiome (COMI), Helmholtz Munich

