Open swarnalilouha opened 1 year ago
Assemblies often have multiple variants. The simplest case is a SNP. SAUTE arranges all variants in a graph (its output is controlled by the --gfa option). You can analyze this graph if you install Bandage (https://rrwick.github.io/Bandage/). SAUTE prints up to 1000 variants in FASTA format to the --all_variants output. The first part of the FASTA ID is Target name:graph number:contig number:estimated k-mer count. After that, the numbers of the graph nodes used are printed, separated by spaces. From what you posted, your graph has two variants. The difference is represented by nodes 3 and 4. You should either look at the graph or align the two contigs to understand what kind of difference they have.
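To make the header layout described above concrete, here is a minimal Python sketch that splits a header into its fields and reports which graph nodes differ between two contigs. The two example headers are hypothetical (the k-mer counts and node lists are made up for illustration, not taken from the original post), and `parse_header` and `differing_nodes` are helper names invented here, not part of SAUTE.

```python
from typing import NamedTuple, Set, Tuple


class SauteHeader(NamedTuple):
    target: str
    graph: str
    contig: str
    kmer_count: str
    nodes: Tuple[str, ...]


def parse_header(line: str) -> SauteHeader:
    """Split a header of the assumed form
    '>Target:graph:contig:kmer_count node node ...'."""
    fields = line.strip().lstrip(">").split()
    # rsplit from the right so a target name containing ':' stays intact
    target, graph, contig, kmer_count = fields[0].rsplit(":", 3)
    return SauteHeader(target, graph, contig, kmer_count, tuple(fields[1:]))


def differing_nodes(a: SauteHeader, b: SauteHeader) -> Set[str]:
    """Nodes used by exactly one of the two contigs."""
    return set(a.nodes) ^ set(b.nodes)


# Hypothetical headers for illustration only
h1 = parse_header(">CRYPT1020_1:0:1:250 1 2 3 5")
h2 = parse_header(">CRYPT1020_1:0:2:240 1 2 4 5")
print(sorted(differing_nodes(h1, h2)))
```

This only tells you which nodes differ. To see what the difference actually is (a SNP, an indel, and so on), you still need to inspect the graph in Bandage or align the two contigs.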
Can SAUTE be used to assemble whole genome sequencing (WGS) data for humans?
SAUTE was designed for assembling bacterial genes. It is not appropriate for assembling the human genome.
Yes, I don't mean the whole genome, only specific genes in the human genome, assembled with SAUTE from human WGS reads.
Try target sequences that extend slightly beyond the gene of interest. It should work, unless there are large insertions, deletions, or rearrangements inside the gene's introns.
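One way to build such a target is to cut the gene region out of a reference sequence with some extra flanking bases on each side. The sketch below assumes the reference is already loaded as a Python string and that coordinates are 0-based and half-open. The function name `target_with_flanks` and the default flank of 500 bases are illustrative choices, not values recommended by the SAUTE authors.

```python
def target_with_flanks(chrom_seq: str, gene_start: int, gene_end: int,
                       flank: int = 500) -> str:
    """Return the gene region plus `flank` bases on each side,
    clamped to the ends of the reference sequence.

    Coordinates are 0-based, half-open: [gene_start, gene_end).
    """
    start = max(0, gene_start - flank)
    end = min(len(chrom_seq), gene_end + flank)
    return chrom_seq[start:end]


# Toy reference: the "gene" is the run of C's at positions 10..15
reference = "A" * 10 + "C" * 5 + "G" * 10
print(target_with_flanks(reference, 10, 15, flank=3))
```

The resulting sequence can then be written to a FASTA file and used as the SAUTE target.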
thank you very much
I used Saute to assemble a reference fasta sequence '>CRYPT1020_1' from Illumina reads. I got 2 assemblies:
Why are there 2 assemblies and what do the numbers in the fasta headers mean?