xfengnefx / hifiasm-meta

hifiasm_meta - de novo metagenome assembler, based on hifiasm, a haplotype-resolved de novo assembler for PacBio HiFi reads.
MIT License

Resistance gene not assembled in primary contig file, but is present in alternate contig file #30

Open sosie100 opened 11 months ago

sosie100 commented 11 months ago

In a comparison of metagenomic assemblies made only from Illumina short-read data (metaSPAdes) with hifiasm-meta assemblies (the primary contig file, .p_ctg.gfa), we found an antibiotic resistance gene (ARG) assembled in a metaSPAdes assembly that was not present in the hifiasm-meta assembly for that sample. About 20 HiFi reads align well to the ARG, but the ARG is not assembled in the hifiasm-meta primary contigs. However, the ARG is assembled in the alternate contig file made by hifiasm-meta (.a_ctg.gfa), along with the .r_ctg.gfa and .p_utg.gfa files. How would you recommend we run hifiasm-meta so that the ARG content we care about lies solely within the primary contig file? Or should we look for ARGs in both the primary and alternate contig files? ARGs are usually surrounded by mobile genetic elements and usually belong to low-abundance species.

xfengnefx commented 11 months ago

In your case, please use both primary and alt contig files, i.e. p_ctg and a_ctg. Actually the "alternative contig" does not have a significant meaning here, it's more like a remnant from forking hifiasm: initially I wanted to put popped bubble edges all into the alt, but then felt it does not matter either way because short contigs are less useful.

As for why ARGs are in the alt, my guess is that there were haplotypes without the ARGs and perhaps with higher coverage, so graph cleaning dropped the ARGs. You can pick a couple of reads from an ARG contig and grep them in the r_utg graph to check.
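The read lookup suggested above can also be scripted instead of using grep. This is a minimal sketch, assuming the GFA carries hifiasm-style `A` lines in which column 2 is the segment name and column 5 is the read name; the function name `segments_containing_reads` and all segment and read names in the example are hypothetical.

```python
from collections import defaultdict


def segments_containing_reads(gfa_lines, read_names):
    """Return {read name: sorted list of segments whose A lines list that read}.

    Assumes tab-separated A lines laid out as:
    A <segment> <offset> <strand> <read> <read_start> <read_end> [tags...]
    """
    wanted = set(read_names)
    hits = defaultdict(set)
    for line in gfa_lines:
        if not line.startswith("A\t"):
            continue  # only A lines describe read placement
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed line, skip it
        segment, read = fields[1], fields[4]
        if read in wanted:
            hits[read].add(segment)
    return {read: sorted(segs) for read, segs in hits.items()}


# Tiny made-up graph; the names are placeholders, not real data.
example = [
    "S\tutg_demo_1\t*\tLN:i:1000\tdp:f:11\n",
    "A\tutg_demo_1\t0\t+\tread_a\t0\t500\tid:i:1\n",
    "A\tutg_demo_1\t400\t-\tread_b\t0\t600\tid:i:2\n",
    "A\tutg_demo_2\t0\t+\tread_c\t0\t700\tid:i:3\n",
]
print(segments_containing_reads(example, ["read_a", "read_c", "read_x"]))
```

For a real graph, pass an open file handle for the r_utg GFA in place of `example`. Reads that never appear in an `A` line are simply absent from the result.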

sosie100 commented 11 months ago

Our samples are fecal sample metagenomes with only haploid bacterial DNA, so haplotypes are not relevant in our case.

We found the ARG sequence within a 554 kb segment in the r_utg.gfa file. About 370 kb of that segment's sequence is not in the p_ctg.gfa file, whereas the full 554 kb is in the a_ctg.gfa file. Why is this sequence left out of the .p_ctg.gfa file?

xfengnefx commented 11 months ago

> with only haploid bacterial DNA

Sorry for my confusing comment, I meant to say "there were closely related strains [with and] without the ARGs". For genomes that have less than 1% whole-genome diversity, hifiasm-meta currently usually will not separate them.

> a 554 kb segment in the r_utg.gfa file

Is that a unitig? (And it is divided up in the contig graphs?) What are the coverages of the segment and contigs? Coverage is the dp:f field of S lines in the gfa files. Bandage will show it on the right side when you select a node, too.
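To compare coverages across the graphs without opening Bandage, the `dp:f` tag can be read straight from the `S` lines. This is a minimal sketch, assuming tab-separated `S` lines with optional tags after the sequence column; the function name `segment_depths` and the segment names in the example are hypothetical.

```python
def segment_depths(gfa_lines):
    """Return {segment name: coverage} from the dp:f tag of each S line."""
    depths = {}
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # coverage is stored on segment (S) lines only
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed line, skip it
        name = fields[1]
        # Optional tags follow the name and sequence columns.
        for tag in fields[3:]:
            if tag.startswith("dp:f:"):
                depths[name] = float(tag[len("dp:f:"):])
                break
    return depths


# Made-up S lines; the names and values are placeholders.
example = [
    "S\tutg_demo_1\t*\tLN:i:554000\tdp:f:11\n",
    "S\tctg_demo_1\t*\tLN:i:184000\tdp:f:35.5\n",
    "S\tctg_demo_2\t*\tLN:i:1000\n",  # no dp tag, so it is not reported
    "L\tutg_demo_1\t+\tctg_demo_1\t+\t0M\n",
]
for name, depth in segment_depths(example).items():
    print(name, depth)
```

Running this on the r_utg, p_ctg, and a_ctg files in turn gives a quick side-by-side view of segment coverage in each graph.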

If you don't mind sharing the {r_utg, p_ctg, a_ctg}.gfa files via email, I can have a look.

sosie100 commented 11 months ago

Thank you for the response! I have emailed you the requested files. The "554 kb segment in the r_utg.gfa file" I am referring to appears to be a unitig, because its name begins with "utg". The coverage is dp:f:11. The 554 kb segment in the r_utg.gfa does not appear to be divided up in the contig graph, based on an end-to-end alignment to an a_ctg segment. More details about coordinates are in the email I sent.

xfengnefx commented 11 months ago

Got the mail, thanks! Will check tonight.