rhysnewell / aviary

A hybrid assembly and MAG recovery pipeline (and more!)
GNU General Public License v3.0

Contig size parameter not working properly #211

Open dylancronin opened 3 months ago

dylancronin commented 3 months ago

Hey Rhys,

Hope all is well.

When running Aviary on some samples, I noticed that the final bins output contained many bins with contigs much smaller than the minimum contig size I had set. Looking through the output Aviary produced, this appears to be a problem only with metabat1 (as far as I can tell): it consistently gives me contigs as small as 1 kb, whereas the other tools seem to apply the size cutoff correctly. My guess is that this is not something you can resolve on your end (I haven't looked into the metabat1 code myself), but a quick fix would be to add a step that filters out all contigs smaller than the set size before running any of the initial binners or the read mapping. That doesn't fix the issue in metabat1 itself, but it would at least prevent the problem from propagating into Aviary's output.

-Dylan-
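For illustration, the pre-filtering step suggested above could look roughly like the sketch below. It is a minimal, standalone example and not Aviary's actual code: the function names (`read_fasta`, `filter_contigs`, `write_fasta`) and the `min_length` parameter are hypothetical, and it assumes the assembly is a plain, uncompressed FASTA file.

```python
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records: Iterable[Record], min_length: int) -> Iterator[Record]:
    """Keep only contigs whose sequence is at least min_length bases long.

    min_length is a hypothetical parameter standing in for the user's
    configured minimum contig size.
    """
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 80) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
```

Run once on the assembly before read mapping and binning, something like `write_fasta(filter_contigs(read_fasta(src), 1500), dst)` (with `src` and `dst` as open file handles and 1500 as a placeholder cutoff) would mean every binner only ever sees contigs at or above the cutoff, whatever each tool does with its own size option.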

wwood commented 1 week ago

A bit late here, but do you have any sense of whether removing shorter contigs before read mapping for binning changes the results in real-world cases?

Removing short contigs before mapping would also speed up the mapping, making Aviary faster. So it would ideally be something to do if the results don't change.