stajichlab / AAFTF

Automatic Assembly For The Fungi
MIT License

Use bbduk.sh (BBMap) for the filter step instead of matching reads with bwa/bowtie #4

Closed: hyphaltip closed this issue 5 years ago

hyphaltip commented 5 years ago

Use k-mer-based matching of reads with bbduk.sh, which can also do phiX and primer filtering.

Could this replace the Trimmomatic trimming step as well? @nextgenusfs

nextgenusfs commented 5 years ago

Yes, it could actually replace nearly all of the steps with BBMap, which would reduce the number of dependencies. We could still leave the existing code in place so that, for example, several aligners remain usable. For my own data I've been using bbduk.sh for adapter trimming first, then for phiX removal: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/
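A minimal sketch of the two-pass order described above (adapter trimming, then phiX removal), written in Python as a command builder. The k-mer settings (`ktrim=r k=23 mink=11 hdist=1 tpe tbo` for adapters, `k=31 hdist=1` for phiX) follow the examples in the BBDuk guide linked above; the helper names, output file names, and the reference paths `adapters.fa` and `phix174_ill.ref.fa.gz` (assumed to be the files shipped in the BBMap `resources` directory) are assumptions to adjust for your install.

```python
import subprocess
from typing import List, Tuple

# Hypothetical helpers: flags follow the BBDuk guide examples; paths are assumptions.

def adapter_trim_cmd(r1: str, r2: str, out1: str, out2: str,
                     adapters: str = "adapters.fa") -> List[str]:
    """Pass 1: k-mer trim Illumina adapters from the right end of each read."""
    return [
        "bbduk.sh",
        f"in={r1}", f"in2={r2}",
        f"out={out1}", f"out2={out2}",
        f"ref={adapters}",
        "ktrim=r", "k=23", "mink=11", "hdist=1",  # right-trim; shorter k-mers at read tips
        "tpe", "tbo",  # trim pairs to equal length; also trim by pair overlap
    ]


def phix_filter_cmd(r1: str, r2: str, out1: str, out2: str,
                    phix: str = "phix174_ill.ref.fa.gz") -> List[str]:
    """Pass 2: with no ktrim set, reads sharing k-mers with phiX are discarded."""
    return [
        "bbduk.sh",
        f"in={r1}", f"in2={r2}",
        f"out={out1}", f"out2={out2}",
        f"ref={phix}",
        "k=31", "hdist=1",
    ]


def trim_then_filter(r1: str, r2: str, prefix: str) -> List[List[str]]:
    """Chain the two passes: pass 2 reads the files that pass 1 writes."""
    trimmed: Tuple[str, str] = (f"{prefix}_trim_1.fq.gz", f"{prefix}_trim_2.fq.gz")
    clean: Tuple[str, str] = (f"{prefix}_clean_1.fq.gz", f"{prefix}_clean_2.fq.gz")
    return [
        adapter_trim_cmd(r1, r2, *trimmed),
        phix_filter_cmd(*trimmed, *clean),
    ]


def run_all(cmds: List[List[str]]) -> None:
    """Run each command in order, stopping on the first non-zero exit."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

With bbduk.sh on `PATH`, `run_all(trim_then_filter("reads_1.fq.gz", "reads_2.fq.gz", "sample"))` would run both passes. Each command is built as an argument list rather than a shell string, so it goes straight to `subprocess.run` without any shell quoting.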

hyphaltip commented 5 years ago

Yes, I think this might be the best thing to do. Are you able to share any best practices from that? I've got something started, but maybe you can help tweak it. Mainly, I would not do trimming and filtering as separate steps.

hyphaltip commented 5 years ago

I'm trying to think about whether this is generalizable: bbduk.sh has a lot of options, so do we really want to enable pass-through of those (e.g., a general "here's the string to pass to bbduk.sh")?
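One way to offer a general pass-through without wrapping every bbduk.sh option is to accept a single free-form string and tokenize it with Python's `shlex.split`. This is only a sketch; the function name and the idea of a `--bbduk-opts`-style option feeding it are hypothetical, not something the repository is known to provide.

```python
import shlex
from typing import List, Optional


def build_bbduk_cmd(base_args: List[str], extra: Optional[str] = None) -> List[str]:
    """Append a user-supplied option string to a bbduk.sh argument list.

    shlex.split tokenizes the string the way a POSIX shell would, so quoted
    values containing spaces stay as one argument, but the tokens are never
    handed to a shell, so characters like ';' are not executed.
    """
    cmd = ["bbduk.sh", *base_args]
    if extra:
        cmd.extend(shlex.split(extra))  # appended verbatim after the built-in args
    return cmd
```

The trade-off of this design is that the wrapper never validates the extra tokens; a typo in the string is only caught, if at all, when bbduk.sh itself runs.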

nextgenusfs commented 5 years ago

I more or less used what Brian lists as the default settings in the guide. I tend not to quality-trim prior to assembly, but removal of adapters and phiX is required.

We could have an option to pass arguments directly to any of these tools, but for standard Illumina adapter trimming and phiX removal, I can't think of a case where I've changed anything other than, for example, the minimum read length to keep.
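If minimum length is usually the only setting that changes, a wrapper could hard-code the adapter-trimming settings and expose just that one knob. A minimal argparse sketch: the option names `--left`, `--right`, `--minlen` and the 50 bp default are illustrative assumptions, while `minlen=` is the bbduk.sh parameter that discards reads shorter than the given length after trimming.

```python
import argparse
from typing import List, Optional

# Adapter-trimming settings fixed at the BBDuk guide's example values (assumed adequate).
FIXED_TRIM_ARGS = ["ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI: only the input files and minimum length are configurable."""
    p = argparse.ArgumentParser(description="adapter trim + phiX filter (sketch)")
    p.add_argument("--left", required=True, help="forward reads FASTQ")
    p.add_argument("--right", required=True, help="reverse reads FASTQ")
    p.add_argument("--minlen", type=int, default=50,
                   help="discard reads shorter than this after trimming (assumed default)")
    return p.parse_args(argv)


def trim_args(ns: argparse.Namespace) -> List[str]:
    """bbduk.sh arguments for the trimming pass, with minlen taken from the CLI."""
    return [f"in={ns.left}", f"in2={ns.right}", *FIXED_TRIM_ARGS, f"minlen={ns.minlen}"]
```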

hyphaltip commented 5 years ago

I do need custom contaminant filtering. That's my main use case for this: we seem to have bacterial contamination, and I think it would be helpful to remove as much as possible before assembly rather than relying on the megablast screen at the end (though some circularity is needed).
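For custom contaminant screening, the same k-mer filter can be pointed at a user-supplied FASTA (for example, bacterial genomes suspected in the sample). This sketch assumes a hypothetical `contaminants.fa` and keeps the matched reads via `outm=`/`outm2=` and per-reference hit counts via `stats=`, so what was removed can be checked afterwards; the function name, file naming, and thread count are illustrative.

```python
from typing import List


def contaminant_screen_cmd(r1: str, r2: str, contaminants_fasta: str,
                           prefix: str, threads: int = 4) -> List[str]:
    """Hypothetical helper: split read pairs into clean vs. contaminant-matching sets."""
    return [
        "bbduk.sh",
        f"in={r1}", f"in2={r2}",
        f"ref={contaminants_fasta}",  # e.g. concatenated bacterial genomes (assumption)
        f"out={prefix}_clean_1.fq.gz", f"out2={prefix}_clean_2.fq.gz",      # kept reads
        f"outm={prefix}_contam_1.fq.gz", f"outm2={prefix}_contam_2.fq.gz",  # matched reads
        f"stats={prefix}_contam_stats.txt",  # which references the matches hit
        "k=31", "hdist=1",
        f"threads={threads}",
    ]
```

Keeping the matched reads, rather than discarding them, also gives some material to inspect when deciding which contaminants belong in the reference set in the first place.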

hyphaltip commented 5 years ago

We now support BBMap (bbduk.sh) as the default for contaminant screening.