tseemann / barrnap

:microscope: :leo: Bacterial ribosomal RNA predictor
GNU General Public License v3.0

Question about using barrnap on reads #40

Closed jolespin closed 4 years ago

jolespin commented 4 years ago

I've quality-trimmed my HiSeq reads and converted them to FASTA. I want to run these through barrnap, but I'm not getting any hits with the default settings. How could I adjust the following parameters to use barrnap properly while casting a wide net?

--evalue is the cut-off for nhmmer reporting, before further scrutiny
--lencutoff is the proportion of the full length that qualifies as partial match
--reject will not include hits below this proportion of the expected length

Is --lencutoff the proportion of the target rRNA gene that is covered by the query sequence? If so, should I drop this down to something like 0.01?

I'm confused about how --reject differs from --lencutoff. How would you adjust these to properly handle reads?

For --evalue, I was going to set it to 0.1 to cast a wide net. Do you think this is too permissive?

My sequences are around 200 bp long.

tseemann commented 4 years ago

I don't think this is the right tool if you want to scan reads directly.

I think --reject means the hit won't be in the GFF output file at all, and --lencutoff means the hit is still kept but labelled as partial. You would need to bring those down dramatically, yes. 16S is ~1542 bp long, so a 200 bp read spans at most about 13% of it, and 0.1 might work. Or just set them both to zero (0).
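To make the difference between the two thresholds concrete, here is a minimal Python sketch of the logic as described above. It is not barrnap's actual code: the function name `classify_hit` is hypothetical, the fraction is assumed to be hit length divided by expected gene length, and the threshold values in the usage lines are illustrative rather than barrnap's defaults.

```python
def classify_hit(hit_len, gene_len, lencutoff, reject):
    """Classify a hit by the fraction of the expected gene length it spans.

    Assumption: fraction = hit_len / gene_len. barrnap itself may round
    differently or use its own table of expected gene lengths.
    """
    fraction = hit_len / gene_len
    if fraction < reject:
        return "rejected"   # dropped from the GFF output entirely
    if fraction < lencutoff:
        return "partial"    # kept, but labelled as a partial match
    return "full"


# A 200 bp read against a ~1542 bp 16S gene spans about 13% of the gene.
print(round(200 / 1542, 3))                                # 0.13
print(classify_hit(200, 1542, lencutoff=0.8, reject=0.5))  # rejected
print(classify_hit(200, 1542, lencutoff=0.8, reject=0.0))  # partial
print(classify_hit(200, 1542, lencutoff=0.1, reject=0.1))  # full
```

The takeaway is that any --reject value above roughly 0.13 would discard every 200 bp read, even one that aligns end to end.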

You could just take the HMM model files from the db folder and run HMMER yourself? Maybe even use bwa/minimap2 first to bait all the reads against a DB of rRNA genes, then assemble those, or scan those directly.
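As a rough sketch of the "run HMMER yourself" route, the Python snippet below only builds an `nhmmer` command line; it does not run it. The helper name `build_nhmmer_command`, the HMM path `db/bac.hmm`, and the file names `reads.fasta` and `hits.tbl` are assumptions for illustration, so adjust them to wherever the model files and reads actually live. The options used (`--cpu`, `-E`, `--tblout`, `-o`) are standard nhmmer options.

```python
import shlex


def build_nhmmer_command(hmm_path, fasta_path, tblout_path, evalue=0.1, cpus=1):
    """Return an nhmmer argument list that scans fasta_path with the given HMMs.

    All paths are placeholders; evalue=0.1 is the permissive value
    floated in this thread, not a recommendation.
    """
    return [
        "nhmmer",
        "--cpu", str(cpus),
        "-E", str(evalue),        # E-value reporting threshold
        "--tblout", tblout_path,  # tabular per-hit output
        "-o", "/dev/null",        # discard the verbose alignment report
        hmm_path,                 # query: rRNA HMM file
        fasta_path,               # target: reads in FASTA format
    ]


cmd = build_nhmmer_command("db/bac.hmm", "reads.fasta", "hits.tbl")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once nhmmer is installed; the hits in the table then need their own length and E-value filtering, since none of barrnap's post-processing is applied.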

jolespin commented 4 years ago

Thanks for the suggestions here! I realized that I was actually supposed to identify ribosomal proteins, not rRNA. I had some trouble with phylosift because of some weird dependency issues, so I thought barrnap would be a good alternative, but the two tools do different things. In the future, I'll definitely continue to use barrnap for identifying rRNA sequences.

tseemann commented 4 years ago

Ah ... yes there are 20-30 ribosomal proteins. They are quite conserved and easy to find in assemblies. Good luck.

jolespin commented 1 year ago

@tseemann what would you recommend as "relaxed" and "strict" settings for pulling out rRNA from metagenome-assembled genomes (MAGs)?