Add -p meta to the contig start search to handle small circular contigs.

LeeBergstrand commented 1 month ago

When running rotary on an expanded dataset, I am getting the following error in the rule search_contig_start:

GNU nano 2.9.3    STR27d/logs/circularize/search_contig_start.log

### Predict genes ###
-------------------------------------
PRODIGAL v2.6.3 [February, 2016]
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.
-------------------------------------
Request:  Single Genome, Phase:  Training
Reading in the sequence(s) to train...

Error:  Sequence must be 20000 characters (only 6580 read).
(Consider running with the -p meta option or finding more contigs from the same genome.)

### Find HMM hits ###

Error: Sequence file STR27d/circularize/identify/STR27d_circular.faa is empty or misformatted

### Done. ###

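This error happens when a very small contig is predicted to be circular but the larger chromosomes are too fragmented to be circularized. The circular FASTA passed to Prodigal then contains fewer than the 20,000 characters that single-genome training requires (here only 6,580 were read), so no genes are predicted and the downstream HMM search receives an empty .faa file. Perhaps these tiny circular elements are plasmids smaller than 20 kb?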

Proposed Solution:

As the error message suggests, we could run Prodigal in meta mode using the -p meta flag.

https://github.com/hyattpd/prodigal/wiki/Gene-Prediction-Modes
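
For illustration, here is a minimal Python sketch of one way the mode could be chosen, assuming we only want to fall back to meta mode when the input is too short for single-genome training. The helper names (`total_fasta_length`, `build_prodigal_command`) are hypothetical and not part of rotary. The 20,000 threshold is taken from the error message above, and the Prodigal flags used are `-i` (input), `-a` (protein output), `-p` (mode), and `-q` (quiet). Always passing `-p meta` would be the simpler alternative.

```python
# Minimum sequence length for single-genome training, per the Prodigal error above.
MIN_SINGLE_MODE_LENGTH = 20000


def total_fasta_length(fasta_path):
    """Sum the lengths of all sequences in a FASTA file, ignoring header lines."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                total += len(line)
    return total


def build_prodigal_command(fasta_path, faa_path):
    """Build a Prodigal command, using meta mode for short inputs (hypothetical helper)."""
    length = total_fasta_length(fasta_path)
    # Inputs below the training minimum cannot be run in single mode.
    mode = "meta" if length < MIN_SINGLE_MODE_LENGTH else "single"
    return ["prodigal", "-i", str(fasta_path), "-a", str(faa_path), "-p", mode, "-q"]
```

The returned list could then be passed to `subprocess.run(..., check=True)`. Whether a length-based switch is preferable to always using meta mode depends on the answer to the question below.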

@jmtsuji Would there be any disadvantages to running prodigal in meta mode for this use case?

LeeBergstrand commented 1 month ago

@jmtsuji Testing the meta flag now.

LeeBergstrand commented 1 month ago

This may be useful to review as well: https://github.com/hyattpd/prodigal/wiki/Advice-by-Input-Type

rotary-genomics / rotary

Add -p meta to the contig start search to handle small circular contigs. #205