wilkelab / Opfi

A Python package for discovery, annotation, and analysis of gene clusters in genomics or metagenomics data sets.
https://opfi.readthedocs.io/
MIT License

109 allow user to seed a run with coordinates and/or a contig id instead of a bait gene #111

Closed alexismhill3 closed 4 years ago

alexismhill3 commented 4 years ago

Adds an alternative seed step that seeds a run from user-specified genomic coordinates (and, optionally, a contig ID) rather than from a bait gene.

This essentially just grabs all ORFs in the region and sets up the pipeline to run as normal, so all other steps (filter, crispr, etc.) are still valid.
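For illustration only, here is a rough sketch of what "grab all ORFs in the region" could look like. This is not the code from this PR: the `Orf` dataclass and the `orfs_in_region` helper are hypothetical names, and the sketch assumes an ORF counts as "in the region" only when it lies entirely within the given coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orf:
    """Hypothetical ORF record; not Opfi's internal representation."""
    contig_id: str
    start: int
    end: int  # may be smaller than start for reverse-strand ORFs


def orfs_in_region(orfs: List[Orf], start: int, end: int,
                   contig_id: Optional[str] = None) -> List[Orf]:
    """Return ORFs fully contained in [start, end], optionally on one contig."""
    lo, hi = min(start, end), max(start, end)
    selected = []
    for orf in orfs:
        # Skip ORFs on other contigs when a contig ID was given.
        if contig_id is not None and orf.contig_id != contig_id:
            continue
        # Normalize so that strand orientation does not matter.
        orf_lo, orf_hi = min(orf.start, orf.end), max(orf.start, orf.end)
        # Assumption: require full containment rather than partial overlap.
        if orf_lo >= lo and orf_hi <= hi:
            selected.append(orf)
    return selected


example_orfs = [
    Orf("contig_1", 100, 400),
    Orf("contig_1", 900, 500),    # reverse strand
    Orf("contig_2", 150, 300),    # different contig
    Orf("contig_1", 950, 1200),   # extends past the region
]
seeded = orfs_in_region(example_orfs, 100, 1000, contig_id="contig_1")
```

The selected ORFs would then be passed to the downstream steps in the same way as ORFs found around a bait gene.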

Presumably, this feature will mainly be useful for re-annotating systems that were previously identified by the pipeline. The major benefit is that the output is in the consistent format that Operon Analyzer expects.

Closes #109

alexismhill3 commented 4 years ago

Could add_seed_with_coordinates_step additionally take the same parameters as add_blast_step, and then call add_blast_step internally so the user doesn't need to call it manually? Or is there ever a use case for some other step to be added?
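To make the suggestion concrete, here is a toy sketch of the delegation pattern. `ToyPipeline` is a hypothetical stand-in, not Opfi's pipeline class, and the parameter names (`db`, `name`, `e_val`, `blast_type`, `start`, `end`, `contig_id`) are assumptions made for the example rather than the actual signatures.

```python
class ToyPipeline:
    """Hypothetical stand-in that only records which steps were added."""

    def __init__(self):
        self.steps = []

    def add_blast_step(self, db, name, e_val, blast_type):
        # Record a BLAST step with its search parameters.
        self.steps.append(("blast", {
            "db": db, "name": name, "e_val": e_val, "blast_type": blast_type,
        }))

    def add_seed_with_coordinates_step(self, start, end, contig_id=None,
                                       **blast_params):
        # Record the coordinate-based seed step.
        self.steps.append(("seed_with_coordinates", {
            "start": start, "end": end, "contig_id": contig_id,
        }))
        # Forward any BLAST parameters so the caller does not have to
        # call add_blast_step separately.
        if blast_params:
            self.add_blast_step(**blast_params)


p = ToyPipeline()
p.add_seed_with_coordinates_step(
    100, 2000, contig_id="contig_1",
    db="proteins.db", name="cas", e_val=1e-5, blast_type="PROT",
)
step_names = [step for step, _ in p.steps]
```

With this shape, a single call registers both the seed step and the BLAST step, and a caller who passes no BLAST parameters can still add a different step afterwards.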

Otherwise I tested this on some real data and it worked great.

Sure, that makes sense to me