wilkelab / Opfi

A Python package for discovery, annotation, and analysis of gene clusters in genomics or metagenomics data sets.
https://opfi.readthedocs.io/
MIT License

Allow some rules to use regular expressions #93

Closed: jimrybarski closed this 4 years ago

jimrybarski commented 4 years ago

Some rules and filters now accept regular expressions for feature names. I don't think it's worth implementing this for the rules that accept lists of sets of genes; it would just be confusing. That said, it may be worth implementing the same functionality in a separate method that takes only a regex for each set instead of explicit lists, but I'm going to consider that out of scope until I find myself wanting it in practice.
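To illustrate the idea of matching feature names by regular expression rather than by exact string, here is a minimal standalone sketch. It does not use Opfi's API: the helper name matching_features, its regex flag, and the choice of re.search semantics are all assumptions made for illustration, and the actual rules in this PR may match names differently.

```python
import re
from typing import List


def matching_features(feature_names: List[str], pattern: str, regex: bool = False) -> List[str]:
    """Hypothetical helper (not part of Opfi).

    Returns the names equal to `pattern`, or, when regex=True, the names
    in which `pattern` is found as a regular expression.
    """
    if regex:
        compiled = re.compile(pattern)
        # Assumption: search semantics; anchor the pattern for full-name matches.
        return [name for name in feature_names if compiled.search(name)]
    return [name for name in feature_names if name == pattern]


names = ["cas1", "cas2", "cas9", "tnsA", "transposase"]

# Regex mode: one pattern covers every numbered cas gene.
cas_hits = matching_features(names, r"^cas\d+$", regex=True)

# Exact mode: only the literal name matches.
exact_hits = matching_features(names, "cas9")
```

With a pattern like this, a single rule can cover a whole gene family instead of requiring one rule per gene name.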

The max_distance rule also takes a new parameter, closest_pair_only. If True, the rule only checks the distance between the closest pair of matching features. This makes it possible to impose rules like "a transposase must be within 30 bp of any cas gene".
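The effect of closest_pair_only can be sketched with plain coordinate intervals. This is not Opfi's implementation: the names gap and within_max_distance are hypothetical, and the behavior when closest_pair_only is False (every pair must be within the limit) is an assumption inferred from the description above.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) coordinates in bp


def gap(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals; 0 if they overlap."""
    a_start, a_end = sorted(a)
    b_start, b_end = sorted(b)
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def within_max_distance(
    group1: List[Interval],
    group2: List[Interval],
    distance_bp: int,
    closest_pair_only: bool = False,
) -> bool:
    """Hypothetical stand-in for a max-distance rule between two feature groups."""
    distances = [gap(a, b) for a, b in product(group1, group2)]
    if not distances:
        return False  # one of the groups has no features
    if closest_pair_only:
        # Only the nearest pair has to satisfy the limit.
        return min(distances) <= distance_bp
    # Assumed default: every pair has to satisfy the limit.
    return max(distances) <= distance_bp


transposases = [(100, 400)]
cas_genes = [(420, 900), (2000, 3000)]

near_any = within_max_distance(transposases, cas_genes, 30, closest_pair_only=True)
near_all = within_max_distance(transposases, cas_genes, 30)
```

Here the transposase sits 20 bp from the first cas gene and 1600 bp from the second, so the rule passes only when the closest pair alone is considered.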

This also fixes a bug where same_orientation was not actually excluding the exceptions.

Resolves #91.