wilkelab / Opfi

A Python package for discovery, annotation, and analysis of gene clusters in genomics or metagenomics data sets.
https://opfi.readthedocs.io/
MIT License

directory names for clustered systems with many genes can be longer than the system character limit #150

Closed: alexismhill3 closed this issue 4 years ago

alexismhill3 commented 4 years ago

In _plot_clustered_operons, we construct a string that contains the name of each gene in the candidate system, and use that as the name for the directory to write clustered operon figures to.

However, we don't impose a hard limit on the size of an operon object, and it can in fact represent any arbitrary collection of genes, not necessarily those comprising a real operon. In some cases the number of genes can be quite large, resulting in directory names that are longer than the file system's limit on the length of a single name (I think it's usually around 255 characters).

I have a working solution for this, but it requires truncating the motif string and appending an ID to the end so that the directory name is unique. I don't think this is very elegant - any suggestions?
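As a rough illustration of the truncate-and-append-ID idea described above (not the code actually used in Opfi), here is a minimal sketch. The function name make_directory_name, the "-" separator, the 255-byte limit, and the choice of a short SHA-1 digest as the ID are all assumptions made for this example.

```python
import hashlib

# Assumed per-name limit; the real limit depends on the file system.
MAX_NAME_LENGTH = 255


def make_directory_name(gene_names, max_length=MAX_NAME_LENGTH, sep="-"):
    """Build a directory name from gene names (hypothetical helper).

    If the joined motif string fits within max_length bytes it is returned
    unchanged. Otherwise it is truncated and a short hash of the full motif
    string is appended, so that different long motifs get different names.
    """
    motif = sep.join(gene_names)
    encoded = motif.encode("utf-8")
    if len(encoded) <= max_length:
        return motif

    # Short, deterministic ID derived from the full (untruncated) motif.
    suffix = hashlib.sha1(encoded).hexdigest()[:8]

    # Leave room for the separator and the ID, then cut on a byte boundary.
    keep = max_length - len(suffix) - len(sep)
    truncated = encoded[:keep].decode("utf-8", errors="ignore").rstrip(sep)
    return f"{truncated}{sep}{suffix}"


if __name__ == "__main__":
    print(make_directory_name(["cas1", "cas2", "cas9"]))
    print(make_directory_name([f"gene{i}" for i in range(100)]))
```

Deriving the ID from a hash of the full motif keeps the name stable across runs for the same set of genes, whereas a running counter would depend on the order in which systems are processed. Either would satisfy the uniqueness requirement.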

clauswilke commented 4 years ago

I think that's a totally reasonable solution.

jimrybarski commented 4 years ago

Since the purpose behind the folders being named after the motif is to reduce the time it takes for the user to look at every candidate system (in one way or another), truncating them effectively throws out that utility. Of course, anything longer than 255 characters is almost certainly garbage.

I think this is a fine solution for now but it highlights that there's probably a better way to get an overview of the data.