ryanmelnyk / PyParanoid

Rapid and scalable homolog identification for bacterial genomes
MIT License

orthologous operon search #1

Closed (brooksomics closed 6 years ago)

brooksomics commented 6 years ago

Hi Ryan. Transferring our conversation here in case others had the same question.

To clarify my question: say I want to visualize an operon and, to make up an example, let's say luxABC across all Pseudomonas species.

Here you describe how to do this with one gene, luxI. You say "### There should be only one sequence in the FASTA file.", so what's the best approach if you'd like to do this for 3 genes (e.g. gplot.match_seqs("../src/luxABC.faa","../../data/Pseudo/"))? I'm expecting these genes to be co-located in an operon architecture, but I can add checks for this later. I guess my question is: could you have 3 genes in your input.faa? If not, is there a workflow you could recommend within PyParanoid? Currently, my guess is that I should run each search separately (luxA, luxB, and luxC), then combine the results and visualize.
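For the "one search per gene" workaround, a multi-FASTA file first has to be split into single-sequence files. Below is a minimal sketch using only the Python standard library. The function name split_fasta and the output layout are assumptions made for illustration and are not part of PyParanoid.

```python
from pathlib import Path


def split_fasta(multi_fasta, out_dir):
    """Write each record of a multi-FASTA file to its own single-sequence file.

    Hypothetical helper, not part of PyParanoid. Returns the list of
    paths written, in the order the records appear in the input.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = []  # list of (record name, [lines belonging to that record])
    with open(multi_fasta) as src:
        for line in src:
            if line.startswith(">"):
                # Use the first word of the header as the record name.
                fields = line[1:].split()
                name = fields[0] if fields else f"record_{len(records) + 1}"
                records.append((name, [line]))
            elif records and line.strip():
                # Sequence line: attach it to the most recent header.
                records[-1][1].append(line)

    paths = []
    for name, lines in records:
        # Keep file names safe by replacing unusual characters.
        safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in name)
        path = out_dir / f"{safe}.faa"
        path.write_text("".join(lines))
        paths.append(path)
    return paths
```

Each resulting file holds exactly one sequence, so it could be passed to gplot.match_seqs one at a time in the same way as the single-gene luxI example, with the per-gene results combined afterwards.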

It would be cool if you could do something like this:

t = gplot.add_group_to_tree(c("group_22008","group_10893","group_16731"),"../src/BrassClade.tre","../../data/Pseudo")

Anyways, this is a really open-ended question. I mainly wanted to say this tool looks interesting and I'm excited to play around with it. Nice work!

ryanmelnyk commented 6 years ago

Hi Bubba (I'm assuming Bubba is ok in this context since you're wearing the hot dog costume).

Great question - it's an analysis I do all the time. The short answer is that I use this standalone script to make a heatmap from a given tree, and then smush that together with the tree in Illustrator to make this figure: AHLsynthase_groups_MODIFIED.pdf
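The standalone script itself isn't reproduced in this thread, so the following is not it. It is only a rough sketch of the data such a heatmap needs: a presence/absence matrix whose rows follow the leaf order of the tree, with one column per homolog group. The strain names and group memberships are made up, and both function names are hypothetical.

```python
def presence_matrix(leaf_order, group_members):
    """Build a 0/1 matrix: one row per strain (in tree leaf order),
    one column per homolog group (in dict insertion order)."""
    groups = list(group_members)
    rows = [
        [1 if strain in group_members[g] else 0 for g in groups]
        for strain in leaf_order
    ]
    return groups, rows


def strains_with_all(leaf_order, group_members):
    """Strains that carry every group, e.g. a complete luxABC set."""
    return [
        s for s in leaf_order
        if all(s in members for members in group_members.values())
    ]


# Hypothetical example data: strain names and memberships are invented.
leaf_order = ["strain_A", "strain_B", "strain_C"]
group_members = {
    "group_22008": {"strain_A", "strain_B"},
    "group_10893": {"strain_A"},
    "group_16731": {"strain_A", "strain_C"},
}

groups, rows = presence_matrix(leaf_order, group_members)
for strain, row in zip(leaf_order, rows):
    print(strain, row)
```

Because the rows are already in leaf order, the matrix lines up with the tree when the two are placed side by side. The rows could be drawn with something like matplotlib's imshow.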

I'm planning on turning that script into a method in a future PyParanoid update, for interactive visualization within the notebook. And I need somebody who is better with matplotlib to figure out how to nicely plot a tree next to the heatmap in a script rather than manually...

brooksomics commented 6 years ago

Ah, I see. Unfortunately I'm more of an R coder, but this may be a fun project to help me grow my Python chops. I'm going to ping some folks in the Banfield lab about PyParanoid, since I think many would put it to good use. There are several Python coders in the lab now too, so maybe they'll help with some development. Thanks again!

ryanmelnyk commented 6 years ago

genomeplot.match_seqs() has been updated to run on a multi-FASTA file, and the HomologPresenceTree notebook has been updated as well.