mattb112885 / clusterDbAnalysis

ITEP - Integrated Toolkit for Exploration of microbial Pan-genomes

Apparent Issue with rerunning mcl #71

Closed jbird9 closed 8 years ago

jbird9 commented 8 years ago

Hi matt et al,

I am attempting to use this program to examine protein clusters between distantly related organisms within the same deeply branching taxon. I have 6 genomes from the group and one genome that I believe will serve as an outgroup while being somewhat closely related. I added the outgroup genome after doing a first run of setup_step1/2 with various parameters. I am using a universal gene, rpl14, as a marker to see how the different parameters behave.

Initially, I found a setting that separated the more closely related genomes within the 6 from their more distant relatives, say at granularity 6.0, cutoff 0.4. However, I found that after including the outgroup genome, a setting of granularity 6.0, cutoff 0.5 produced no separation at all among any of the genomes. When I rerun values I had previously run without the outgroup, none of the outputs change from their pre-outgroup state.

I believe either the clusters are not being overwritten (even though the output to screen says they are) or the values from the repeated runs are not being properly called.

I will probably delete everything and start over as a workaround, but I see this as a hindrance to being able to add new genomes to previous analyses.

Thanks,

Jordan

mattb112885 commented 8 years ago

Hello,

Did you follow the directions here to add your outgroup organism?

https://github.com/mattb112885/clusterDbAnalysis/wiki/Adding-and-removing-genomes-from-existing-itep-databases

You do need to remove the old clustering results, then rerun setup steps 1 and 2. You shouldn't need to delete everything, only the things in the clusters and flatclusters directories.
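For illustration, that cleanup could be sketched in Python roughly as follows. This is a minimal sketch, not part of ITEP: the function name `clear_old_clusters` is hypothetical, and it assumes the `clusters` and `flatclusters` directories sit directly under the ITEP root you pass in. It defaults to a dry run so you can check what would be deleted first.

```python
from pathlib import Path
import shutil


def clear_old_clusters(itep_root, subdirs=("clusters", "flatclusters"), dry_run=True):
    """Remove everything inside the given result directories, keeping the directories.

    Hypothetical helper, not an ITEP function. Assumes the directories live
    directly under itep_root. Returns the list of paths that were (or, in a
    dry run, would be) removed.
    """
    removed = []
    root = Path(itep_root)
    for name in subdirs:
        directory = root / name
        if not directory.is_dir():
            continue  # nothing to clean if the directory does not exist
        for entry in sorted(directory.iterdir()):
            removed.append(entry)
            if dry_run:
                continue  # only report, do not delete
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return removed
```

Call it once with the default `dry_run=True` and inspect the returned paths, then again with `dry_run=False` before rerunning setup steps 1 and 2.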

Best

Matt

mattb112885 commented 8 years ago

As for the granularity, I think that may be a separate issue. What do you mean by separation of the genomes? Do you mean the clusters all contained a member from every genome?

jbird9 commented 8 years ago

Thanks for the quick response; it was helpful. I am attempting to find parameters that produce protein clusters common across my 6 genomes. Using the suggested g 2.0 c 0.4, I got mostly just universal genes common to all 6. I was looking to relax the parameters a bit so that the clusters include the more distantly related members of the group, but not so much that they include an outgroup.

mattb112885 commented 8 years ago

Jordan,

Thank you for the clarification. If you are trying to get more genes with possible membership in all of your organisms, I would suggest trying to run a clustering with a lower cutoff value (say, 0.3 instead of 0.4). Decreasing the cutoff causes more edges between related genes to be maintained on the graph, as the homology does not have to be as strong to consider a relationship to be present.
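To make the effect of the cutoff concrete, here is a toy Python sketch. It is only an illustration under stated assumptions: the gene names and similarity scores are invented, the `>=` comparison is assumed, and plain connected components stand in for MCL, so it does not reproduce ITEP's scoring or clustering. It only shows that a lower cutoff keeps more edges, which lets more genes end up grouped together.

```python
def edges_kept(scores, cutoff):
    """Return the gene pairs whose similarity score is at or above the cutoff."""
    return {pair for pair, score in scores.items() if score >= cutoff}


def connected_groups(nodes, edges):
    """Group nodes by connectivity (union-find); a simple stand-in for MCL."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Invented toy data: pairwise similarity scores between four hypothetical genes.
nodes = ["geneA", "geneB", "geneC", "geneD"]
scores = {
    ("geneA", "geneB"): 0.80,
    ("geneB", "geneC"): 0.45,
    ("geneC", "geneD"): 0.35,
}

for cutoff in (0.4, 0.3):
    kept = edges_kept(scores, cutoff)
    print(cutoff, len(kept), connected_groups(nodes, kept))
```

With these made-up scores, the 0.4 cutoff drops the weakest edge and leaves `geneD` on its own, while 0.3 keeps all three edges and joins the four genes into one group.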

Best

Matt