Open amitlevon opened 7 years ago
Hi Amit,
Glad you've found my work useful :)
Including all ancestors was actually intentional. For each gene, I wanted to explicitly define its effect on the cell at multiple scales (from low-level terms in the ontology to higher-level terms). By doing so, a genotype (involving one or more genes) can also be represented in a multi-scale way through the ontotype.
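To illustrate what "including all ancestors" means, here is a minimal sketch in plain Python. It does not use the ontotype package: the toy terms, the gene names, and the helper `propagate` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy ontology: (child term, parent term) pairs.
child_to_parent = [
    ("DNA repair", "DNA metabolism"),
    ("DNA metabolism", "metabolism"),
]
# Hypothetical direct annotations: each gene linked only to its most specific term.
direct = [("RAD51", "DNA repair"), ("POL1", "DNA metabolism")]


def propagate(direct_pairs, child_parent_pairs):
    """Return the set of (gene, term) pairs that covers every ancestor term."""
    parents = defaultdict(set)
    for child, parent in child_parent_pairs:
        parents[child].add(parent)

    propagated = set()
    for gene, term in direct_pairs:
        stack = [term]
        seen = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            propagated.add((gene, current))
            # Walk upwards to every parent of the current term.
            stack.extend(parents[current])
    return propagated


for gene, term in sorted(propagate(direct, child_to_parent)):
    print(gene, term)
```

In this toy case RAD51 ends up linked to three terms (its direct term plus two ancestors), which is why a propagated gene-to-term file is larger than one that lists only direct parents.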
Let me know if you have any other questions
Mike
On Tue, May 16, 2017 at 8:19 AM, Amit Levon notifications@github.com wrote:
Hi Michael,
First of all, thank you for the useful code! It has helped my friends and me immensely in various biological network research projects.
There is one issue regarding the example data you've uploaded that I wanted to understand. In the 'yeast_gene_2_term' file, I saw there are duplicate entries. Our data has fewer gene-to-term duplicates (we have only direct parents). Is your data larger because you included, for each gene, all its ancestors? If so, why did you choose to include all ancestors when the file 'yeast_child_2_parent' already covers these connections? And is this necessary in order to create a valid ontology using your code?
Thank you! Amit
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/michaelkyu/ontotype/issues/1, or mute the thread https://github.com/notifications/unsubscribe-auth/ALeuwWpxiJutG2oEEAetMZKbSZC010K3ks5r6b6PgaJpZM4NcqTc .
Thanks for the fast reply!
I’m curious to know, if the ‘yeast_child_2_parent’ file already includes the ancestors of each gene, what is the significance of representing these same connections in the ‘yeast_gene_2_term’ file as well?
In addition, I have created two files similar to the ones you presented in the example, that include each gene and its ancestors, and child-to-parent relations. The algorithm runs to completion, but outputs a warning message (attached to this message). This does not show when running the algorithm on your example files, so it must be a result of the method I used to produce my data. Could you expand on the meaning of this warning, and possible causes that could generate it? If I may ask, how did you produce the two example files you have uploaded?
Thank you once again for all your help :)
Hi Amit,
The example ontology was actually the one used in the Cell Systems paper. The paper's supplement describes how I took the Gene Ontology and stripped away GO terms that were irrelevant because they didn't have any yeast genes annotated to them.
I believe that warning means that there are 1,060 terms in your "yeast_child2_parent.txt" file that do not have an explicit link to any genes in the file "yeast_gene2_ancestors.txt".
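As a rough way to check this on your own data, the sketch below lists terms that appear in the hierarchy but have no gene explicitly linked to them. It is only an assumption about what the warning counts, not the library's actual check, and the in-memory pairs are hypothetical stand-ins for the contents of the two files.

```python
# Hypothetical toy inputs standing in for the two files.
child_to_parent = [
    ("term_B", "term_A"),
    ("term_C", "term_A"),
    ("term_D", "term_C"),
]
gene_to_term = [
    ("gene1", "term_B"),
    ("gene1", "term_A"),
    ("gene2", "term_D"),  # gene2 is NOT linked to term_C or term_A
]

# Every term mentioned anywhere in the hierarchy.
all_terms = {term for pair in child_to_parent for term in pair}
# Terms that have at least one gene linked to them explicitly.
annotated_terms = {term for _gene, term in gene_to_term}

terms_without_genes = sorted(all_terms - annotated_terms)
print(len(terms_without_genes), "term(s) with no explicit gene link:", terms_without_genes)
```

Here `term_C` is reported: it has a gene below it (via `term_D`), but that annotation was never carried up to it.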
The Ontology constructor assumes that the input files have already propagated all gene annotations upwards, i.e., connected each gene to all of its ancestor terms. If this is not the case, you can force this propagation by running the following code:
a.propagate_annotations()
--Mike
On Thu, May 18, 2017 at 1:40 AM, Amit Levon notifications@github.com wrote:
[image: warning] https://cloud.githubusercontent.com/assets/28732361/26193695/b2508ada-3bbe-11e7-87f5-86a0b28d89c0.jpg
Hi Michael,
I read the Cell Systems paper and had an elementary question regarding the implementation. In the paper and in your example you are testing double knockouts. Is there a limit on the number of knockouts? From looking at the code I can't find such a restriction, and the results I get from running sets of four and five knockouts seem reasonable.
As Amit mentioned above, thank you once again for sharing your code, it has been very helpful! Danielle
Hi Danielle,
Sorry for the late reply. The paper mostly tested double knockouts, though the last figure of the paper was about a small set of triple knockouts. The code can handle a genotype with an arbitrary number of knockouts (four, five, as many as needed).
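As a rough illustration of why the number of knockouts is not limited, here is a minimal sketch in plain Python. It assumes, as a simplification, that the ontotype value of a term is the number of knocked-out genes annotated to that term; check the paper for the exact definition. The gene names, term names, and the helper `ontotype_counts` are hypothetical and not part of the ontotype package.

```python
from collections import Counter

# Hypothetical, already-propagated gene -> terms annotations.
gene_to_terms = {
    "geneA": {"term_1", "term_root"},
    "geneB": {"term_1", "term_root"},
    "geneC": {"term_2", "term_root"},
    "geneD": {"term_2", "term_root"},
}


def ontotype_counts(knockouts, annotations):
    """Count, for each term, how many knocked-out genes are annotated to it."""
    counts = Counter()
    for gene in knockouts:
        # Genes without annotations contribute nothing.
        counts.update(annotations.get(gene, ()))
    return counts


double = ontotype_counts(["geneA", "geneB"], gene_to_terms)
quadruple = ontotype_counts(["geneA", "geneB", "geneC", "geneD"], gene_to_terms)
print(dict(double))
print(dict(quadruple))
```

The same loop handles two, four, or any other number of knockouts; only the counts per term change.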
Best, Mike
Great, thank you for your reply!