Open mepcotterell opened 9 years ago
That makes sense.
I agree with you in the sense that we should keep the essential instances, such as Variable Type and Functions, in the main analytics.owl, since we need those in order to perform any classification. We can move all the examples (Generic Models, associated variables, etc.) to analytics-examples.owl.
On Tue, Mar 3, 2015 at 12:50 PM, Michael Cotterell <notifications@github.com> wrote:
Assigned #8 https://github.com/scalation/analytics/issues/8 to @mvnural https://github.com/mvnural.
Mustafa Veysi Nural PhD Candidate Department of Computer Science University of Georgia, Athens http://www.mendeley.com/profiles/mustafa-nural/
@mvnural, I've set this issue to accepted. However, before we implement it as explained in my original post, I'd like to suggest the following to make things even easier. We should have an ontology for each example dataset. For each dataset, we could have multiple model examples. My reasoning for having multiple ontologies for the examples is that it might get confusing if variable names are shared between datasets. What do you think?
I'm not sure. It might be both yes and no.
It's yes because the only reason we have it this way is convenience. We simply reused the variables to create multiple models. Normally, this would be part of the feature selection step.
It's no because, if the field headers of a dataset were aligned with a domain ontology, they would simply be shared across datasets of the same domain. This is not something we should worry about right now; it's just food for thought.
I thought more about this and decided that having an ontology for each dataset should be fine. Even if the same variable is used across datasets (e.g., mpg), the occurrences are only conceptually related; they are different entities. So please ignore my previous concern.
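To illustrate the point about mpg, here is a minimal Python sketch of how a per-dataset ontology would give each variable its own IRI. The root IRI, the `variable_iri` helper, and the dataset names `auto-mpg` and `mtcars` are all hypothetical and chosen only for illustration; nothing in this thread fixes the actual naming scheme.

```python
# Hypothetical root IRI and dataset names, used only for illustration.
EXAMPLES_ROOT = "http://example.org/examples"


def variable_iri(dataset, variable, root=EXAMPLES_ROOT):
    """Build an IRI for a variable scoped to one dataset's ontology."""
    return f"{root}/{dataset}.owl#{variable}"


# The same variable name in two datasets yields two distinct entities.
auto_mpg = variable_iri("auto-mpg", "mpg")
mtcars_mpg = variable_iri("mtcars", "mpg")

print(auto_mpg)
print(mtcars_mpg)
print(auto_mpg == mtcars_mpg)
```

Because the dataset is part of the IRI, two variables that share a name never collide, and a link between them could still be asserted later if the headers are aligned with a domain ontology.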
I think it would be a good idea for analytics.owl not to contain any individuals from the examples. Instead, we create analytics-examples.owl, which imports analytics.owl and includes the examples. This way, the size of the main ontology is minimized, which means our code for classifying should run faster.
Of course, if we find that some individuals are always needed (as might be the case with some of the functions or distributions), then we can include them in analytics.owl.
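As a rough sketch of the proposed split, the following Python snippet generates a Turtle header for an examples ontology that imports the main one via `owl:imports`. The IRIs, the `examples_ontology_turtle` helper, the individual `AutoMPG_Regression`, and the class `Model` are assumptions made for illustration; the real IRIs and class names in analytics.owl may differ.

```python
# Hypothetical IRIs: the thread does not state the ontologies' real IRIs.
BASE_IRI = "http://example.org/analytics.owl"
EXAMPLES_IRI = "http://example.org/analytics-examples.owl"


def examples_ontology_turtle(base_iri, examples_iri, individuals):
    """Return Turtle for an examples ontology that imports the base ontology.

    `individuals` maps an individual's local name to the local name of a
    class assumed to be declared in the base ontology.
    """
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        f"@prefix an: <{base_iri}#> .",
        f"@prefix ex: <{examples_iri}#> .",
        "",
        f"<{examples_iri}> a owl:Ontology ;",
        f"    owl:imports <{base_iri}> .",
        "",
    ]
    # Example individuals live only in the examples ontology.
    for name, cls in sorted(individuals.items()):
        lines.append(f"ex:{name} a owl:NamedIndividual, an:{cls} .")
    return "\n".join(lines) + "\n"


# "AutoMPG_Regression" and "Model" are placeholder names.
turtle = examples_ontology_turtle(
    BASE_IRI, EXAMPLES_IRI, {"AutoMPG_Regression": "Model"}
)
print(turtle)
```

With this layout, code that only needs classification can load analytics.owl alone, while loading analytics-examples.owl pulls in the main ontology through the import.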