scalation / analytics

ScalaTion Analytics Ontology
MIT License

Separate Ontology for Examples #8

Open mepcotterell opened 9 years ago

mepcotterell commented 9 years ago

I think it would be a good idea for analytics.owl not to contain any individuals from the examples. Instead, we could create analytics-examples.owl, which imports analytics.owl, and include the examples there. This keeps the main ontology small, which means our code for classifying should run faster.

Of course, if we find that some individuals are always needed (as might be the case with some of the functions or distributions), then we can include them in analytics.owl.
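To make the proposed split concrete, here is a minimal sketch of what the header of analytics-examples.owl could look like, generated with Python's standard library. The two IRIs are placeholders assumed for illustration only; the project's real ontology IRIs are not given in this thread.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical IRIs, assumed for illustration; not the project's actual IRIs.
MAIN_IRI = "http://example.org/analytics.owl"
EXAMPLES_IRI = "http://example.org/analytics-examples.owl"


def build_examples_ontology(examples_iri: str, main_iri: str) -> str:
    """Return RDF/XML for an empty ontology that imports the main ontology."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    root = ET.Element(f"{{{RDF}}}RDF")
    # The ontology header identifies the examples ontology itself.
    onto = ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": examples_iri})
    # owl:imports pulls in the main ontology's classes and essential individuals.
    ET.SubElement(onto, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": main_iri})
    return ET.tostring(root, encoding="unicode")


xml_text = build_examples_ontology(EXAMPLES_IRI, MAIN_IRI)
print(xml_text)
```

The example individuals would then be declared only in this second file, so a classifier that loads analytics.owl on its own never sees them.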

mvnural commented 9 years ago

That makes sense.

I agree with you in the sense that we should keep the essential instances, such as Variable Type and Functions, in the main analytics.owl. We need those in order to perform any classification. We can move all the examples (Generic Models, associated variables, etc.) to analytics-examples.owl.

On Tue, Mar 3, 2015 at 12:50 PM, Michael Cotterell <notifications@github.com> wrote:

Assigned #8 https://github.com/scalation/analytics/issues/8 to @mvnural https://github.com/mvnural.


mepcotterell commented 9 years ago

@mvnural, I've set this issue to accepted. However, before we implement it as explained in my original post, I'd like to suggest the following to make things even easier. We should have an ontology for each example dataset. For each dataset, we could have multiple model examples. My reasoning for having multiple ontologies for the examples is that it might get confusing if variable names are shared between datasets. What do you think?

mvnural commented 9 years ago

I'm not sure. It might be both Yes and No.

It's yes because the only reason we have it this way is for convenience. We simply reused the variables to create multiple models. Normally, this would be part of the feature selection step.

It's no because of the case where the field headers of a dataset are aligned with a domain ontology. They would then simply be shared across datasets of the same domain. This is not something we should worry about right now, just food for thought.


mvnural commented 9 years ago

I thought more about this and decided that having an ontology for each dataset should be fine. Even if the same variable name is used across datasets (e.g., mpg), the variables are only conceptually related and are still different entities. So please ignore my previous concern.
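As a rough sketch of why a shared name like mpg stays unambiguous with one ontology per dataset: each ontology gets its own IRI, so each variable's full IRI differs even when the local name matches. The base IRI and the two dataset names below are hypothetical, assumed for illustration only.

```python
# Hypothetical base IRI and dataset names, assumed for illustration only.
BASE = "http://example.org/analytics-examples"


def variable_iri(dataset: str, variable: str) -> str:
    """Build the IRI of a variable scoped to one dataset's ontology."""
    return f"{BASE}/{dataset}.owl#{variable}"


auto_mpg = variable_iri("auto-mpg", "mpg")
fuel_survey_mpg = variable_iri("fuel-survey", "mpg")

# Same local name, different ontologies, therefore different entities.
print(auto_mpg)
print(fuel_survey_mpg)
print(auto_mpg != fuel_survey_mpg)
```

If the field headers are later aligned with a domain ontology, the two individuals could be linked to a shared concept there without being merged.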