information-artifact-ontology / IAO

information artifact ontology
Creative Commons Attribution 4.0 International

ETL class #187

Closed: aellenhicks closed this issue 7 years ago

aellenhicks commented 8 years ago

We are requesting a class for an extract, transform, and load (ETL) process that has data items (or, more generally, ICEs) as specified input and data items (or, more generally, ICEs) as specified output.

aellenhicks commented 8 years ago

Following up on this class request. We need this for a project. Is it possible to get this in the next two weeks?

alanruttenberg commented 8 years ago

Sorry for the delay in responding. I would be comfortable having you or a designate be a committer on the project, so that processes such as ETLs could be added directly by you. Would this lead to a better situation for you?

mcourtot commented 8 years ago

Hi @aellenhicks: could you please provide a label, a definition, and a position in the hierarchy? If you have a reference for the class (a PMID or other) and an example of usage, that would be ideal.

alanruttenberg commented 7 years ago

Proposed: A planned process which takes as input a database and fills another database by extracting concretizations of information entities from the first, transforming them, and loading the transformed concretizations into the second.

has specified input: information content entity
has specified output: information content entity

Editor note: We don't currently define database in IAO, as the bare term is ambiguous. Reasonable interpretations of the word include a material entity, an information structure, or an information content entity. However, this definition commits, at least, to there being some material thing that bears concretizations of information entities, and to new concretizations being created during the process. We consider the ETL process in terms of information entities rather than concretizations. No commitment is made as to whether the specified output information content entities are the same as or different from the originals; both scenarios are plausible.

aellenhicks commented 7 years ago

Concretizations are specifically dependent continuants, so this raises the question of how they can be extracted from one database and loaded into another. The editor note specifies that the loaded concretization is a new (and hence different) thing, but the definition does not seem to convey that. Is it really the concretization that is extracted?

I like the fact that the output can be identical to the input.

alanruttenberg commented 7 years ago

The editor note is just that. We haven't yet talked about processes in which concretizations are inputs or outputs, but clearly they participate in the process. Extracting and loading are like "copying", in the sense that copying is a process by which a new concretization is made. The struggle is to use the familiar words "extract", "transform", and "load" and mesh them with IAO, particularly as IAO isn't fully developed. Do you have a proposal for an alternative wording, or do you think there's a better account?

aellenhicks commented 7 years ago

It seems to me that copying or extracting also involves the generically dependent continuant, or ICE. So perhaps transforming is an algorithmic way of producing a new ICE (and hence a new SDC that concretizes it) from another ICE. Does this analysis sound right to you?

alanruttenberg commented 7 years ago

Only some transformations create new ICEs, I think. For example, suppose you are correcting a misspelling during the transform (not uncommon); in that case I would think that both concretizations are of the same ICE.

Would this do better?

A planned process which takes as input a database, copies concretizations from it, optionally transforms them, then copies the result to a second database.
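The revised wording can be sketched in Python with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not an IAO artifact: the `terms` table, its columns, and the `CORRECTIONS` map are hypothetical. The transform step fixes a misspelling, the case mentioned above where both concretizations would plausibly be of the same ICE; the source database is only read, and new rows are written to the target.

```python
import sqlite3

# Hypothetical spelling corrections applied during the transform step.
CORRECTIONS = {"adress": "address"}


def extract(source: sqlite3.Connection) -> list[tuple[int, str]]:
    """Read rows from the source database; the source itself is left unchanged."""
    return source.execute("SELECT id, label FROM terms ORDER BY id").fetchall()


def transform(rows: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Build new rows, using each input row as a template (optionally correcting it)."""
    return [(row_id, CORRECTIONS.get(label, label)) for row_id, label in rows]


def load(target: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Write the transformed rows into the target database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS terms (id INTEGER PRIMARY KEY, label TEXT)"
    )
    target.executemany("INSERT INTO terms (id, label) VALUES (?, ?)", rows)
    target.commit()


def etl(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Extract from the source, transform, and load into the target."""
    load(target, transform(extract(source)))


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE terms (id INTEGER PRIMARY KEY, label TEXT)")
    src.executemany("INSERT INTO terms VALUES (?, ?)", [(1, "adress"), (2, "process")])
    dst = sqlite3.connect(":memory:")
    etl(src, dst)
    print(dst.execute("SELECT id, label FROM terms ORDER BY id").fetchall())
    # prints [(1, 'address'), (2, 'process')]
```

Keeping extract, transform, and load as separate functions mirrors the three named steps; with an identity `transform`, the process reduces to plain copying.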

alanruttenberg commented 7 years ago

However, it should be noted that the lack of a clear identity criterion for information content entities is an outstanding problem for IAO. For the most part we have been loose about what counts as a concretization.

alanruttenberg commented 7 years ago

btw, I'm proposing the editor preferred term "database extract, transform, and load process" and alternative term "ETL", ok?

aellenhicks commented 7 years ago

Your points are well taken. I'm fine with the preferred term. However, it still does not seem intuitively correct to say that the concretization is transformed. Could you explicate this a bit more?


alanruttenberg commented 7 years ago

What I mean by transformed is not that the concretization is changed, but rather that there is a process that takes one as input and creates one as output. The newly created concretization might be of the same information artifact, or might be the seed of a new one. Take the analogy of copying a piece of writing by hand. You look at what you are copying and then write something else. Depending on what you are doing, the something else might be another concretization of the original ICE, or not. For example, if you copy from cursive to block lettering, you are creating a new concretization of the same ICE. On the other hand, if you copy most of the writing but substitute fictional names for the real names in the original, you are creating a concretization of a new ICE (perhaps originally concretized in your head).

Summary: Transformation in the sense I intend is not changing something. It is a process in which something new is created that is constructed, at least in part, by using the input as a template.
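As a rough illustration of this account, the handwriting analogy can be modeled as a minimal Python sketch. `Concretization`, `copy_with`, and the ICE labels are hypothetical names, not IAO terms. Instances are frozen, so "transforming" never changes the input; it builds a new object from it as a template. The caller states whether the copy concretizes the original ICE or a new one, because IAO has no settled identity criterion for ICEs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Concretization:
    """Hypothetical stand-in: which ICE is concretized, the text, and its rendering style."""
    ice: str
    text: str
    style: str


def copy_with(template: Concretization,
              rewrite: Callable[[str], str],
              style: str,
              new_ice: Optional[str] = None) -> Concretization:
    """Create a new concretization using `template` as a template.

    `template` is never modified (the dataclass is frozen). If `new_ice` is None,
    the copy is assumed to concretize the same ICE; otherwise it concretizes `new_ice`.
    """
    ice = template.ice if new_ice is None else new_ice
    return Concretization(ice=ice, text=rewrite(template.text), style=style)


if __name__ == "__main__":
    original = Concretization(ice="letter", text="Dear Alice", style="cursive")
    # Cursive to block lettering: a new concretization of the same ICE.
    block = copy_with(original, lambda t: t, style="block")
    # Substituting a fictional name: a concretization of a new ICE.
    fictional = copy_with(original, lambda t: t.replace("Alice", "Carol"),
                          style="cursive", new_ice="letter-fictional")
    print(block)
    print(fictional)
```

Making the ICE assignment an explicit argument, rather than inferring it from the text, keeps the open identity question outside the code.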

alanruttenberg commented 7 years ago

This was added in the last release.