Closed: aellenhicks closed this issue 7 years ago
Following up on this class request. We need this for a project. Is it possible to get this in the next two weeks?
Sorry for the delay in responding. I would be comfortable having you or a designate be a committer on the project, so that processes such as ETLs could be added directly by you. Would this lead to a better situation for you?
Hi @aellenhicks: could you please provide a label, definition, and position in the hierarchy? If you have a reference for the class (a PMID or similar) and an example of usage, that would be ideal.
Proposed: A planned process which takes as input a database and fills another database by extracting concretizations of information entities from the first, transforming them, and loading the transformed concretizations into the second.
Has specified input: information content entity
Has specified output: information content entity
Editor note: We don't currently define database in IAO, as the bare term is ambiguous. Reasonable interpretations of the word might be the material entity, an information structure, or an information content entity. However, this definition commits, at least, to there being some material thing which bears concretizations of information entities, and to new concretizations being created during the process. We consider the ETL process in terms of information entities rather than the concretizations. No commitment is made as to whether the specified output information content entities are the same as or different from the originals; both scenarios are plausible.
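For readers less familiar with the term, here is a minimal sketch of what an ETL run between two databases can look like. It is only an illustration under stated assumptions: the `person` table, its columns, and the name-normalizing transform are hypothetical and do not come from the proposal. Note that the source database is only read; the rows written to the target are newly created.

```python
import sqlite3


def etl(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Extract rows from source, transform them, and load them into target."""
    # Extract: read from the source database; the source is not modified.
    rows = source.execute("SELECT id, name FROM person ORDER BY id").fetchall()
    # Transform: build new values from the extracted ones (hypothetical cleanup).
    transformed = [(pid, name.strip().title()) for pid, name in rows]
    # Load: write the newly built values into the target database.
    target.executemany("INSERT INTO person (id, name) VALUES (?, ?)", transformed)
    target.commit()
    return len(transformed)


# Hypothetical source and target databases, both in memory.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, " ada lovelace "), (2, "alan turing")],
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

loaded = etl(source, target)
```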
Concretizations are specifically dependent continuants, so this raises the question of how they can be extracted from one database and loaded into another. The editor note specifies that the loaded concretization is a new (and hence different) thing, but the definition does not seem to convey that. Is it really the concretization that is extracted?
I like the fact that the output can be identical to the input.
The editor note is just that. We haven't yet talked about processes in which the concretizations are inputs or outputs, but clearly they are in the process. Extracting and loading are like "copying" in the sense that copying is a process by which a new concretization is made. The struggle is to use the familiar words "extract", "transform", "load" and mesh them with IAO, particularly as IAO isn't fully developed. Do you have a proposal for an alternative wording, or do you think there's a better account?
It seems to me that copying or extracting also involves the generically dependent continuant, or ICE. So perhaps transforming is an algorithmic way of producing a new ICE (and hence a new SDC that concretizes it) from another ICE. Does this analysis sound right to you?
Only some transformations create new ICEs, I think. For example, suppose you are correcting a misspelling during the transform (not uncommon); in that case I would think that both concretizations are of the same ICE.
Would this do better?
A planned process which takes as input a database, copies concretizations from it, optionally transforms them, and then copies the result to a second database.
However it should be noted that lack of a clear identity criterion for information content entities is an outstanding problem for IAO. For the most part we have been loose about what counts as a concretization.
btw, I'm proposing the editor preferred term "database extract, transform, and load process" and alternative term "ETL", ok?
Your points are well taken. I'm fine with the preferred term. However, it still does not seem intuitively correct to say that the concretization is transformed. Could you explicate this a bit more?
In reply to Alan Ruttenberg (Wednesday, December 21, 2016 at 4:43 PM, Re: [information-artifact-ontology/IAO] ETL class (#187)): "btw, I'm proposing the editor preferred term "database extract, transform, and load process" and alternative term "ETL", ok?"
What I mean by transformed is not that the concretization is changed, but rather that there is a process that takes one as input and creates one as output. The newly created concretization might be of the same information artifact, or might be the seed of a new one. Take an analogy to copying a piece of writing by hand. You look at what you are copying and then write something else. Depending on what you are doing, the something else might be another concretization of the original ICE, or not. For example, if you copy from cursive to block lettering, you are creating a new concretization of the same ICE. On the other hand, if you copy most of the writing but substitute fictional names for real names in the original writing, you are creating a concretization of a new ICE (perhaps originally concretized in your head).
Summary: Transformation in the sense I intend is not changing something. It is a process in which something new is created that is constructed, at least in part, by using the input as a template.
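To make this reading concrete, here is a rough sketch in Python. Everything in it is a hypothetical illustration and not part of IAO: the `Concretization` class, its `ice_id` and `bearer` fields, and the `copy_with` helper are invented names. Concretizations are modeled as immutable values, so a "transform" can only build a new one from the input. Whether the new one counts as concretizing the same ICE is left to the caller, since the identity criterion for ICEs is an open question.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Concretization:
    """Hypothetical stand-in for a concretization borne by some database."""
    ice_id: str  # assumed identifier of the ICE being concretized
    bearer: str  # assumed name of the database that bears it
    text: str


def copy_with(
    source: Concretization,
    bearer: str,
    transform: Callable[[str], str] = lambda s: s,
    new_ice_id: Optional[str] = None,
) -> Concretization:
    """Create a new concretization using the source as a template.

    The source is never modified. Pass new_ice_id when the result is judged
    to concretize a different ICE; otherwise the source's ICE is reused.
    """
    return Concretization(
        ice_id=source.ice_id if new_ice_id is None else new_ice_id,
        bearer=bearer,
        text=transform(source.text),
    )


original = Concretization("ice-1", "db_a", "Alice met Bob")

# Analogous to copying cursive into block lettering: same ICE, new concretization.
same_ice = copy_with(original, "db_b", str.upper)

# Analogous to substituting fictional names: treated here as a new ICE.
new_ice = copy_with(
    original, "db_b", lambda s: s.replace("Alice", "Carol"), new_ice_id="ice-2"
)
```

The frozen dataclass is a design choice for the sketch: it rules out "changing" a concretization in place, which matches the claim that transformation creates something new.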
This was added in the last release.
We are requesting a class for extract, transform, and load process that has data items (or ICEs more generally) as specified input and data items (or ICEs more generally) as specified output.