Closed cong78 closed 4 years ago
Hi Cong -- couple questions / comments:
- Another scenario for ProcessTemplate, I think, was for it to represent a sort of process (i.e. composition of multiple sub-processes) that simply was not in an executable state (i.e. was not "bound")...
- Assuming this is still the case, would we expect ProcessTemplate to be materially (properties-wise) different from Process? If not, I would sort of think one might extend the other type-wise: perhaps Process extends ProcessTemplate (?)
- The ProcessByTemplate relationship should probably also only be 0..1 cardinality on the ProcessTemplate end (that is: a Process can presumably only be the executable form of a single ProcessTemplate rather than multiple; though indeed you could have multiple executable Processes that were derived from a single ProcessTemplate, so the other end should likely remain *)
Hi Chris, thanks for replying. Here are some thoughts from me.
- Another scenario for ProcessTemplate, I think, was for it to represent a sort of process (i.e. composition of multiple sub-processes) that simply was not in an executable state (i.e. was not "bound")...
Yes, I agree with this point, and I think it was also discussed as one of the reasons for this new type during the last workshop.
- Assuming this is still the case, would we expect ProcessTemplate to be materially (properties-wise) different from Process? If not, I would sort of think one might extend the other type-wise: perhaps Process extends ProcessTemplate (?)
I was also thinking about this. One question I have: if we save unfinished DataStage jobs as process template entities with relationships to ports in Egeria, what happens when the job is finished (after pressing the run button, I guess)? Will a new process entity replace the old process template entity?
- The ProcessByTemplate relationship should probably also only be 0..1 cardinality on the ProcessTemplate end (that is: a Process can presumably only be the executable form of a single ProcessTemplate rather than multiple; though indeed you could have multiple executable Processes that were derived from a single ProcessTemplate, so the other end should likely remain *)
I drew the relationship's cardinality as *..* because one of the documentation drawings that Raluca sent me shows different levels of processes, from most granular to most abstract. So I assumed that a very high-level process could be represented by multiple (sub)processes, and that they would probably use the same process template entity during process design?
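To make the two cardinality options concrete, here is a minimal sketch of the 0..1 / * variant in plain Python. It is not Egeria code: the class ProcessByTemplateLinks, the CardinalityError exception and the example names are all hypothetical, chosen only to illustrate the rule being debated.

```python
from collections import defaultdict


class CardinalityError(ValueError):
    """Raised when a link would break the 0..1 end of the relationship."""


class ProcessByTemplateLinks:
    """Hypothetical in-memory store for Process -> ProcessTemplate links.

    ProcessTemplate end: 0..1 (a process has at most one template).
    Process end: * (a template may be used by any number of processes).
    """

    def __init__(self):
        self._template_of = {}                 # process name -> template name
        self._processes_of = defaultdict(set)  # template name -> process names

    def link(self, process, template):
        current = self._template_of.get(process)
        if current is not None and current != template:
            # A second, different template would violate the 0..1 end.
            raise CardinalityError(f"{process} is already derived from {current}")
        self._template_of[process] = template
        self._processes_of[template].add(process)

    def template_of(self, process):
        return self._template_of.get(process)

    def processes_of(self, template):
        return sorted(self._processes_of[template])


links = ProcessByTemplateLinks()
links.link("load_customers", "csv_to_table")
links.link("load_orders", "csv_to_table")  # many processes, one template: allowed
print(links.processes_of("csv_to_table"))
```

Under the *..* variant the check in link() would simply be dropped, so a process could point at several templates.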
The New Type drawing from @cmgrote 's proposal?
I think it is important to separate the idea of a process template from something that is executable - so a process should not inherit from a process template. (i.e. remove the triangle arrowhead :)
The relationship could be called ProcessTemplateImplementation.
The ProcessTemplate belongs in Area 5 - see 0575
Another thought - the ProcessTemplate should probably inherit from Referenceable rather than Asset. This is what we have done for reference data (valid value set) and model element
I think it is important to separate the idea of a process template from something that is executable - so a process should not inherit from a process template.
I agree that they should be distinct concepts, but what about the large overlap that's likely there in terms of the relationships that are valid across both? I would expect a ProcessTemplate could be composed of sub-ProcessTemplates (just as Processes can be composed of sub-Processes), or even that it might be possible for a ProcessTemplate to be defined by the composition of multiple underlying Processes... I think a ProcessTemplate should also still be able to relate to ProcessPorts, too? Similarly, perhaps relations from other areas (like GovernanceProcessImplementation) should logically be able to relate to either a Process or a ProcessTemplate (?)
(So I'm perhaps leaning towards a ProcessTemplate being a specialised form of (extending from) Process?)
Another thought - the ProcessTemplate should probably inherit from Referenceable rather than Asset. This is what we have done for reference data (valid value set) and model element
I have made a drawing based on moving ProcessTemplate to area 0575 ProcessSchema.
To clarify my understanding, we are talking about two scenarios for this new type. In the first, ProcessTemplate is a separate type extending from Referenceable. In the second, Process extends the new ProcessTemplate type.
Sorry @cmgrote, I do not understand the second scenario. I remember that you talked about a partially completed process. This seems different from a ProcessTemplate. I would say that is a Process that has a status of DRAFT?
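The distinction between a different type and a different status can be sketched as follows. This is only an illustration in plain Python, not Egeria's type system: the InstanceStatus enum here is a simplified, hypothetical stand-in, and ACTIVE is an assumed name for the "finished" state (only DRAFT is mentioned in the discussion).

```python
from dataclasses import dataclass
from enum import Enum


class InstanceStatus(Enum):
    # DRAFT comes from the discussion; ACTIVE is an assumed "finished" status.
    DRAFT = "draft"
    ACTIVE = "active"


@dataclass
class Process:
    """A process keeps the same type for its whole lifetime."""
    name: str
    status: InstanceStatus = InstanceStatus.DRAFT

    def complete(self):
        # Finishing the job changes its state, not its type.
        self.status = InstanceStatus.ACTIVE


job = Process("nightly_load")
type_before = type(job)
job.complete()
print(type(job) is type_before, job.status.name)
```

With this modelling, a partially completed job never needs to be replaced by a new entity of another type when it is finished; only its status changes.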
The scenarios I had in mind were along these lines:
I thought we had discussed that for (2) this would be a ProcessTemplate, and (1) and (3) would be Processes, but maybe I'm mistaken?
(If (2) were a ProcessTemplate then I'd still need to be able to define the ProcessPorts for its inputs / outputs, potentially a hierarchy of Processes (and / or mixed with other ProcessTemplates) for more granular parts of that process, etc.)
The discussion is probably about the definition of ProcessTemplate and how it is implemented in different types of data engines. I have never worked with DataStage, so I cannot give much insight on that side. But Apache NiFi has a concept named Dataflow Template, defined like this:
Apache NiFi provides users the ability to build very large and complex DataFlows using NiFi. This is achieved by using the basic components: Processor, Funnel, Input/Output Port, Process Group, and Remote Process Group. These can be thought of as the most basic building blocks for constructing a DataFlow. At times, though, using these small building blocks can become tedious if the same logic needs to be repeated several times. To solve this issue, NiFi provides the concept of a Template. A Template is a way of combining these basic building blocks into larger building blocks. Once a DataFlow has been created, parts of it can be formed into a Template. This Template can then be dragged onto the canvas, or can be exported as an XML file and shared with others. Templates received from others can then be imported into an instance of NiFi and dragged onto the canvas.
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
Here is an example I can think of for how we could define ProcessTemplate:
In this picture there are four phases, and phase 2, Create Table, can be exported as a Template. The Create Table Template can then be a reusable component for anyone who needs to do a similar task, and they can import it into their ETL or data processing workflows. It is design-oriented, with configuration options: you can change the properties of each processor. For example, for ConvertAvroToJson you can add the Name and Version of the Avro file that you want to consume. (Source: https://blog.pythian.com/database-migration-using-apache-nifi/)
So from my understanding, there are two definitions:
ProcessTemplateSchema (my imagined name): can be reused by importing or referencing.
ProcessTemplate:
For ProcessTemplateSchema, we treat it as a Referenceable in Area 0575 ProcessSchema.
For ProcessTemplate, we treat it as a supertype of Process with configuration possibilities.
Not sure if these make sense but I feel Apache NiFi is a good example for me to think through it. @mandy-chessell @cmgrote Please let me know if you have any ideas.
I'm wondering if the first thing you're talking about (the ProcessTemplateSchema) is what we should be using as the meaning for a ProcessTemplate -- and the latter is still just a Process.
I would distinguish the two as:
- Process has defined inputs and outputs (ProcessPorts) and some activity within it
- Process is executable but does not necessarily need to have yet been executed
- ProcessTemplate is some re-usable piece of logic that can be embedded within a Process, but a ProcessTemplate is not executable. It has no inputs / outputs formally defined until it is included as part of a Process (?)
I'm now seeing ProcessTemplate as being some (optional) portion of a Process, but not something that could be a Process on its own. (DataStage has a similar concept in what it calls a "shared container": re-usable block of logic, which could have various sub-blocks of logic, but none of it can be executed until it is put into a job (Process).)
This would mean my 3 scenarios outlined in my previous comment would all be Processes.
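The distinction drawn in this comment can be sketched in plain Python. This is an illustration only, not the Egeria type model: the dataclasses below reuse the names from the discussion, but the fields (steps, executed), the embed() and run() methods, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessPort:
    name: str
    direction: str  # "input" or "output"


@dataclass
class ProcessTemplate:
    """Re-usable logic: deliberately has no ports and no run() method."""
    name: str
    steps: list = field(default_factory=list)


@dataclass
class Process:
    """Has ports and activity; executable, but not necessarily executed yet."""
    name: str
    ports: list = field(default_factory=list)
    steps: list = field(default_factory=list)
    executed: bool = False

    def embed(self, template):
        # The template's logic only becomes runnable once it is part of a process.
        self.steps.extend(template.steps)

    def run(self):
        self.executed = True
        return list(self.steps)


container = ProcessTemplate("shared_container", ["trim", "dedupe"])
job = Process(
    "customer_job",
    ports=[ProcessPort("in", "input"), ProcessPort("out", "output")],
)
job.embed(container)
print(job.executed, job.steps)
```

The template here is an optional portion of a process, in the spirit of the DataStage "shared container" mentioned above: it cannot be run on its own, and ports exist only on the process that embeds it.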
We need to distinguish between the type of something - which is true for its lifetime - and a change in state - i.e. moving from partial to complete. I would think that when a process template is used, it is copied into a process and then the process can be customized?
As per Mandy's comment above, templates in NiFi appear to be used more to get developers started, and for export/import and sharing. So they are indeed copied and modified rather than used as a 'module' (i.e. reusable logic without modification). Not that different from cut/paste or visual copy, just quicker.
So although we have a set of process templates which it's useful to understand and catalog, their relationship to a process exists only through the design process. It's a loose coupling ('was used to create'), like in a single git commit. The actual process definition could become completely different: delete all nodes and recreate, for example.
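The copy-and-customise behaviour and the loose 'was used to create' coupling can be sketched like this. It is plain Python under stated assumptions, not NiFi or Egeria code: the instantiate() helper, the created_from field and the step dictionaries are hypothetical, and "ConvertAvroToJson" is used only as a label taken from the example earlier in the thread.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessTemplate:
    name: str
    steps: list = field(default_factory=list)


@dataclass
class Process:
    name: str
    steps: list = field(default_factory=list)
    # Loose provenance only: records which template was used to create this.
    created_from: Optional[str] = None


def instantiate(template, process_name):
    """Copy the template's logic into a new, independent process."""
    return Process(
        name=process_name,
        steps=copy.deepcopy(template.steps),  # copy, not a shared reference
        created_from=template.name,
    )


template = ProcessTemplate(
    "create_table", [{"processor": "ConvertAvroToJson", "schema": None}]
)
process = instantiate(template, "migrate_orders")

# Customising the process leaves the template untouched.
process.steps[0]["schema"] = "orders-v2"

# The process may even diverge completely; only the provenance note remains.
process.steps.clear()
print(process.created_from, len(process.steps), len(template.steps))
```

Because the steps are deep-copied, later changes to the process never flow back to the template, which matches the "copied and modified" reading rather than the "shared module" reading.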
Maybe we therefore have two separate things here
hi @cmgrote @mandy-chessell @planetf1 thanks for your ideas.
Combining what we have said: the process template is a re-usable piece of logic (or a few pieces of sub-logic) from data engines that can help engineers design a process. It can be copied or imported during process design and can also be customised depending on the actual purpose of the process. It cannot be executed until it is put into a process job.
Is this something we can agree on?
Is the comment above an accurate reflection of our discussion in Huizen? If not, suggest we need to urgently capture it as I think I've already forgotten 🙁
Is the comment above an accurate reflection of our discussion in Huizen? If not, suggest we need to urgently capture it as I think I've already forgotten 🙁
Sorry, I should have written down the conclusions during the workshop. These are the things that I wrote down from the discussions/conclusions that we had:
- Process Template can be a Risk Profile Structure from regulators that all the banks need to perform by filling in Process details to meet the risk requirements. So the Process Template is not executable and can only be implemented by a Process.
- ProcessTemplateImplementation.
- Data Stage: it will not publish any Process Template, only Process with different instance status.
@mandy-chessell Could you please have a final look?
Hi @mandy-chessell, I created this issue initially thinking we might use it for reusable design components in data engines. After the workshop, you gave us a slightly different explanation of it. I am just wondering whether we have enough knowledge to finalise this new type, or whether we need more valid use cases to justify it?
Otherwise I will set the milestone tag to a later release.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
During the Egeria August workshop we discussed the need for a new asset type, Process Template. Data engines can use this new type to provide processing or transformation templates that are then used to design or develop data processing pipelines or ETL jobs.
Taking Apache NiFi (https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#building-dataflow ) as an open source example, it provides many Processors as basic components for building a data flow pipeline. Each processor has its own configuration properties and usage scenarios. When designing a data flow pipeline, a user can configure an existing processor with input and output ports as part of the flow.
From an Open Metadata perspective, when we register a data flow pipeline from Apache NiFi (as a Data Engine client) with Egeria, we can save its processors as Process Template entities so that they are visible for metadata visibility and management purposes. Someone designing an ETL job or data processing pipeline will then use that Process Template to create one or more processes.
I have made a draft of this new type, based on my understanding, within area 0217 Automated Process. My reasoning is that a Process Template can be considered part of Process Automation, provided for a specific technical purpose without any human creation. It is a subtype of Asset, with a relationship to the Process type indicating that processes can be created based on the Process Templates offered by Data Engines.
@cmgrote @mandy-chessell @popa-raluca This is just my initial draft based on the input I received, so please let me know your thoughts and ideas about it. ;)
Thanks,
Cong