Closed rjplevin closed 3 weeks ago
I think my biggest concern is that the validation requires specific definitions inside
<Contains>
elements in the XML. One could create an XML that is valid from a structural perspective, but because a stream contains"storage oil"
instead of"oil for storage"
the validation would fail.
The XML must correspond to the Python code in 2 ways: the names / classes of Process
subclasses, and the generalized contents of Streams
. You could have "structurally correct" XML that fails in one or both of these ways. I don't see an easy way to decouple these. The validation subsystem is the next best thing: automate detection of these failures.
This also raised some confusion about the
name
attribute on Streams and theContains
elements. It seems like the names are used only for theStreamRef
, but I could be mistaken.
I believe that's correct. And for reporting / GUI display.
I can also see that there are repeated strings in the
Contains
elements, which I can see being used to aggregate across multiple Stream instances. But it also implies there are specific types of streams that must carry specific contents and that certain processes must be connected to specific Stream types.
As I've probably mentioned, I've long advocated for avoiding stream content names that reference specific processes, as this somewhat defeats the purpose of the <Contains>
element. Process-specific names aren't much better than traversing the Stream
to identify the Process
on the opposite side, which would be terribly brittle.
I think in the short-term this validation seems like a solid approach, but I think it's worth considering an update to Stream handling where stream contents are narrowed to a limited set of substances (i.e.
gas
,oil
,CH4
, etc...) and processes have more knowledge of input/output streams.
I agree that a more generalized (and perhaps a known, limited) set of stream contents makes sense. We could define the set of these in the code and allow extensions via the configuration file. This would improve validation, and probably simplify some of the run
methods.
In this way, a process could declare that it requires a certain number or range if input/output streams with certain components/contents.
VFPartition
is maybe a good example where the input specification appears to be a single stream, while theopgee.xml
template has multiple streams ending atVFPartition
both containinggas for partition
. If streams were a very generic connection between processes e.g.:<Stream src="BitumenMining" dst="VFPartition" /> <Stream src="Separation" dst="VFPartition" />
Then
BitumenMining
andSeparation
could set the contents/components on their output streams to then be validated byVFPartition
. Furthermore,VFPartition
could specify that it requires 1 to 3 input streams containinggas
whileBitumenMining
could specify it has 1 or 2oil
output streams and 0 or 1gas
output stream.
I see a few issues with this. For one, the run
method of the source process needs to be able to "find" it's output streams to know which one to emit substances to. If the contents are set by, and thus vary with, the destination process, this becomes unknowable.
Also, I'm not sure it's necessary to identify the number of input streams containing a substance. If a process can handle multiple, it can call combine_streams
, regardless of the number, so there's no benefit I see to requiring a specific number. In the case of VFPartition
, I believe it's only important that at least one such stream is found.
This admittedly would lose the ability for the field/model to quickly grab all the
gas for partition
streams in the same way, but I think that could be solved with something similar to theAggregator
or another kind of categorization mechanism.
Where is this "grab all the X streams" used? It's not clear to me that this is useful.
The main benefits I see are simplifying the XML and decoupling its details from the underlying model and allowing for class level declarations in processes where we could leverage existing validation/schema libraries (e.g. Pydantic) for runtime checks.
the run method of the source process needs to be able to "find" it's output streams to know which one to emit substances to. If the contents are set by, and thus vary with, the destination process, this becomes unknowable.
I may not have been clear on this point, but I was thinking that contents would be set by the source processes. The only thing that would vary is how many, so if a process could have 0-1 "oil" or 2-Inf "gas" output streams, it could handle the different configurations explicitly.
One drawback is if it needs to handle 2 streams with the same contents differently (e.g. CrudeOilDewatering
has 2 oil
outputs for dilution and upgrading). In the current configuration, this feels like a proxy for one process behaving differently based on its adjacent processes. If there are constraints on source and destination processes for a given stream, it seems reasonable to include that in an input/output stream specification and have special handling based on the process at the other end of the stream.
Where is this "grab all the X streams" used? It's not clear to me that this is useful.
I don't see this used currently, it was more in response to the documentation for the <Contains>
tag.
allows processes to find different input streams without reference to any specific process name.
It also allows any object to find streams based on contents, so I wasn't sure if that was an intended use case. Even so, it seems like there are constraints on which processes can connect (and in what direction, perhaps?). This is currently reflected in the <Contains>
strings, and I'd advocate for defining those constraints at the process level either by valid neighbor process classes or (if we really want maximum flexibility) by interface (e.g. if a given process is the src
the dst
process must have a certain input specification).
Yeah, but "any object" doesn't have a need to find streams. I think it's relevant only to other Process subclasses.
I'd advocate for defining those constraints at the process level either by valid neighbor process classes or (if we really want maximum flexibility) by interface (e.g. if a given process is the src the dst process must have a certain input specification).
I've tried to avoid any such constraints. Think about a user defining a novel Process subclass and wanting to wire it into the Field definition. The constraints above prevent this without code modifications. The whole point of
On Oct 29, 2024, at 12:15 PM, Michael Barlow @.***> wrote:
the run method of the source process needs to be able to "find" it's output streams to know which one to emit substances to. If the contents are set by, and thus vary with, the destination process, this becomes unknowable. I may not have been clear on this point, but I was thinking that contents would be set by the source processes. The only thing that would vary is how many, so if a process could have 0-1 "oil" or 2-Inf "gas" output streams, it could handle the different configurations explicitly. One drawback is if it needs to handle 2 streams with the same contents differently (e.g. CrudeOilDewatering has 2 oil outputs for dilution and upgrading). In the current configuration, this feels like a proxy for one process behaving differently based on its adjacent processes. If there are constraints on source and destination processes for a given stream, it seems reasonable to include that in an input/output stream specification and have special handling based on the process at the other end of the stream.
— Reply to this email directly, view it on GitHub https://github.com/msmasnadi/OPGEEv4/pull/27#issuecomment-2445125408, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA25CHSN4QXUMKNSKEDAQ5LZ57NEPAVCNFSM6AAAAABPTD7XF6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINBVGEZDKNBQHA. You are receiving this because you authored the thread.