Closed NicolasRouquette closed 1 year ago
Question: What should be the ID of a term or contract that was constructed without an explicit ID?
That is, should we have this kind of API, given PEP 8 Descriptive Naming Styles:
Term.pacti_id() : Optional[str]
IoContract.pacti_id() : Optional[str]
Or this:
Term.pacti_id() : str
IoContract.pacti_id() : str
Where the constructor would use Python's built-in id()
if no ID is provided.
It seems that the 2nd alternative is preferable for traceability proposes.
Eventually, for the purpose of traceability, I get the impression that a term may need to know to what contract it belongs--and whether it is an assumption or guarantee. This would require us to identify each term uniquely--for which we probably cannot rely on the user. We may need to have a term factory that is in charge of generating new IDs for terms.
In the case we are discussing, we are using the ID
field as a way to enable good user output.
We could opt to not call this ID
and instead use a provenance
field that tells how the term was obtained. For a top-level term, we would say, 'user provided.' When we make syntactic transformations, we could say that provenance is user input: rule 2a.
I believe this provenance field would be extended in the future to say things like 'this term was obtained via refinement/relaxation of terms T1 T2...'
We need to separate two different concepts:
1) Unique IDs for API entities: Term
, IoContract
, and IoContractCompound
; including all their specializations.
2) Provenance
For (1), I suggest using a UUID version 4 string with the user-provided ID as a suffix. This will ensure that all of Pacti's IDs will be unique regardless of whether provenance traceability is turned on or off.
For (2), we have a separate issue about this: https://github.com/FormalSystems/pacti/issues/248 Since recording provenance can add a considerable amount of metadata, we need to provide an on/off switch so that we can avoid the overhead when performance matters.
I agree we have two items in the conversation: IDs and provenance, but the two are closely related. I only see a strong need for supporting IDs when supporting provenance.
For the current use case, which is disambiguation of print rules, I am not sure an ID would be the cleanest way to implement it. Can you give me an example of how an ID could help this disambiguation?
IMO, we could support disambiguation using a new dictionary in the Term
class. We could call this provenance
. Initially, we could use this dictionary to support disambiguation and extend it later to other provenance situations. For example, suppose a user writes the term
|x| <= 2
We could store
term: x <= 2
provenance: {'user_input': Rule1_A}
term: -x <= 2
provenance: {'user_input': Rule1_B}
Now, when we see that term x <= 2
has provenance 'user_input': Rule1_A
, we could look for the term that has provenance 'user_input': Rule1_B
and thus output to the user |x| <= 2
. What do you think? Do we need more information about a term than, e.g., "obtained by applying rule 1A"?
Suppose we parse a contract that has two expressions: |x| <= 2
, |y| <= 3
Then the dictionary approach would have:
term: x <= 2
provenance: {'user_input': Rule1_A}
term: -x <= 2
provenance: {'user_input': Rule1_B}
term: y <= 3
provenance: {'user_input': Rule1_A}
term: -y <= 3
provenance: {'user_input': Rule1_B}
We're losing a lot of information about the relationship between the rules; when we see x <= 2
and find its provenance is to Rule1_A
, we need to do structural matching on the negated term whose provenance is Rule1_B
. This makes the serialization algorithm more complicated than necessary.
Suppose instead that the pacti_id
is: tuple[uuid,str]
. We can use the same uuid
for the 1st attribute and 1A
or 1B
for the 2nd attribute. The pacti_id
as a class attribute should be preserved when we copy a term. For pretty-printing, each term's pacti_id
would provide information on whether it originates unmodified from parsing, in which case we could look for its sibling term with the same uuid
on its 1st pacti_id
attribute but with the opposite side of the rule on its 2nd attribute. With this strategy, structural matching would apply only for terms with no parsing-related pacti_id
information.
Since we've benchmarked Pacti to be fast when we use job-level parallel processing as we did for the space_mission case study, it is essential to avoid adding global in-memory states like a term dictionary as it would prevent running concurrent jobs on different processors!
This branch commit adds pacti_id: Tuple[uuid.UUID, str]
to Term
, IoContract
, IoCompoundContract
, and their specializations.
The next step is to hook up the current polyhedral parser to generate pairs of pacti_id
to simplify pretty printing as long as we have kept the original contract.
Verified test
, check-types
, and check-quality
.
This branch commit adds duplicate()
methods to preserve the pacti_id
during construction of TermList
and IoContract
. Unfortunately, for the latter, the constructor calls simplify()
on guarantees, which does not preserve the original pacti_id
, even if no simplifcation is made. To see this, compare these two tests:
The 1st test passes; the 2nd test fails at the check for the pacti_id
:
https://github.com/FormalSystems/pacti/blob/b02ca8df51accb1e851bc88d578c482f1d5e70b4/tests/test_polyhedral_contract_syntax.py#L43
The reason we loose the parse rule provenance is due to the simplification of the guarantees during construction: https://github.com/FormalSystems/pacti/blob/b02ca8df51accb1e851bc88d578c482f1d5e70b4/src/pacti/iocontract/iocontract.py#L441
To fix this problem, it seems that we need to augment simplify()
and preserve the original pacti_id
if no simplification was made. @iincer Do you agree with this strategy?
This commit updates the following APIs:
PolyhedralTerm.polytope_to_term
PolyhedralTermList.termlist_to_polytope
PolyhedralTermList.polytope_to_termlist
PolyhedralTermList.reduce_polytope
All tests pass, including new tests to verify the preservation of guarantee terms, even if simplification removes some of them; see the new test cases in test_polyhedral_contract_syntax.py
.
check-types
ok
check-quality
has two strange errors:
pdm run duty check-quality
✗ Checking code quality (1)
> flake8 --config=config/flake8.ini src tests case_studies docs
src/pacti/terms/polyhedra/polyhedra.py:728:9: WPS236 Found too many variables used to unpack a tuple: 6 > 4
src/pacti/terms/polyhedra/polyhedra.py:863:9: WPS227 Found too long function output tuple: 6 > 5
Fixed the quality checks; this is ready for a draft PR.
Is your feature request related to a problem? Please describe.
To expand Pacti's syntax for Polyhedral terms, it will be necessary to keep track of the conversion between the end-user input form of a term and the internal Polyhedral terms as summarized here: https://github.com/FormalSystems/pacti/issues/129#issuecomment-1435218828
Describe the solution you'd like This ticket is about the basic functionality of tracking term and contract ID at the API level.