[Feature request] Add support for Term and Contract ID

NicolasRouquette commented 1 year ago

Is your feature request related to a problem? Please describe.

To expand Pacti's syntax for Polyhedral terms, it will be necessary to keep track of the conversion between the end-user input form of a term and the internal Polyhedral terms as summarized here: https://github.com/FormalSystems/pacti/issues/129#issuecomment-1435218828

Describe the solution you'd like This ticket is about the basic functionality of tracking term and contract ID at the API level.

NicolasRouquette commented 1 year ago

Question: What should be the ID of a term or contract that was constructed without an explicit ID?

That is, should we have this kind of API, given PEP 8 Descriptive Naming Styles:

Term.pacti_id() : Optional[str]
IoContract.pacti_id() : Optional[str]

Or this:

Term.pacti_id() : str
IoContract.pacti_id() : str

Where the constructor would use Python's built-in id() if no ID is provided.

It seems that the 2nd alternative is preferable for traceability proposes.

iincer commented 1 year ago

Eventually, for the purpose of traceability, I get the impression that a term may need to know to what contract it belongs--and whether it is an assumption or guarantee. This would require us to identify each term uniquely--for which we probably cannot rely on the user. We may need to have a term factory that is in charge of generating new IDs for terms.

In the case we are discussing, we are using the ID field as a way to enable good user output. We could opt to not call this ID and instead use a provenance field that tells how the term was obtained. For a top-level term, we would say, 'user provided.' When we make syntactic transformations, we could say that provenance is user input: rule 2a. I believe this provenance field would be extended in the future to say things like 'this term was obtained via refinement/relaxation of terms T1 T2...'

NicolasRouquette commented 1 year ago

We need to separate two different concepts: 1) Unique IDs for API entities: Term, IoContract, and IoContractCompound; including all their specializations. 2) Provenance

For (1), I suggest using a UUID version 4 string with the user-provided ID as a suffix. This will ensure that all of Pacti's IDs will be unique regardless of whether provenance traceability is turned on or off.

For (2), we have a separate issue about this: https://github.com/FormalSystems/pacti/issues/248 Since recording provenance can add a considerable amount of metadata, we need to provide an on/off switch so that we can avoid the overhead when performance matters.

iincer commented 1 year ago

I agree we have two items in the conversation: IDs and provenance, but the two are closely related. I only see a strong need for supporting IDs when supporting provenance.

For the current use case, which is disambiguation of print rules, I am not sure an ID would be the cleanest way to implement it. Can you give me an example of how an ID could help this disambiguation?

IMO, we could support disambiguation using a new dictionary in the Term class. We could call this provenance. Initially, we could use this dictionary to support disambiguation and extend it later to other provenance situations. For example, suppose a user writes the term

|x| <= 2

We could store

term: x <= 2
provenance: {'user_input': Rule1_A}

term: -x <= 2
provenance: {'user_input': Rule1_B}

Now, when we see that term x <= 2 has provenance 'user_input': Rule1_A, we could look for the term that has provenance 'user_input': Rule1_B and thus output to the user |x| <= 2. What do you think? Do we need more information about a term than, e.g., "obtained by applying rule 1A"?

NicolasRouquette commented 1 year ago

Suppose we parse a contract that has two expressions: |x| <= 2, |y| <= 3

Then the dictionary approach would have:

term: x <= 2
provenance: {'user_input': Rule1_A}

term: -x <= 2
provenance: {'user_input': Rule1_B}

term: y <= 3
provenance: {'user_input': Rule1_A}

term: -y <= 3
provenance: {'user_input': Rule1_B}

We're losing a lot of information about the relationship between the rules; when we see x <= 2 and find its provenance is to Rule1_A, we need to do structural matching on the negated term whose provenance is Rule1_B. This makes the serialization algorithm more complicated than necessary.

Suppose instead that the pacti_id is: tuple[uuid,str]. We can use the same uuid for the 1st attribute and 1A or 1B for the 2nd attribute. The pacti_id as a class attribute should be preserved when we copy a term. For pretty-printing, each term's pacti_id would provide information on whether it originates unmodified from parsing, in which case we could look for its sibling term with the same uuid on its 1st pacti_id attribute but with the opposite side of the rule on its 2nd attribute. With this strategy, structural matching would apply only for terms with no parsing-related pacti_id information.

NicolasRouquette commented 1 year ago

Since we've benchmarked Pacti to be fast when we use job-level parallel processing as we did for the space_mission case study, it is essential to avoid adding global in-memory states like a term dictionary as it would prevent running concurrent jobs on different processors!

NicolasRouquette commented 1 year ago

This branch commit adds pacti_id: Tuple[uuid.UUID, str] to Term, IoContract, IoCompoundContract, and their specializations.

The next step is to hook up the current polyhedral parser to generate pairs of pacti_id to simplify pretty printing as long as we have kept the original contract.

Verified test, check-types, and check-quality.

NicolasRouquette commented 1 year ago

This branch commit adds duplicate() methods to preserve the pacti_id during construction of TermList and IoContract. Unfortunately, for the latter, the constructor calls simplify() on guarantees, which does not preserve the original pacti_id, even if no simplifcation is made. To see this, compare these two tests:

Contract with 1 assumption only

https://github.com/FormalSystems/pacti/blob/b02ca8df51accb1e851bc88d578c482f1d5e70b4/tests/test_polyhedral_contract_syntax.py#L12-L26

Contract with 1 guarantee only

https://github.com/FormalSystems/pacti/blob/b02ca8df51accb1e851bc88d578c482f1d5e70b4/tests/test_polyhedral_contract_syntax.py#L29-L43

The 1st test passes; the 2nd test fails at the check for the pacti_id: https://github.com/FormalSystems/pacti/blob/b02ca8df51accb1e851bc88d578c482f1d5e70b4/tests/test_polyhedral_contract_syntax.py#L43

The reason we loose the parse rule provenance is due to the simplification of the guarantees during construction: https://github.com/FormalSystems/pacti/blob/b02ca8df51accb1e851bc88d578c482f1d5e70b4/src/pacti/iocontract/iocontract.py#L441

To fix this problem, it seems that we need to augment simplify() and preserve the original pacti_id if no simplification was made. @iincer Do you agree with this strategy?

NicolasRouquette commented 1 year ago

This commit updates the following APIs:

PolyhedralTerm.polytope_to_term
PolyhedralTermList.termlist_to_polytope
PolyhedralTermList.polytope_to_termlist
PolyhedralTermList.reduce_polytope

All tests pass, including new tests to verify the preservation of guarantee terms, even if simplification removes some of them; see the new test cases in test_polyhedral_contract_syntax.py.

check-types ok

check-quality has two strange errors:

pdm run duty check-quality
✗ Checking code quality (1)
  > flake8 --config=config/flake8.ini src tests case_studies docs
  src/pacti/terms/polyhedra/polyhedra.py:728:9: WPS236 Found too many variables used to unpack a tuple: 6 > 4
  src/pacti/terms/polyhedra/polyhedra.py:863:9: WPS227 Found too long function output tuple: 6 > 5

NicolasRouquette commented 1 year ago

Fixed the quality checks; this is ready for a draft PR.

NicolasRouquette commented 1 year ago

Closing, see https://github.com/FormalSystems/pacti/pull/252#issuecomment-1471048368

pacti-org / pacti

[Feature request] Add support for Term and Contract ID #247