pacti-org / pacti

A package for compositional system analysis and design
https://www.pacti.org
BSD 3-Clause "New" or "Revised" License
19 stars 5 forks source link

Polyhedral Contract Syntax #129

Closed NicolasRouquette closed 1 year ago

NicolasRouquette commented 1 year ago

With:

from pacti.utils.string_contract import StrContract
from pacti.terms.polyhedra.loaders import string_to_polyhedra_contract

We can write contracts in a human-friendly way like this:

# PWR:
# Parameters:
# - s: start index of the timeline variables
# - tstart: scheduled task instance start time
# - duration: scheduled task instance duration
# - generation: rate of battery charge during the task instance
def PWR_contract(s: int, tstart: float, duration: float, generation: float) -> tuple[int, list[IoContract]]:
  e = s+1
  spec = StrContract(
    inputs = [
      f"t{s}",    # Scheduled start time
      f"soc{s}",  # initial battery SOC
      f"d{s}",    # initial data volume
      f"e{s}",    # initial trajectory error
      f"r{s}",    # initial relative distance
    ],
    outputs = [
      f"t{e}",    # Scheduled end time
      f"soc{e}",  # final battery SOC
      f"d{e}",    # final data volume
      f"e{e}",    # final trajectory error
      f"r{e}",    # final relative distance
    ],
    assumptions = [
      # Scheduled task instance start time
      # t{s} == tstart
      f"t{s} <= {tstart}",
      f"-t{s} <= -{tstart}",
    ],
    guarantees = [
      # Scheduled task instance end time
      # t{e} - t{s} = duration
      f"t{e} - t{s} <= {duration}",
      f"t{s} - t{e} <= -{duration}",

      # Battery SOC charge
      # soc{e} - soc{s} >= duration*generation 
      f"soc{e} - soc{s} >= {duration*generation}",

      # no change to data volume
      # d{e} = d{s}
      f"d{e} - d{s} <= 0",
      f"d{s} - d{e} <= 0",

      # no change to trajectory error
      # e{e} = e{s}
      f"e{e} - e{s} <= 0",
      f"e{s} - e{e} <= 0",

      # no change to relative distance
      # r{e} = r{s}
      f"r{e} - r{s} <= 0",
      f"r{s} - r{e} <= 0",
    ])
  return e, string_to_polyhedra_contract(spec)

This approach delegates parsing the string expressions to Sympy, which accepts expressions that are not Polyhedral. The resulting error messages can be difficult for users to understand.

This issue is about defining the syntax of Polyhedra constraints for parsing and rendering sketched as follows:

1) LHS <= RHS

where:

For convenience, the following forms would be supported by rewrite as conjunctions of (1) above:

2) | LHS | <= RHS

Rewrite as the conjunction of:

3) | LHS | = 0

Rewrite as the conjunction of:

4) LHS = RHS

Rewrite as the conjunction of:

In the above:

The parser would reject any string expression that does not match any of (1,2,3,4).

This ticket entails two parts:

NicolasRouquette commented 1 year ago

FYI: I made a branch with WIP for this issue: https://github.com/FormalSystems/pacti/tree/polyhedra-contracts

Currently, it handles the canonical polyhedral term syntax: LHS <= RHS

I tried running the test suite but I got this error:

(pacti-3.9) nfr@nfr-desktop:/opt/local/github.formalsystems/pacti$ make test
pdm run duty test
Inside an active virtualenv /opt/local/github.formalsystems/pacti/.venv, reusing it.
āœ— Running tests (4)
  > pytest -c config/pytest.ini -n auto -k "" tests
  ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
  pytest: error: unrecognized arguments: --cov --cov-config -n auto tests
    inifile: /opt/local/github.formalsystems/pacti/config/pytest.ini
    rootdir: /opt/local/github.formalsystems/pacti/config

make: *** [Makefile:80: test] Error 4

Any suggestions?

iincer commented 1 year ago

Hi Nicolas, I just ran make test in this wip branch and got a successful output:

$ make test
pdm run duty test 
āœ“ Running tests

The error you get seems to indicate that coverage support for pytest is not installed. I don't know how that could be the case since pyproject includes directives to install coverage support:

tests = [
    "pytest>=6.2",
    "pytest-cov>=3.0",
    "pytest-randomly>=3.10",
    "pytest-xdist>=2.4",
]

Would rerunning pdm install make sense? My local installation has the folder pacti\.venv\Lib\site-packages\pytest_cov, which is, I guess, where the coverage support goes. I find this error strange because the rest of the installation seems to be okay for you.

NicolasRouquette commented 1 year ago

Thanks Inigo!

pdm install && make test

Success!

So far, I believe that this achieves the 1st part of the ticket: parser support.

The next step is the pretty-printer...

iincer commented 1 year ago

Great to hear that šŸ˜ƒ And thanks so much for improving the parser!

NicolasRouquette commented 1 year ago

In testing against the space_mission case study, I found that there were some bugs in the regex patterns that I'm fixing.

I also noticed that the previous pt_from_string would effectively map x > 0 to x = 0 to x <= 0 because it ignored the comparison operator!

iincer commented 1 year ago

I am not surprised... The original intent was to only enter expressions of the form 'ax + by <= c' (which is what the low-level representation supports), so the '<=' was assumed to be fixed and was probably ignored.

NicolasRouquette commented 1 year ago

I pushed a commit that implements the pretty printer described above.

It would be nice if pretty-printing the parsing of an input contract string would result in the same output string as the input string.

Unfortunately, this is not the case for our rules.

Rule Pattern Pos constraint Neg constraint
1 LHS <= RHS Not applicable Not applicable
2 \| LHS \| <= RHS LHS <= RHS -(LHS) <= RHS
3 \| LHS \| = 0 LHS <= 0 -(LHS) <= 0
4 LHS = RHS LHS <= RHS -(LHS) <= -(RHS)

Rule 1 is the canonical pattern; there is no rewrite. The patterns of rules 2,3,4 lead to a rewrite as a pair of positive and negative canonical patterns.

Consider the input string: x = 0.

For parsing, we apply rule 4 and rewrite it as: [ x <= 0, -x <= 0 ]

For pretty-printing, we then have an ambiguity, as this pair could match the pos/neg results of rules 2,3,4 depending on the order in which we look for them.

Is this really a big deal? In principle no because x = 0 is equivalent to |x| <= 0 and to |x| = 0.

What do you think?

NicolasRouquette commented 1 year ago

FYI: Here is an example of the pretty-printer:

https://github.com/FormalSystems/pacti/blob/polyhedra-contracts/case_studies/polyhedra_contracts/polyhedral-contract-example.py

It produces this:

IoContract rendering:
InVars: [<Var t1>, <Var soc1>, <Var d1>, <Var e1>, <Var r1>]
OutVars:[<Var t2>, <Var soc2>, <Var d2>, <Var e2>, <Var r2>]
A: 1.0*t1 <= 0.0, -1.0*t1 <= -0.0, -1.0*soc1 <= -375.0, -1.0*d1 <= -1.0
G: -1.0*t1 + 1.0*t2 <= 10.0, 1.0*t1 + -1.0*t2 <= -10.0, -1.0*soc1 + 1.0*soc2 <= 300.0, 1.0*d2 <= 0.0, -1.0*d2 <= 0.0, -1.0*e1 + 1.0*e2 <= 0.0, 1.0*e1 + -1.0*e2 <= 0.0, -1.0*r1 + 1.0*r2 <= 0.0, 1.0*r1 + -1.0*r2 <= 0.0

Polhedra Contract pretty-printing
Inputs: [t1, soc1, d1, e1, r1]
Outputs: [t2, soc2, d2, e2, r2]
A: [
        | t1 | = 0,
        -soc1 <= -375.0,
        -d1 <= -1.0
   ]
G: [
        -t1 + t2 = 10.0,
        -soc1 + soc2 <= 300.0,
        | d2 | = 0,
        | -e1 + e2 | = 0,
        | -r1 + r2 | = 0
   ]
NicolasRouquette commented 1 year ago

I am wondering whether we need to augment IoContract with a pretty-printer function: IoContract --> str.

There are several places where IoContract raises a ValueError that requires formatting an IoContract; e.g.:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 c3 = c2.compose(tcm1)
      2 print("Contract c2.compose(tcm1):")
      3 print(polyhedra_contract_to_string(c3))

File /opt/local/github.formalsystems/pacti/src/pacti/iocontract/iocontract.py:458, in IoContract.compose(self, other)
    456     new_a = other.a.abduce_with_context(self.a | self.g, assumptions_forbidden_vars)
    457     if list_intersection(new_a.vars, assumptions_forbidden_vars):
--> 458         raise ValueError(
    459             "The guarantees \n{}\n".format(self.g)
    460             + "were insufficient to abduce the assumptions \n{}\n".format(other.a)
    461             + "by eliminating the variables \n{}".format(assumptions_forbidden_vars)
    462         )
    463     assumptions = new_a | self.a
    464 elif other_helps_self and not self_helps_other:

ValueError: The guarantees 
-1.0*t1 + 1.00000000000000*t4 <= 9.00000000000000, 1.0*t1 + -1.00000000000000*t4 <= -9.00000000000000, -1.00000000000000*d4 <= -50.0000000000000, 1.0*e1 + -1.00000000000000*e4 <= -40.0000000000000, -1.0*r1 + 1.00000000000000*r4 <= 0, 1.0*r1 + -1.00000000000000*r4 <= 0
were insufficient to abduce the assumptions 
1.0*t4 <= 9.0, -1.0*t4 <= -9.0, -1.0*soc4 <= -40.0, 1.0*e4 <= 4.0
by eliminating the variables 
[<Var t4>, <Var soc4>, <Var d4>, <Var e4>, <Var r4>, <Var t5>, <Var soc5>, <Var d5>, <Var e5>, <Var r5>]

This is hard to read...

So I'm proposing to change the IoContract constructor from this:

    def __init__(
        self, assumptions: TermList, guarantees: TermList, input_vars: List[Var], output_vars: List[Var]
    ) -> None:

to add optional pretty-printer functions based on this: https://docs.python.org/3/library/typing.html#callable

For IoContract, we need 3 kinds of optional printer functions:

    def __init__(
        self, assumptions: TermList, guarantees: TermList, input_vars: List[Var], output_vars: List[Var],
        pretty_printer: Callable[[IoContract], str] = lambda x: x.__str__(),
        termlist_printer: Callable[[TermList], str] = lambda x: x.__str__(),
        varlist_printer: Callable[[list[Var]], str] = lambda x: x.__str__()
    ) -> None:

Now, I get something more readable:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 c3 = c2.compose(tcm1)
      2 print("Contract c2.compose(tcm1):")
      3 print(polyhedra_contract_to_string(c3))

File /opt/local/github.formalsystems/pacti/src/pacti/iocontract/iocontract.py:465, in IoContract.compose(self, other)
    463     new_a = other.a.abduce_with_context(self.a | self.g, assumptions_forbidden_vars)
    464     if list_intersection(new_a.vars, assumptions_forbidden_vars):
--> 465         raise ValueError(
    466             "The guarantees \n{}\n".format(self.termlist_printer(self.g))
    467             + "were insufficient to abduce the assumptions \n{}\n".format(self.termlist_printer(other.a))
    468             + "by eliminating the variables \n{}".format(self.varlist_printer(assumptions_forbidden_vars))
    469         )
    470     assumptions = new_a | self.a
    471 elif other_helps_self and not self_helps_other:

ValueError: The guarantees 
[
-t1 + t4 = 9.00000000000000,
 soc1 - soc4 <= -96.0000000000000,
 -d4 <= -50.0000000000000,
 e1 - e4 <= -40.0000000000000,
 | -r1 + r4 | = 0
]
were insufficient to abduce the assumptions 
[
t4 = 9.0,
 -soc4 <= -40.0,
 e4 <= 4.0
]
by eliminating the variables 
[d4, d5, e4, e5, r4, r5, soc4, soc5, t4, t5]
iincer commented 1 year ago

Thanks, Nicolas, for implementing this! It makes the tool more user-friendly. It just struck me that reporting results (including plotting) is an important application of a tool that aims to help users to deal with specifications. Regarding your questions:

For pretty-printing, we then have an ambiguity, as this pair could match the pos/neg results of rules 2,3,4 depending on the order in which we look for them. Is this really a big deal? In principle no because x = 0 is equivalent to |x| <= 0 and to |x| = 0. What do you think?

I agree that there is ambiguity. I would say that our higher priority rule to check in this case is equality (rule 4). It is better to output x = 0 than |x| = 0.

It produces this: [...] | t1 | = 0

The output looks great. It would be nice to remove spaces after an opening | and before a closing |, e.g., |t1| = 0 instead of | t1 | = 0

I'm proposing to change the IoContract constructor from...

Right now, we print contracts by defining a __str__(self) function in the IoContract class. This function calls the str function on the input and output variables and assumptions and guarantees termlists:

https://github.com/FormalSystems/pacti/blob/75d0c05d6ca40e8c2c9c68d95a23a503d8f193ad/src/pacti/iocontract/iocontract.py#L332-L343

I agree that the output you get is the way to go (it seems the constants need to be floating-point formatted--there are some long strings: 9.00000000000000). Instead of modifying the constructor of IoContract, however, we could just modify the __str__ routines of the Term class. These routines would invoke your pretty print. What do you think? Do you see advantages to doing this by extending the signature of the constructor with the optional printer functions?

Something to think about is that someone at some point may be using small numbers in constraints. Then scientific notation would be appropriate. We could decide for the user and output all numbers in scientific notation with a couple of fractional digits by default, or we could give the user the ability of passing a format string to Pacti.

NicolasRouquette commented 1 year ago

I think it would be preferable to have the __str__ functions produce human-friendly output.

I started w/ the optional pretty-printing functions because it was simple and I could see how this works in my case study. After running into a few errors from PolyhedralTermList functions, I realized that this approach is too awkward: better make sure that every __str__ does the right thing.

I agree about the higher priority of rule 4 for printing and about removing the spaces for the | signs.

Can you elaborate about formatting floating-point numbers?

For the pattern-based rewriting, I used numpy.isclose(..) to approximate the notion of floating point equality up to "unit in the last place" (ULP). It is unclear to me whether we should use hardcoded constants or make that tunable in the API.

iincer commented 1 year ago

Can you elaborate about formatting floating-point numbers?

What I had in mind is that the internal representation of a contract may be {"coefficients":{"i":1.0000001}, "constant":0.0000002}. If we go for a fixed fractional length for a floating-point number, say length = 2, we would get 1.00 i \le 0.00 as the output. In this case, we do violence to the constant term. Alternatively, we could always write scientific notation with a couple of fractional digits: 1.00E0 i <= 1.00 E -7. Doing this would allow us to always return something sensible, but in the cases when scientific notation is not needed, the output would not look ideal, e.g., 1.00E0 i <= 1.00E0 instead of i \le 1.

For the pattern-based rewriting, I used numpy.isclose(..) to approximate the notion of floating point equality up to "unit in the last place" (ULP). It is unclear to me whether we should use hardcoded constants or make that tunable in the API.

This is tricky, and may depend on the application...

iincer commented 1 year ago

From previous discussion, we have to

Something else that would be useful here is to

NicolasRouquette commented 1 year ago

Regarding numpy.isclose(...), the reason for this approximate operations stems from matching LHS with -(LHS) and RHS with -(RHS). In either case, we need to compare whether two numeric values are equal, which is a well-known source of problems. In Pacti's case, algebraic operations on PolyhedralTerm can introduce slight differences numbers due to floating point approximation and rounding phenomena. This, we cannot use strict equality, we instead need to use some kind of approximate equality criterion. There are two approximate equality criteria: relative error and 'units in the last place' (aka ULP). See section 1.2 here: https://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.102.244&rep=rep1&type=pdf

In Python, there are at least two different libraries providing this kind of support: https://docs.python.org/3/library/math.html#math.isclose https://numpy.org/doc/stable/reference/generated/numpy.isclose.html

Which one should we use?

NicolasRouquette commented 1 year ago

Regarding isClose, I put some constants in polyhedra:

RelativeTolerance: float = 1e-05
AbsoluteTolerance: float = 1e-08

def areNumbersApproximativelyEqual(v1: numeric, v2: numeric) -> bool:
   if isinstance(v1, int) & isinstance(v2, int):
      return v1 == v2
   else:
      f1=float(v1)
      f2=float(v2)
      return np.isclose(f1, f2, rtol=RelativeTolerance, atol=AbsoluteTolerance, equal_nan=True)

Since Polyhedra uses numpy instead of math, then perhaps it makes sense to use np.isclose.

NicolasRouquette commented 1 year ago

Overloaded PolyhedralTermList constructor to handle parsing a list of strings according to the 4 patterns.

This allows making as significant simplification of string_to_polyhedra_contract. Wondering whether to eliminate StrContract as Inigo suggests...

iincer commented 1 year ago

This issue is related to #144, which talks about improving the user experience of importing Pacti's functionality. Our discussion has centered around (a) pretty printing and (b) nice constructors. We should talk more about where to place this functionality. I believe it would make sense to leave polyhedra.py focused on low-level operations and have the pretty print and user-friendly constructors in external files. The exception is the function __str__, which we need to modify to support nicer error messages.

We should probably chat about the use of numpy.isclose(...) and alternatives...

NicolasRouquette commented 1 year ago

Following a discussion with Inigo, there are some API-breaking changes due to the polyhedral contract syntax.

The unit tests pass and show examples of the new API usage.

In the spirit of #144, it should suffice to have the following import:

from pacti.terms.polyhedra import *

The biggest breaking change stems from this kind of imports/code that will not work:

from pacti.terms.polyhedra.loaders import read_contract, write_contract

Previously, read_contract could read either a single dict or a list of dict. Now, one can use the PolyhedralContract API instead in one of 3 ways:

The above is just a sketch of a doc that we should write somewhere...

Finally, I added unit tests for the 4 variants of PolyhedralContract string syntax. See test_polyhedral_contract_syntax.py.

NicolasRouquette commented 1 year ago

Made a fix to the serialization of numbers -- it turns out that there were two kinds of numbers:

These cases are handled by a new serializer.number2string method:

def number2string(n: numeric) -> str:
    if isinstance(n, sympy.core.numbers.Float):
        f: sympy.core.numbers.Float = n
        return str(f.num)
    else:
        return str(n)
iincer commented 1 year ago

Per Google's naming conventions, the method readFromString should probably be called from_string. It makes sense that PolyhedralContract.from_string should yield a contract.

I think the ideal import statement would be

from pacti import PolyhedralContract

Then the user could build contracts by executing PolyhedralContract.from_string(args).

NicolasRouquette commented 1 year ago

I've applied some of the naming conventions:

NicolasRouquette commented 1 year ago

It is unclear to me what kind of refactoring we would need to do to achieve the ideal import objective:

from pacti import PolyhedralContract

Currently, we can do this:

from pacti.terms.polyhedra import *

It works because of the ./src/pacti/terms/polyhedra/__init__.py file:

image

Can you elaborate on what it would take to achieve this ideal import?

NicolasRouquette commented 1 year ago

Fixed the serialization bug that Inigo and I found on case_studies/polyhedra_contracts/polyhedral-contract-example3.py.

ayush9pandey commented 1 year ago

Just catching up with this thread. The changes to the string API look great. To address #144 and have import statements like from pacti import PolyhedralContract, we may need to create parent classes as user-friendly API under src\pacti\ and then modify the src\pacti\__init__.py to include this new API..

iincer commented 1 year ago

Hi @ayush9pandey, completely agree. We made some changes to the API, and now we say

from pacti.terms.polyhedra import PolyhedralContract

What used to be read_contract is now PolyhedralContract.from_dict(). Maybe we will do a further reorganization of the import statement to compress a bit further. We also have PolyhedralContract.from_string()

iincer commented 1 year ago

In the notation below, LHS, MHS, and RHS refer to a linear term (i.e., sum of coefficient multipliers for variables)

The table below maps an input pattern syntax into pairs of positive and negative constraints (i.e., the internal polyhedral representation).

For serialization, if prefixes are present, then matching positive/negative constraints are mapped to the pattern form; otherwise, the constraints are scanned to find corresponding positive/negative constraint pairs that can be mapped to the pattern form.

Optional Prefix [ID:] [ID(RuleLeft):] [ID(RuleRight):]
Rule Pattern Pos constraint Neg constraint
1a LHS <= number Not applicable Not applicable
1b LHS <= RHS LHS -(RHS) <= 0 Not applicable
2a \| LHS \| <= number LHS <= number -(LHS) <= number
2b \| LHS \| = RHS LHS -(RHS) <= 0 -(LHS) -(RHS) <= 0
3a LHS = number LHS <= number -(LHS) <= -(number)
3b LHS = RHS LHS -(RHS) <= 0 -(LHS) + RHS <= 0
4a LHS >= number -LHS <= -(number) Not applicable
4b LHS >= RHS -LHS -(RHS) <= 0 Not applicable
5a number1 <= MHS <= number2 number1 -(MHS) <= 0 MHS -(number2) <= 0
5b LHS >= MHS >= RHS -(LHS) + MHS <= 0 RHS -(MHS) <= 0
6a \|LHS1\| + ... + \|LHSn\| <= number a1LHS1 + ... + anLHSn <= number Not applicable
6b \|LHS1\| + ... + \|LHSn\| <= RHS a1LHS1 + ... + anLHSn -(RHS) <= 0 Not applicable

For 7a and 7b, the positive constraint expansion produces $2^n$ constraints where a1...an correspond to all possible combinations of -1 and 1.

Rules 1b, 2b, 3b, 4b, 5, and 6b could have optional numbers on either or both sides of the comparison.

It also makes sense to use similar rules for pretty print, i.e., output x >= 0 instead of -x <= 0. What do you think?

Yes; see rule 5a.

Finally, I don't see why we should not support a general inequality: 3x + 4y + 3 <= 5z + 2x - 2w + 2. We should be able to rewrite this expression (using sympy, perhaps) and convert into something Pacti accepts.

Yes, see rule 1b. Moreover, in the above, whenever there is a combination of 2 linear terms (e.g., LHS -(RHS)), this will require simplifying the linear terms. In this example: 3x - 2x + 4y - 5z + 2w will simply to: x + 4y - 5z + 2w.

iincer commented 1 year ago

161 is relevant to this discussion. The idea is to also support

|LHS1| + |LHS2| + ... + |LHSN| <= RHS

This would translate into $2^n$ constraints

a_1 * LHS1 + a_2 * LHS2 + ... + a_n * LHSN <= RHS

where a_i are all $2^n$ combinations of -1 or 1.

ayush9pandey commented 1 year ago

Hi @ayush9pandey, completely agree. We made some changes to the API, and now we say

from pacti.terms.polyhedra import PolyhedralContract

What used to be read_contract is now PolyhedralContract.from_dict(). Maybe we will do a further reorganization of the import statement to compress a bit further. We also have PolyhedralContract.from_string()

I am seeing an error when doing this import:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 from pacti.terms.polyhedra import PolyhedralContract

File ~\AppData\Local\Continuum\anaconda3\envs\py310\lib\site-packages\pacti\terms\polyhedra\__init__.py:1
----> 1 from .serializer import write_contract
      2 from .polyhedra import PolyhedralTerm, PolyhedralTermList
      3 from .polyhedral_contract import PolyhedralContract

File ~\AppData\Local\Continuum\anaconda3\envs\py310\lib\site-packages\pacti\terms\polyhedra\serializer.py:9
      6 import re
      8 from typing import Optional, Tuple, Union
----> 9 from typing_extensions import TypedDict
     10 import numpy as np
     11 import sympy

ModuleNotFoundError: No module named 'typing_extensions'

Installing typing_extensions fixed this. So we may want to add this package to our dependency?

iincer commented 1 year ago

Yes, could you please run pdm add typing_extensions? This will add the dependency to the project, and we should all have it when you merge to main. I'm surprised I didn't get the error. Maybe this package is standard in recent python versions? I am running 3.10 and 3.11.

ayush9pandey commented 1 year ago

Yeah, I am surprised as well. I am running 3.10. So to confirm I created a new Github Codespace and it had this package already! But when I created a new conda environment on my local with a Python 3.10.9, it did not have this package.

Definitely something that comes pre-installed on some environments but not on conda.

Fixed with #188

iincer commented 1 year ago

Thanks for the additional details. It sounds like we should add it as a dependency.

NicolasRouquette commented 1 year ago

This issue requires a parser capability well beyond what is reasonable to tackle with the current regular expression approach.

Defining the syntax of Pacti's Polyhedral terms as a formal grammar makes sense, which means defining a PEG parser and looking for available options here: https://wiki.python.org/moin/LanguageParsing

The top two PEG libraries are pyparsing and parsimonious.

They both have very high stars on GitHub: 1.8K and 1.6K, respectively; however, there are a few reasons to choose pyparsing over parsimonious:

For these reasons, I propose using pyparsing to specify the Polyhedral term grammar and implement a corresponding parser.

iincer commented 1 year ago

Thanks for doing this research, @NicolasRouquette! Implementing our own parser sounds very attractive, particularly as we support additional syntax in the future. This could allow us to show more consistency to the user. Moving forward with this would be great. Do you think the time investment would be manageable?

NicolasRouquette commented 1 year ago

Creating the grammar to parse a generalized version of the above table is easy with pyparsing, which allows defining grammar in a similar way to a PEG parser.

If you are interested in following progress, see this branch: https://github.com/pacti-org/pacti/tree/issue-129

The remaining work involves the following:

The time investment is about a day of work.

iincer commented 1 year ago

This will be a great addition. Thanks, @NicolasRouquette!

NicolasRouquette commented 1 year ago

FYI: I generated the railroad diagrams:

https://github.com/pacti-org/pacti/blob/issue-129/docs/expression.html

Open this file in a browser; it should look like this:

image

Generating this diagram is, unfortunately, awkward; I tried to import the grammar from a separate file that would create the diagram, but this resulted in circular Python import errors. Also, the generated HTML file is missing a style that shows shapes with a different color than the background; without the style, we see black rounded rectangles without the internal details.

Despite these difficulties, using this diagram as a reference for the Polyhedral term grammar might be helpful.

NicolasRouquette commented 1 year ago

This commit concludes the first step described above.

The remaining step involves mapping the grammar.Expression intermediate representation to construct corresponding pacti.terms.polyhedra.PolyhedralContract.

NicolasRouquette commented 1 year ago

This commit is WIP for the last step.

The functionality for mapping grammar.Expression to polyhedra.PolyhedralContract is complete; however, 37 tests passed and 41 tests failed. This needs additional work...

NicolasRouquette commented 1 year ago

This commit fixes all unit tests; however, I realize that we need to review the mapping. The table is incomplete because it does not address the general case supported by the grammar, which, in a compact form, is as follows:

term = only_variable | number_and_variable | only_number
signed_term = pp.Optional(symbol, default="+") + term

terms = signed_term + pp.ZeroOrMore(symbol + term)

abs_term = pp.Optional(floating_point_number) + "|" + terms + "|"

abs_or_term = abs_term | term

equality_expression = abs_or_terms + equality_operator + abs_or_terms
leq_expression = abs_or_terms + pp.OneOrMore("<=" + abs_or_terms)
geq_expression = abs_or_terms + pp.OneOrMore(">=" + abs_or_terms)
expression = equality_expression | leq_expression | geq_expression

The mapping is defined in https://github.com/pacti-org/pacti/blob/issue-129/src/pacti/terms/polyhedra/serializer.py

There are some examples here: https://github.com/pacti-org/pacti/blob/issue-129/examples/grammar/from_string.py

NicolasRouquette commented 1 year ago

Based on a recent discussion/review with @iincer, we concluded that:

1) For documentation purposes, we will make a markdown file with just the grammar rules instead of the railroad diagrams.

2) For the grammar rules:

NicolasRouquette commented 1 year ago

The absolute value restriction may be too restrictive; consider this inequality:

|x+y| - |x-y| <= k

Geometrically, the solutions are in either 1 or 2 regions. For example:

image

image

image

We can calculate the number of solution regions using without creating a plot:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.label.html#scipy.ndimage.label

Here is an example:

import numpy as np
from scipy import ndimage

# Create a meshgrid for x and y values
x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, y)

k = -2

# Calculate the inequality
Z = np.abs(X + Y) - np.abs(X - Y) - k

# Create a new Z array for the colorscale
# 0 = green
# 1 = red
Z_color = np.where(Z <= 0, 0, 1)

# Label contiguous regions in the Z_color array
labeled_array, num_features = ndimage.label(Z_color == 0)

# Print the number of contiguous green regions
print("Number of contiguous green regions:", num_features)

This prints:

Number of contiguous green regions: 2

We could incorporate this test to ensure that the solutions to each PolyhedralTerm are within a single contiguous geometrical region. What do you think @iincer ?

iincer commented 1 year ago

I like the use of ndimage.label to get the regions :-) I agree that requiring the coefficients of absolute values to be nonnegative is restrictive. However, it is a way in which we are guaranteed get convex polyhedra from formulas. The examples you show give us nonconvex regions. This is equivalent to having specifications with logical ORs in them, which is currently not fully supported.

NicolasRouquette commented 1 year ago

Ah, if the requirement is a single convex region, then we should verify that the resulting set of linear inequalities yields a single convex region after expanding all absolute terms. We can use scipy.spatial.ConvexHull to iteratively keep track of the convexity of the inequalities as we add them one at a time.

iincer commented 1 year ago

I am pretty sure that requiring the nonnegativity of the coefficients multiplying the absolute values is sufficient, since the only place where non-convexity pops up is when we have |exp| >= exp2. This is evaluated to (exp >= exp2) or (exp <= -exp2).

NicolasRouquette commented 1 year ago

So it is OK to have something like this:

$$ \Sigma_i k_i| a_i X + b_i Y + c_i Z | <= d $$

for arbitrary values of $$a_i,b_i,c_i$$

as long as $$d>=0$$ and all $$k_i>=0$$

iincer commented 1 year ago

The $a_i$, $b_i$, $c_i$, and $d$ can be arbitrary. We only need $k_i \ge 0$.