Symbolic string method support

pschanely commented 3 years ago

Today, many string methods cause symbolic strings to be "materialized" (the symbolic string becomes a concrete string and then gets handed to the native string implementation).

For some of these operations, this is likely the best we can do, but others may be implementable with the SMT solver directly. This is a list of those methods, and we'll check them off when we've decided that we have the implementation we want. For each method, we'll decide to:

Leave as-is, with materialization in place.
Implement the method symbolically, using Z3.
Implement the method in terms of other string methods that have symbolic implementations.

Note further that direct unicode support in Z3 isn't yet implemented. We might want to defer spending time on this until that's resolved or until we implement a workaround (as a sequence of bitvectors ... or a massive enum?)

[x] def capitalize(self)
[x] def casefold(self)
[x] def center(self, width, *args)
[x] def count(self, sub, start=0, end=sys.maxsize)
[x] def encode(self, encoding=_MISSING, errors=_MISSING)
[x] def endswith(self, suffix, start=0, end=sys.maxsize)
[x] def expandtabs(self, tabsize=8)
[x] def find(self, sub, start=0, end=sys.maxsize)
[x] def format(self, *args, **kwds)
[x] def format_map(self, mapping)
[x] def index(self, sub, start=0, end=sys.maxsize)
[x] def isalpha(self)
[x] def isalnum(self)
[x] def isascii(self)
[x] def isdecimal(self)
[x] def isdigit(self)
[x] def isidentifier(self)
[x] def islower(self)
[x] def isnumeric(self)
[x] def isprintable(self)
[x] def isspace(self)
[x] def istitle(self)
[x] def isupper(self)
[x] def join(self, seq)
[x] def ljust(self, width, *args)
[x] def lower(self): return self.data.lower()
[x] def lstrip(self, chars=None)
[x] def partition(self, sep)
[x] def replace(self, old, new, maxsplit=-1)
[x] def rfind(self, sub, start=0, end=sys.maxsize)
[x] def rindex(self, sub, start=0, end=sys.maxsize)
[x] def rjust(self, width, *args)
[x] def rpartition(self, sep)
[x] def rsplit(self, sep=None, maxsplit=-1)
[x] def rstrip(self, chars=None)
[x] def split(self, sep=None, maxsplit=-1)
[x] def splitlines(self, keepends=False)
[x] def startswith(self, prefix, start=0, end=sys.maxsize)
[x] def strip(self, chars=None)
[x] def swapcase(self)
[x] def title(self)
[x] def translate(self, *args)
[x] def upper(self)
[x] def zfill(self, width)
[x] def removeprefix(self, prefix: str)
[x] def removesuffix(self, suffix: str)

davidmrdavid commented 3 years ago

Hey @pschanely, I've recently been meaning to get involved with automated reasoning projects for Python and so this project stood out to me. I also can't help but to notice this ticket had a good first issue tag, so I'd be interested in contributing if I could get a bit of hand-holding while ramping up. Otherwise, I don't mean to bother, just let me know :)

Looking at this issue, I'm seeing that a lot (if not all) of these methods live in this class:

https://github.com/pschanely/CrossHair/blob/dcc25264c54538407f146d32c4e829936826bfd5/crosshair/abcstring.py#L14-L32

but their usage, within the library, is unclear to me. I don't believe users of CrossHair directly interact with these interfaces, right? Could I get some some context around the usage of these methods and how they currently interact with users and with the SMT solver? Some links to code files would be helpful! Thanks!

pschanely commented 3 years ago

Oh fantastic, and thank you!

AbcString just helps us detect when we hit operations that we don't support; the actual class you want is crosshair.libimpl.builtinslib.SmtStr (which extends AbcString!).

I've started a debugging guide with str.zfill as an example (which might be a good one for you to try fixing!). Maybe see if you can reproduce that to start.

Then, maybe take a look at the StringsTest tests in builtinslib_test.py. Maybe we start with a test like:

    def test_zfill_fail(self) -> None:
        def f(s: str) -> str:
            ''' post: __return__ != "00012"'''
            return s.zfill(5)
        self.assertEqual(*check_fail(f))

check_fail means we expect a counterexample. (but we don't get one until you fix SmtStr somehow!) There are a ton of tests here, so you'll want to run your test individually; something like:

python crosshair/libimpl/builtinslib_test.py StringsTest.test_zfill_fail -v

I suspect you can fix this by re-implementing zfill() on SmtStr in terms of other string operations that already work, like string concatenation ("a" + "b") and string repetition ("a" * 3). Other string methods might require you to dig into the SMT/z3 stuff more directly. Feel free to reach out when you're ready to implement.

When you're ready to run all the tests (this will take a long time), run python -m unittest discover -p '*_test.py'.

davidmrdavid commented 3 years ago

Thanks for the detailed response! I'll be taking a stab at this in the coming days, but will most likely not find a large block of time until Sunday. I'll try to keep this thread updated ⚡

pschanely commented 3 years ago

Reopening. Mistyped a bug #!

pschanely commented 3 years ago

CrossHair now fully supports symbolic unicode strings. I'll be investing more in these methods presently (plus more complete regex support in #118).

pschanely commented 2 years ago

To be clear, the LEVEL of symbolic support may naturally vary. But everything has at least some symbolic support now, so I'm going to close this issue. Strings are done; huzzah!

pschanely / CrossHair

Symbolic string method support #39