verygoodsecurity / starlarky

VGS edition of Google's safe and hermetically sealed Starlark language - a non-Turing complete subset of Python 3.
https://vgs.dev
Apache License 2.0

stdlib migration strategy #62

Open mjallday opened 3 years ago

mjallday commented 3 years ago

We need to implement a set of common functionality so that Larky has parity with what our customers run in production today.

What to migrate

Most of these immediate requirements are already codified in the FCO framework here - https://github.com/verygoodsecurity/fco-catalogue/tree/master/github.com/verygoodsecurity/common/utils

Initially, we do not need to implement the entire library, only the functionality used by customers. However, we should still declare the entire interface and leave the unused methods unimplemented, so we have a clear view of what needs to be implemented later. This can be done by adding empty methods that raise an exception when called.
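As a rough illustration of "empty methods that raise an exception", here is a minimal sketch in plain Python. It is not the Larky implementation: delegating `b64encode` to CPython and the exact error message are assumptions made for the example, and Larky code would use its own failure mechanism instead of `NotImplementedError`.

```python
import base64 as _py_base64  # used only so the example runs; an assumption


def b64encode(s, altchars=None):
    """Implemented: customers use this today."""
    return _py_base64.b64encode(s, altchars)


def b32encode(s):
    """Declared so the interface is complete, but not implemented yet."""
    raise NotImplementedError("base64.b32encode is not implemented yet")
```

Calling `b32encode(b"x")` fails loudly with the method name in the message, which makes it easy to see what is still missing.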

How to migrate

We can use a Java implementation underneath, but the goal is to provide Python behavior. We achieve this with a BDD/TDD approach in which the tests are lifted from the official Python library.

Each lib must be done as a separate pull-request. It must have tests implemented before any logic is migrated.

We need to implement the larky equivalent of these methods. Our approach is to implement the same method signature that the python stdlib provides. For example, for base64 we would implement base64.b64encode(s, altchars=None) and it should behave the same way that the python3 base64 library does.
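For reference, this is the CPython behavior the Larky version would have to match. It is a small sketch that uses only the standard `base64` module; the inputs are arbitrary examples chosen for illustration.

```python
import base64

# Default alphabet.
plain = base64.b64encode(b"hello")

# altchars replaces the "+" and "/" characters of the standard alphabet.
alt = base64.b64encode(b"\xfb\xff", altchars=b"-_")

print(plain)  # b'aGVsbG8='
print(alt)    # b'-_8='
```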

In order to achieve this outcome we will need to assert that our functionality is identical. We can do this by lifting the test suite from the cpython implementation of the library e.g. for base64 we would implement the fixtures and assertions from https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/test/test_base64.py

Migrate the interface first

For this purpose we have created https://github.com/mahmoudimus/py2star, which includes a tool that takes a python file and converts it into an empty starlarky file that can serve as the base for the migration.

Given the cpython implementation of base64 as an input it should output something like https://github.com/verygoodsecurity/starlarky/blob/22bf9fdcf70cc3b6a7d1ce91e405280f7a67c494/larky/src/main/resources/stdlib/base64.star
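To make the idea concrete, here is a hedged sketch of what "convert a python file into an empty stub file" could look like. This is not how py2star works internally; `stub_source` is a hypothetical helper written for this example, and it only handles top-level functions. The emitted body calls `fail(...)`, which is Starlark's built-in for aborting with an error. It needs Python 3.9+ for `ast.unparse`.

```python
import ast


def stub_source(source: str) -> str:
    """Return stub text with the same top-level function signatures."""
    tree = ast.parse(source)
    lines = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Re-render the argument list exactly as declared.
            args = ast.unparse(node.args)
            lines.append(f"def {node.name}({args}):")
            lines.append(f'    fail("{node.name} is not implemented")')
            lines.append("")
    return "\n".join(lines)


example = "def b64encode(s, altchars=None):\n    return s\n"
print(stub_source(example))
```

The point of going through the AST is that the signature is copied rather than retyped, so the interface cannot drift from the python original.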

This file should provide the interface and allow us to implement a larky native version of the required functionality. We should not change the interface.

We should have a pull-request created once this step is completed.

Migrate tests second

The tests will need to be rewritten from the cpython version but the asserts should not change. E.g. self.assertTypedEqual should become asserts.assert_that

We should find the test file in cpython and migrate the asserts across. This may require a small tool similar to tokenize_signature.py.
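A small tool for migrating the asserts might look something like the sketch below. Treat it as an assumption-heavy illustration: `rewrite_assert` is a hypothetical helper, it handles only two-argument equality asserts, and the chained `.is_equal_to(...)` target form is assumed here (the text above only names `asserts.assert_that`). It needs Python 3.9+ for `ast.unparse`.

```python
import ast

# unittest methods this sketch knows how to translate.
_EQUALITY_ASSERTS = ("assertEqual", "assertTypedEqual")


def rewrite_assert(line: str) -> str:
    """Rewrite one unittest equality assert; return other lines unchanged."""
    try:
        call = ast.parse(line.strip(), mode="eval").body
    except SyntaxError:
        return line
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Attribute)
        and call.func.attr in _EQUALITY_ASSERTS
        and len(call.args) == 2
    ):
        return line
    actual, expected = (ast.unparse(arg) for arg in call.args)
    # Assumed target form; adjust to whatever the asserts module provides.
    return f"asserts.assert_that({actual}).is_equal_to({expected})"


print(rewrite_assert('self.assertEqual(base64.b64encode(b"abc"), b"YWJj")'))
```

Only the assert wrapper changes; the expression under test and the expected value are carried over untouched, which is the property we care about.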

We should have additional commits on our pull-request once this is completed with all the tests failing.

Implement the logic third

Finally, once we have our interface implemented and our tests in place we should have a pull-request with failing tests running. We can then begin migrating larky code across. For any logic that we choose not to implement we should ignore the test. We should not ignore a test if it invokes code a customer uses, only methods that we do not yet use (e.g. a85decode may not be used by a customer so it can be ignored).
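In CPython's `unittest` terms, "ignore the test" corresponds to skipping it. The sketch below shows the intended end state: a customer-facing test that runs and an unused method whose test is skipped. How Larky's test runner marks ignored tests is not covered here, so this is only the Python analogue.

```python
import base64
import unittest


class Base64Test(unittest.TestCase):
    def test_b64encode(self):
        # Used by customers: must run and pass.
        self.assertEqual(base64.b64encode(b"abc"), b"YWJj")

    @unittest.skip("a85decode is not used by customers yet")
    def test_a85decode(self):
        self.fail("logic not migrated")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(Base64Test)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful(), len(result.skipped))
```

A run with skipped tests still counts as successful, which matches the "all tests passing or ignored" goal.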

At the end of the migration the pull-request should have all tests passing or ignored and the pull-request should be green.

Ignore below, it's just notes
# explicit -print so the pruned test directories themselves are not emitted
gfind $(pwd) -type d -iwholename "*/*test" -prune -o -type f -name "*.py" -print | \
  grep -v SelfTest | \
  gxargs -I {} env VAR="{}" /bin/sh -c 'out=$(echo "$VAR" | sed s/Crypto/LCrypto/g | sed "s/\.py$/.star/"); mkdir -p "$(dirname "$out")"; echo "Running: $out"; python ~/src/sbxscripting/bazelbuild/py2star/src/py2star/cli.py "$VAR" > "$out"'

mjallday commented 3 years ago

libraries to migrate

rough order of priorities

  1. json
  2. form data parsing
  3. xml
  4. regex
  5. base64
  6. hex
  7. datetime
  8. rest of parsing
  9. rest of encoding
  10. rest of encryption

encoding

crypto (signing, encryption, hashing)

date/time

regex

parsing

mjallday commented 3 years ago

Many of these do not need us to start from scratch. We can create individual pull-requests from https://github.com/verygoodsecurity/starlarky/pull/63 as a starting point. We can pull each item out, create the pr and then merge one at a time.

mahmoudimus commented 3 years ago

@mjallday I think I did the regex one already.

Agitolyev commented 3 years ago

We also need to add a bytes type to the list. There is an official specification for it, but I'm not sure when it will be implemented.

mahmoudimus commented 3 years ago

@Agitolyev added in #71 -- please see https://github.com/verygoodsecurity/starlarky/blob/stdlib-ga/larky/src/test/resources/stdlib_tests/test_bytes.star for how to use it.