vyperlang / vyper

Pythonic Smart Contract Language for the EVM
https://vyperlang.org
Other
4.89k stars 800 forks source link

Meta Issue: Tracker for codebase cleanup #1259

Closed pipermerriam closed 5 years ago

pipermerriam commented 5 years ago

This is a tracking issue that I'm using to dump my thoughts on things that can be cleaned up in this codebase.

General

Specifics

change NodeType.__eq__ to require subclasses to implement and remove eq hook.

Takes multiple arguments like sigs, custom_units, custom_structs and constants. Current read of the code suggests there are data structures which have been defined in the code and thus, this gives the parser the ability to properly parse the AST into these types. If so, I believe this should be formalized into something akin to the WASM Store to wrap these data structures up into a single wrapper for easier and less error prone portability.

Takes a mapping of field_name -> field_type for members argument. This relies on dictionary ordering (which is admittedly now reliable) but also is a more difficuult data structure to reliably type hint. Consider using an iterable of 2-tuples: (('field_a', TypeA), ('field_b', TypeB)). Benefits are easier type safety through type hinting. Cons are additional validation step to ensure field uniqueness.

Relies on prior knowledge about types. This should probably be an API of the type classes.

Relies on prior knowledge about types to determine. This should probably just be an API of the type classes.

Uses non-canonical argz and kwargz with z suffixes. Convert to normal *args, **kwargs

Inner decorator function isn't using functools.wraps.

More usage of magic strings where an Enum is probably better.

Replace use of list with NamedTuple.

Uses runtime assert statement which is easily disabled. Change to explicit exception raising.

Loop uses enumerate to generate indices idx used to modify abi at that indices. Instead, use zip to pair functions with their abi definitions.

Builds dict using mutation. Vendor a version of to_dict decorator to remove mutation pattern.

Uses mutable data structure as default for argument

Use of magic strings for output_type should be constrained to enums.

Uses mutable data structure as default for argument

constant string mutation (which actual ends up as full object copies) adds overhead. Converting to a wrapped generator pattern that joins all of the produced strings at the end should be more performant and remove mutation.

Use of next_symbol global counter appears to be used to create a unique marker in the code such as _sym_0, _sym_1. Replace with itertools.count(). Replace with uuid.uuid4(). Replace with generator function that wraps an itertools.count() style counter. Embed within an object with a defined lifecycle that gets passed around during compilation.

Replace use of magic string with something more concrete like a custom class. Potentially the custom class could handle the mechanism for generating unique symbols.

instruction is a class and thus should be named using common convention Instruction.

Needs to use functools.wraps for inner function.

Convert runtime assert statements to raise proper exceptions.

Detection of whether something is or is not an opcode is done via isinstance(value, str) and value in opcodes. If we had an Opcodes Enum this could be as simple as checking if Opcodes(value) raised an exception or not.

Use of inlined opcode string values like "PUSH" would be better as Enum or constant values.

Individual blocks in the if/elif/.../else block could be partitioned off into smaller standalone functions.

The section that handles clamping values has an if/elif which is missing a final else clause allowing for undefined behavior if none of the conditions evaluates truthy

Use formal data structure for line_number_map instead of dictionary.

Inside of the if isinstance(item, list), the local value line_number_map is overwritten which appears to be a potentially destructive mistake as it will destroy any state that was previously built up locally.

Uses inline logic for converting a value to a signed integer. Should be extracted into utility that is tested.

Confusingly named. Very visually similar to isinstance. Maybe rename to a all_is_instances

Use of fail flag is antipattern. Restructure to allow inlined exception raising.

The various function pairs like start_blockscope/end_blockscope, set_in_assertion, set_in_for_loop/remove_in_for_loop should be converted to use context manager pattern so that exiting the scope cannot be accidentally forgotten.

Returns None for item name in final block which suggests there is either a case where something does not have a name or that there is return value checking occuring somewhere.

Does not have a final else block allowing function to exit if none of the conditions evaluate truthy.

Uses implicit else for final clause type. Should be converted to explicitely check that the statement is formed as expected and add a final else block that raises an invariant exception.

pipermerriam commented 5 years ago

Look into enabling doctest

jacqueswww commented 5 years ago

Hey @pipermerriam ,

I did a quick scan - looks good. Here are some quick additional notes:

  • convert to circle-ci
  • tox

These would be great travis is super slow (and I don't have access).

  • Create Opcodes Enum for easier referencing opcodes.

Would require all LLL statmements tho use the Opcodes Enum or have to have lookup method attached to it (all LLL function values are currently strings). Writing with strings make it easier to read / parse mentally - so I vote we keep strings in the LLL.

Takes multiple arguments like sigs, custom_units, custom_structs and constants. Current read of the code suggests there are data structures which have been defined in the code and thus, this gives the parser the ability to properly parse the AST into these types. If so, I believe this should be formalized into something akin to the WASM Store to wrap these data structures up into a single wrapper for easier and less error prone portability.

Alot of those have been refactored to be contained on GlobalContext, and mostly can be refactored to pass global_ctx through instead, so check for that. However there are some base layer functions that will always require one to not pass them through, for example in some scenarios (building function signatures for interfaces don't have custom_units) you don't have global context populated, and in other cases you do.

Additionally, a lot of code in parser.py and parser_utils.py can be split of into sub modules (this has been my goal since I have started and have been trying to do so when possible). parser.py should contain parsing steps of basically Module / Function & Context.

  • vyper.types.types.get_size_of_type

Relies on prior knowledge about types. This should probably be an API of the type classes.

  • vyper.types.types.has_dynamic_data Relies on prior knowledge about types to determine. This should probably just be an API of the type classes.

What is meant by API of type classes, could you elaborate? I think having something like BaseType.get_size(unit='bytes'|''bits') could work well.

Additional items you can add:

jacqueswww commented 5 years ago

@pipermerriam I am currently working on a PR then I will be helping with these issues, since they potentially touch a lot of code. I'd prefer we get it all done as quickly as possible, to avoid any big branch diverges :smile:

pipermerriam commented 5 years ago

@jacqueswww lets deal with that kind of thing on a case-by-case basis. Any of these that I deal will be done in an isolated way that touches the smallest possible section of code so if they do cause any conflicts I believe they should be reasonably easy to resolve and I'm happy to either work with you to rebase your branches or to rebase them for you in the event something messy occurs.

jacqueswww commented 5 years ago

@pipermerriam works for me :+1: let's see how it goes, I checked the open PRs now, and I think we'll just deal with them as they come along.

pipermerriam commented 5 years ago
jacqueswww commented 5 years ago
  • replace much of what is in vyper.utils with eth-utils utility functions.

I'd prefer not to pull eth-utils for 1 function, as eth-utils pulls in 4 other dependencies of only one traces back to a crypto lib for a keccak implementaiton.

pipermerriam commented 5 years ago

@jacqueswww Minor nitpicky correction that it's 3 dependencies since cytoolz/toolz are functionally equivalent the way they are specified.

The eth-utils dependencies are:

Can you elaborate on what issues or problems you see with any of these or maybe expand on your reasoning for not wanting these to end up as dependencies.

jacqueswww commented 5 years ago

@pipermerriam The first reason that would be that I tend to keep to stdlib as close as possible, more of a philosophical thing. It isn't that above libraries are not stable - but more about importing unnecessary dependencies (think left-pad saga).

The more pragmatic reasons are:

fubuloubu commented 5 years ago

To add another point, we have a requirement to allow users to use the compiler with any version we have uploaded, either through a redeployment (docker, snapcraft, browser blob) or through version pining the library itself. This is so someone could test with any version of the compiler without running into dependency issues so they can validate code written with different versions.

Something to think about.

pipermerriam commented 5 years ago

Here's my best stab at convincing you that the policy on dependencies should be more permissible, though I do fully agree that new dependencies should be evaluated thoroughly. This isn't an argument to say that eth-utils should be allowed (we can debate that one separately), but an argument that we should not have a policy of no dependencies.

The best solution to conflict dependency hell is not having dependencies.

This feels a little like sticking our heads in the sand. We can have dependencies without having dependency hell. I don't think this quite qualifies as a false dichotomy but I disagree that not having dependencies is the "best solution". Even Django has dependencies these days. So does requests.

Choosing to not have dependencies also means choosing to duplicate functionality that exists in other libraries where it is well tested, documented, and most importantly maintained as public APIs. The end result will be vyper accruing more and more vendored utility code, all of which will receive less maintenance, attention, tests, and bugfixes than it's equivalent from the libraries exposing it as public APIs. I'll take the occasional dependency issue over this maintenance nightmare any day. It's worth noting that eth-utils exists for exactly this reason. Most everything in that library is there because it existed in multiple codebases and ended up suffering from things forgetting to apply bug fixes from one repository to all of the others.

I am absolutely on board with being very conservative with the dependencies we add, but I think a blanket policy is a bad idea.

In the past research team actually had direct conflicts with py-ethereum

This seems to be roughly the same as the previous point, but to address it directly, we aren't in the pyethereum days and haven't been for a long time. The current suite of python+ethereum tools is maintained and supported by a team of 8 full time developers and an even larger group of active third party contributors. I get that there is still scar tissue from the old days, but times have changed.


As for the issue about needing to vyper to be packaged in different ways, I'd like to objectively determine if adding these packages actually reduces our options there. Who did the js/wasm blob creation? Lets find out how we can have our cake and eat it too, since I suspect that we can have both (js/wasm portability and dependencies).

pipermerriam commented 5 years ago

The requirement @fubuloubu mentions is important, but I also don't believe it's an argument to not have any dependencies, but rather one more thing to think about with respect to what we add and how we specify them. I am confident we can approach this in a way that allows us some freedom of movement when it comes to adding a dependency that has been thoroughly reviewed and evaluated to fit whatever our necessary criteria are for such things.

jacqueswww commented 5 years ago

@pipermerriam I don't think that argument is that we shouldn't have dependencies. But that we shouldn't have dependencies that a.) will cause users of Vyper pip hell b.) pull in needless dependencies that we don't need, and that's why I am not keen on eth-utils. eth-typing will cause trouble in the future because every single py-evm related package will depened on it. I would more than willing to go with eth-utils if it had no dependencies itself.

TBH I don't understand why we have to be DRY about strict typing structs (eth-typing). What is wrong with having a Union or type alias in the codebase that uses them. mypy is basically a fancy linter and should be part of the test requirements if really necessary. It just feels to me that to have a linter dictate the dependency graph of shipping code, is going to hurt. Basically I am of the opion that a linting step should not dictate dependency graphs. Replace mypy with flake8, and you'd be super confused why you can't install vyper because of flake8 (not talking tests, just the compiler), which will happen especially if you have a lot of developers working on the separate repos.

I do the wasm builds, and the currently build requires us to basically export the setup requirements and make sure they can build themselves, mostly it's not a problem but anything that does C compilation can be tricky, but all manageable - and if it becomes popular some better scripts can happen here. https://github.com/jacqueswww/vyper-in-browser/tree/master/pyodide

Edit: Ha, I see this uses pysha3 because pycryptodome didn't compile. This perfect example of why having a very strict dependency policy saves you.

Edit2: So from my side I am on board with pulling in dependencies, but they have to be very strictly evaluated for the value they bring, for this specific case I really don't feel the upside merits the extra dependency weight. Me raising all of the above is how one has to be, to be conservative about dependencies - this comes from my personal experience (not only python specific, do you remember yum hell?). The cost of adding a dependency into your requirements / setup.py is unfairly low - to the potential downside it can cost in the long run.

jacqueswww commented 5 years ago

Off dependency topic: I really like the suggested use case for ContextManager, and I will do PR for them soon :smile_cat:

fubuloubu commented 5 years ago

A potential argument for eth-typing: we should really type annotate our codebase. Will we require eth types to do so?

jacqueswww commented 5 years ago

@fubuloubu I would argue not, everything in eth-typing can be defined on their own terms, eth-typing is a bunch of aliases. The dependency problem with it all is that you do an import statement to import the types - thereby forever binding your code to eth-typing. This also means you have to know what is defined in the installed package to know what the alias means (for folks who don't have fancy IDEs). I am much much more in favour of each repository like vyper use it's own defined type names, and even better to strive for standard annotations as much as possible.

fubuloubu commented 5 years ago

Everything in pycryptodome could be defined on its own too, it's not a complicated library and really we're only using 2-ish functions from it. Should we eschew all dependencies and start maintaining all functions internally?

jacqueswww commented 5 years ago

@fubuloubu the difference is we actually use a function there. before:

vyper ->keccack lib

after:

vyper -> eth_utils -> eth_hash -> keccack lib
                            -> ctoolz/toolz etc. (still not sure why there are 2 equivalent libraries here?)
                            -> eth typing
pipermerriam commented 5 years ago

eth-typingis by no means required for type hinting. It's just a collection of common aliases. I'm fine leaving it out.

eth-utils is by no means required and I'm more than fine tabling it.

As of now, I don't have any dependencies that I'm going to make a case for adding and I'm glad that we appear to be on roughly the same page and I'm fine being extra conservative about adding dependencies.

pipermerriam commented 5 years ago

@jacqueswww the toolz/cytoolz auto negotiates install based on the base python architecture since cytoolz doesn't play nice with pypy but is significantly faster than toolz when we're in an environment that supports c-extensions.

jacqueswww commented 5 years ago

@pipermerriam ah I see! it's so it's so that py-evm can run on pypy make sense (and speed boost) :) Would you perhaps review my PR, https://github.com/ethereum/vyper/pull/1269, I am a bit new to context managers and I'd appreciate your feedback (when you have a chance).

jacqueswww commented 5 years ago

@pipermerriam also is it ok if I put tickboxes on the above ticket so we can keep track what has been done?

pipermerriam commented 5 years ago
fubuloubu commented 5 years ago

@pipermerriam that may be a tall order, we have a bunch of programs, both for functional testing and for compilation checks. I am open to this though, would just require some special setup folder or something.

Our larger examples are standalone files.

pipermerriam commented 5 years ago
pipermerriam commented 5 years ago
jacqueswww commented 5 years ago
  • [ ] convert print(..) statements in tests to use logging so that output can be throttled and controlled using standard pytest CLI apis.

I think we should just remove those? I have never used them, and the test file name / test structure should inform the user of what's being testing in a specific file, or alternatively use a comment - don't see why using stdout doesn't provide the same but w/o print statements. I think it was maybe when all tests was in one big file?

So I'd just move those to top level comments, as they are still useful in terms of knowing more about what is being tested.

charles-cooper commented 5 years ago

Continuing off https://github.com/ethereum/vyper/pull/1351#discussion_r266033074 I think we should a) have a CompilerPanic exception to replace asserts which suggests to the user to file a bug report, and b) catch all exceptions that aren't Vyper exceptions (e.g. ast.NameError, ValueError, etc.) at the top level and turn them into CompilerPanic. This is to help the user experience.

fubuloubu commented 5 years ago

Idea for doctest:

  1. Create a few helper templates that encapsulate example Vyper code in a vyper contract and then compiles it.
  2. Add some testing scaffolding that we can show the functionality of the code with in-doc examples.
jacqueswww commented 5 years ago

@charles-cooper I will include this in the square root PR, since the PR is turning into a meta PR anyways.

fubuloubu commented 5 years ago

@pipermerriam you have this stale branch, which I think relates to this issue. Can we delete that branch (issue is still valid)?

Also would like to walk through this issue and see if anything is still relevant to be cleaned up.

pipermerriam commented 5 years ago

Branch can be deleted and this can be closed. If I find myself back here working on stuff I'll re-open or make a new issue.