Closed pipermerriam closed 5 years ago
Look into enabling doctest
Hey @pipermerriam ,
I did a quick scan - looks good. Here are some quick additional notes:
- convert to circle-ci
- tox
These would be great travis is super slow (and I don't have access).
- Create
Opcodes
Enum for easier referencing opcodes.
Would require all LLL statmements tho use the Opcodes Enum or have to have lookup method attached to it (all LLL function values are currently strings). Writing with strings make it easier to read / parse mentally - so I vote we keep strings in the LLL.
Takes multiple arguments like
sigs
,custom_units
,custom_structs
andconstants
. Current read of the code suggests there are data structures which have been defined in the code and thus, this gives the parser the ability to properly parse the AST into these types. If so, I believe this should be formalized into something akin to the WASMStore
to wrap these data structures up into a single wrapper for easier and less error prone portability.
Alot of those have been refactored to be contained on GlobalContext, and mostly can be refactored to pass global_ctx through instead, so check for that. However there are some base layer functions that will always require one to not pass them through, for example in some scenarios (building function signatures for interfaces don't have custom_units) you don't have global context populated, and in other cases you do.
Additionally, a lot of code in parser.py
and parser_utils.py
can be split of into sub modules (this has been my goal since I have started and have been trying to do so when possible). parser.py
should contain parsing steps of basically Module / Function & Context.
vyper.types.types.get_size_of_type
Relies on prior knowledge about types. This should probably be an API of the type classes.
vyper.types.types.has_dynamic_data
Relies on prior knowledge about types to determine. This should probably just be an API of the type classes.
What is meant by API of type classes, could you elaborate? I think having something like BaseType.get_size(unit='bytes'|''bits')
could work well.
Additional items you can add:
@pipermerriam I am currently working on a PR then I will be helping with these issues, since they potentially touch a lot of code. I'd prefer we get it all done as quickly as possible, to avoid any big branch diverges :smile:
@jacqueswww lets deal with that kind of thing on a case-by-case basis. Any of these that I deal will be done in an isolated way that touches the smallest possible section of code so if they do cause any conflicts I believe they should be reasonably easy to resolve and I'm happy to either work with you to rebase your branches or to rebase them for you in the event something messy occurs.
@pipermerriam works for me :+1: let's see how it goes, I checked the open PRs now, and I think we'll just deal with them as they come along.
vyper.utils
with eth-utils
utility functions.sha3
to keccak
across codebase.
- replace much of what is in
vyper.utils
witheth-utils
utility functions.
I'd prefer not to pull eth-utils for 1 function, as eth-utils pulls in 4 other dependencies of only one traces back to a crypto lib for a keccak implementaiton.
@jacqueswww Minor nitpicky correction that it's 3 dependencies since cytoolz/toolz
are functionally equivalent the way they are specified.
The eth-utils
dependencies are:
eth-hash
pycryptodome
locally and eth-utils
contains a keccak
function which is a rough equivalent to the existing vyper.utils.sha3
helper.eth-typing
toolz/cytoolz
Can you elaborate on what issues or problems you see with any of these or maybe expand on your reasoning for not wanting these to end up as dependencies.
@pipermerriam The first reason that would be that I tend to keep to stdlib as close as possible, more of a philosophical thing. It isn't that above libraries are not stable - but more about importing unnecessary dependencies (think left-pad saga).
The more pragmatic reasons are:
py-ethereum
#735. py-evm is off course much better split up - but I easily see conflicts still happening (and has happened to myself a couple of times e.g. vypers' eth-typing conflicts with the one I have setup for my current test suite).To add another point, we have a requirement to allow users to use the compiler with any version we have uploaded, either through a redeployment (docker, snapcraft, browser blob) or through version pining the library itself. This is so someone could test with any version of the compiler without running into dependency issues so they can validate code written with different versions.
Something to think about.
Here's my best stab at convincing you that the policy on dependencies should be more permissible, though I do fully agree that new dependencies should be evaluated thoroughly. This isn't an argument to say that eth-utils
should be allowed (we can debate that one separately), but an argument that we should not have a policy of no dependencies.
The best solution to conflict dependency hell is not having dependencies.
This feels a little like sticking our heads in the sand. We can have dependencies without having dependency hell. I don't think this quite qualifies as a false dichotomy but I disagree that not having dependencies is the "best solution". Even Django has dependencies these days. So does requests
.
Choosing to not have dependencies also means choosing to duplicate functionality that exists in other libraries where it is well tested, documented, and most importantly maintained as public APIs. The end result will be vyper accruing more and more vendored utility code, all of which will receive less maintenance, attention, tests, and bugfixes than it's equivalent from the libraries exposing it as public APIs. I'll take the occasional dependency issue over this maintenance nightmare any day. It's worth noting that eth-utils
exists for exactly this reason. Most everything in that library is there because it existed in multiple codebases and ended up suffering from things forgetting to apply bug fixes from one repository to all of the others.
I am absolutely on board with being very conservative with the dependencies we add, but I think a blanket policy is a bad idea.
In the past research team actually had direct conflicts with
py-ethereum
This seems to be roughly the same as the previous point, but to address it directly, we aren't in the pyethereum
days and haven't been for a long time. The current suite of python+ethereum tools is maintained and supported by a team of 8 full time developers and an even larger group of active third party contributors. I get that there is still scar tissue from the old days, but times have changed.
As for the issue about needing to vyper to be packaged in different ways, I'd like to objectively determine if adding these packages actually reduces our options there. Who did the js/wasm blob creation? Lets find out how we can have our cake and eat it too, since I suspect that we can have both (js/wasm portability and dependencies).
The requirement @fubuloubu mentions is important, but I also don't believe it's an argument to not have any dependencies, but rather one more thing to think about with respect to what we add and how we specify them. I am confident we can approach this in a way that allows us some freedom of movement when it comes to adding a dependency that has been thoroughly reviewed and evaluated to fit whatever our necessary criteria are for such things.
@pipermerriam I don't think that argument is that we shouldn't have dependencies. But that we shouldn't have dependencies that a.) will cause users of Vyper pip hell b.) pull in needless dependencies that we don't need, and that's why I am not keen on eth-utils
. eth-typing
will cause trouble in the future because every single py-evm related package will depened on it. I would more than willing to go with eth-utils
if it had no dependencies itself.
TBH I don't understand why we have to be DRY about strict typing structs (eth-typing
). What is wrong with having a Union or type alias in the codebase that uses them. mypy is basically a fancy linter and should be part of the test
requirements if really necessary. It just feels to me that to have a linter dictate the dependency graph of shipping code, is going to hurt. Basically I am of the opion that a linting step should not dictate dependency graphs. Replace mypy with flake8, and you'd be super confused why you can't install vyper because of flake8 (not talking tests, just the compiler), which will happen especially if you have a lot of developers working on the separate repos.
I do the wasm builds, and the currently build requires us to basically export the setup requirements and make sure they can build themselves, mostly it's not a problem but anything that does C compilation can be tricky, but all manageable - and if it becomes popular some better scripts can happen here. https://github.com/jacqueswww/vyper-in-browser/tree/master/pyodide
Edit: Ha, I see this uses pysha3
because pycryptodome
didn't compile. This perfect example of why having a very strict dependency policy saves you.
Edit2: So from my side I am on board with pulling in dependencies, but they have to be very strictly evaluated for the value they bring, for this specific case I really don't feel the upside merits the extra dependency weight. Me raising all of the above is how one has to be, to be conservative about dependencies - this comes from my personal experience (not only python specific, do you remember yum hell?). The cost of adding a dependency into your requirements / setup.py is unfairly low - to the potential downside it can cost in the long run.
Off dependency topic: I really like the suggested use case for ContextManager, and I will do PR for them soon :smile_cat:
A potential argument for eth-typing
: we should really type annotate our codebase. Will we require eth types to do so?
@fubuloubu I would argue not, everything in eth-typing
can be defined on their own terms, eth-typing
is a bunch of aliases. The dependency problem with it all is that you do an import statement to import the types - thereby forever binding your code to eth-typing
. This also means you have to know what is defined in the installed package to know what the alias means (for folks who don't have fancy IDEs). I am much much more in favour of each repository like vyper use it's own defined type names, and even better to strive for standard annotations as much as possible.
Everything in pycryptodome
could be defined on its own too, it's not a complicated library and really we're only using 2-ish functions from it. Should we eschew all dependencies and start maintaining all functions internally?
@fubuloubu the difference is we actually use a function there. before:
vyper ->keccack lib
after:
vyper -> eth_utils -> eth_hash -> keccack lib
-> ctoolz/toolz etc. (still not sure why there are 2 equivalent libraries here?)
-> eth typing
eth-typing
is by no means required for type hinting. It's just a collection of common aliases. I'm fine leaving it out.
eth-utils
is by no means required and I'm more than fine tabling it.
As of now, I don't have any dependencies that I'm going to make a case for adding and I'm glad that we appear to be on roughly the same page and I'm fine being extra conservative about adding dependencies.
@jacqueswww the toolz/cytoolz
auto negotiates install based on the base python architecture since cytoolz
doesn't play nice with pypy
but is significantly faster than toolz
when we're in an environment that supports c-extensions.
@pipermerriam ah I see! it's so it's so that py-evm can run on pypy
make sense (and speed boost) :)
Would you perhaps review my PR, https://github.com/ethereum/vyper/pull/1269, I am a bit new to context managers and I'd appreciate your feedback (when you have a chance).
@pipermerriam also is it ok if I put tickboxes on the above ticket so we can keep track what has been done?
tests/parser/syntax/test_no_none.py
@pipermerriam that may be a tall order, we have a bunch of programs, both for functional testing and for compilation checks. I am open to this though, would just require some special setup folder or something.
Our larger examples are standalone files.
print(..)
statements in tests to use logging
so that output can be throttled and controlled using standard pytest CLI apis.assert_compile_failed
with standard with pytest.raises(...)
style failure checks.
- [ ] convert
print(..)
statements in tests to uselogging
so that output can be throttled and controlled using standard pytest CLI apis.
I think we should just remove those? I have never used them, and the test file name / test structure should inform the user of what's being testing in a specific file, or alternatively use a comment - don't see why using stdout doesn't provide the same but w/o print statements. I think it was maybe when all tests was in one big file?
So I'd just move those to top level comments, as they are still useful in terms of knowing more about what is being tested.
Continuing off https://github.com/ethereum/vyper/pull/1351#discussion_r266033074 I think we should a) have a CompilerPanic
exception to replace asserts which suggests to the user to file a bug report, and b) catch all exceptions that aren't Vyper exceptions (e.g. ast.NameError, ValueError, etc.) at the top level and turn them into CompilerPanic. This is to help the user experience.
Idea for doctest:
@charles-cooper I will include this in the square root PR, since the PR is turning into a meta PR anyways.
@pipermerriam you have this stale branch, which I think relates to this issue. Can we delete that branch (issue is still valid)?
Also would like to walk through this issue and see if anything is still relevant to be cleaned up.
Branch can be deleted and this can be closed. If I find myself back here working on stuff I'll re-open or make a new issue.
This is a tracking issue that I'm using to dump my thoughts on things that can be cleaned up in this codebase.
General
enum
instead of strings fortyp
values.vyper.types.types
functools.wraps
for decoratorsException
Opcodes
Enum for easier referencing opcodes.decimal.Decimal
?location == "memory"
should be an Enum.sha3
tokeccak
across codebase.Specifics
vyper.type.types.NodeType.__eq__
change
NodeType.__eq__
to require subclasses to implement and removeeq
hook.vyper.types.types.parse_type
Takes multiple arguments like
sigs
,custom_units
,custom_structs
andconstants
. Current read of the code suggests there are data structures which have been defined in the code and thus, this gives the parser the ability to properly parse the AST into these types. If so, I believe this should be formalized into something akin to the WASMStore
to wrap these data structures up into a single wrapper for easier and less error prone portability.vyper.types.types.StructType
Takes a mapping of
field_name -> field_type
formembers
argument. This relies on dictionary ordering (which is admittedly now reliable) but also is a more difficuult data structure to reliably type hint. Consider using an iterable of 2-tuples:(('field_a', TypeA), ('field_b', TypeB))
. Benefits are easier type safety through type hinting. Cons are additional validation step to ensure field uniqueness.vyper.types.types.get_size_of_type
Relies on prior knowledge about types. This should probably be an API of the type classes.
vyper.types.types.has_dynamic_data
Relies on prior knowledge about types to determine. This should probably just be an API of the type classes.
vyper.functions.signature.signature
Uses non-canonical
argz
andkwargz
withz
suffixes. Convert to normal*args, **kwargs
Inner
decorator
function isn't usingfunctools.wraps
.More usage of magic strings where an
Enum
is probably better.vyper.opcodes
Replace use of
list
withNamedTuple
.vyper.compiler.gas_estimate
Uses runtime
assert
statement which is easily disabled. Change to explicit exception raising.vyper.compiler.mk_full_signature
Loop uses enumerate to generate indices
idx
used to modifyabi
at that indices. Instead, usezip
to pair functions with their abi definitions.vyper.compiler.get_source_map
Builds dict using mutation. Vendor a version of
to_dict
decorator to remove mutation pattern.vyper.compiler.compile_codes
Uses mutable data structure as default for argument
Use of magic strings for
output_type
should be constrained to enums.vyper.compiler.compile_code
Uses mutable data structure as default for argument
vyper.compiler.get_asm
constant string mutation (which actual ends up as full object copies) adds overhead. Converting to a wrapped generator pattern that joins all of the produced strings at the end should be more performant and remove mutation.
vyper.compile_lll
Use of
next_symbol
global counter appears to be used to create a unique marker in the code such as_sym_0
,_sym_1
. Replace withitertools.count()
. Replace withuuid.uuid4()
. Replace with generator function that wraps anitertools.count()
style counter. Embed within an object with a defined lifecycle that gets passed around during compilation.vyper.compile_lll.is_symbol
Replace use of magic string with something more concrete like a custom class. Potentially the custom class could handle the mechanism for generating unique symbols.
vyper.compile_lll.instruction
instruction
is a class and thus should be named using common conventionInstruction
.vyper.compile_lll.apply_line_numbers
Needs to use
functools.wraps
for inner function.vyper.compile_lll.compile_to_assembly
Convert runtime
assert
statements to raise proper exceptions.vyper.compile_lll.compile_to_assembly
Detection of whether something is or is not an opcode is done via
isinstance(value, str) and value in opcodes
. If we had anOpcodes
Enum this could be as simple as checking ifOpcodes(value)
raised an exception or not.vyper.compile_lll.compile_to_assembly
Use of inlined opcode string values like
"PUSH"
would be better as Enum or constant values.vyper.compile_lll.compile_to_assembly
Individual blocks in the
if/elif/.../else
block could be partitioned off into smaller standalone functions.vyper.compile_lll.compile_to_assembly
The section that handles clamping values has an
if/elif
which is missing a finalelse
clause allowing for undefined behavior if none of the conditions evaluates truthyvyper.compile_lll.assembly_to_evm
Use formal data structure for
line_number_map
instead of dictionary.vyper.compile_lll.assembly_to_evm
Inside of the
if isinstance(item, list)
, the local valueline_number_map
is overwritten which appears to be a potentially destructive mistake as it will destroy any state that was previously built up locally.vyper.optimizer.get_int_at
Uses inline logic for converting a value to a signed integer. Should be extracted into utility that is tested.
vyper.util.is_instances
Confusingly named. Very visually similar to
isinstance
. Maybe rename to aall_is_instances
vyper.parser.constants.Constants.unroll_constant
Use of
fail
flag is antipattern. Restructure to allow inlined exception raising.vyper.parser.context.Context.start_blockscope
and othersThe various function pairs like
start_blockscope/end_blockscope
,set_in_assertion
,set_in_for_loop/remove_in_for_loop
should be converted to use context manager pattern so that exiting the scope cannot be accidentally forgotten.vyper.parser.global_ctx.GlobalContext.get_item_name_and_attributes
Returns
None
for item name in final block which suggests there is either a case where something does not have a name or that there is return value checking occuring somewhere.vyper.parser.stmt.Statements._check_valid_assign
Does not have a final
else
block allowing function to exit if none of the conditions evaluate truthy.vyper.parser.stmt.Statements.parse_for
Uses implicit
else
for final clause type. Should be converted to explicitely check that the statement is formed as expected and add a finalelse
block that raises an invariant exception.