Closed benjamincburns closed 5 years ago
This issue now has a funding of 0.5 ETH (374.81 USD) attached to it.
happy new year! im going to spend some cycles this week sourcing someone to take this on..
FYI, https://github.com/ethereumjs/merkle-patricia-tree/pull/28 looks like a promising fix for TypeError: Cannot read property 'get' of undefined
I dug a little deeper into TypeError: Cannot read property 'pop' of undefined
. It happens during ethereumjs-vm runTx when it flushes the cache to its trie after a couple of checkpoints, commit, and revert.
Hi, I am the author of https://github.com/ethereumjs/merkle-patricia-tree/pull/28.
I also looked at the pop error and I think it's related to reading a non-existent trie node (maybe due to some confusion about the currently active DB).
IIRC, the problematic call stack goes like this:
// baseTrie.js
put
findPath
_walkTrie
_lookupNode
_walkTrie.processNode
findPath.processNode
_updateNode
Where _updateNode
is then called with an undefined value for stack
.
If you look at the _walkTrie
function, you will find that the child processNode
function calls cb without arguments if it does not receive a node (!node
).
cb then does onDone.apply(null, returnValues)
with an empty returnValues
array (because walkController.return
was never called and thus returnValues
never reassigned in this case).
Thus, stack
in _updateNode
ends up undefined.
Also interesting to look at are the conditions under which _lookupNode
(called from _walkTrie
) returns a null value.
@federicobond your analysis of the call stack is similar to what I observed.
but I don't think the root cause can be fixed in merkle-patricia-tree.
I think the root cause pertains to ethereumjs-vm working on its cache of statemanager while in checkpoint mode, then when it tries to persist its cache after certain revert actions, there is a reference to a node that never existed in the db, hence your call stack occurs.
That makes sense. I'll see if I find anything else.
The funding of 0.5 ETH (427.59 USD) attached has been claimed by @kammerdiener.
@kammerdiener, please leave a comment to let the funder (@owocki) and the other parties involved your implementation plan. If you don't leave a comment, the funder may expire your claim at their discretion.
So a quick update on this. Based on what I have found I think that both issues are currently originating from ethereumjs/merkle-patricia-tree. I have fixed the get being undefined. Currently working on figuring out what is going on with pop coming back undefined.
Believe I have fixed it all within ethereumjs/merkle-patricia-tree. The Pull Request is here: PR #30. I tried to take into account all of the information talked about from above and was sure to check @federicobond PR to help fix the _getDBs portion. All of the tests pass now with no errors.
I went into the node_modules folder and replaced the items and am attaching a screenshot of it passing the race conditions test. Once the PR for Merkle Patricia Trees is merged then. Should I go ahead and submit a PR for this with it including the race conditions test (removing the skip command)?
@kammerdiener there are actually a couple of different competing submissions at this point. I'll be taking some time early this week to evaluate them. I realize now that I said that the goal was to make the tests pass, but the real goal is to fix #417. The best thing that you can do at this point is to manually test ganache-cli with your fix in place (you can use npm link to do this) and determine whether you observe either of the crashes identified in #417. This is how I'll be evaluating the proposed fixes as well, so it'll take a bit of time, I'm afraid.
@roderik by any chance would you like to help with evaluating these fixes, as I believe you've been impacted by this bug quite regularly?
@kammerdiener, as I feared, when testing your fix I observe a new issue. I believe this is due to what @0xNPE @federicobond were discussing above. That said, your changes do make ganache much more stable, so until a proper fix is found we will likely include this fix in future releases (provided that it is accepted by the ethereumjs/merkle-patricia-tree maintainers).
The newly observed issue:
Error: Error: LevelUpArrayAdapter named 'blockLogs' index out of range: index 190; length: 187
at ganache-cli/build/lib.node.js:87740:23
at ganache-cli/build/lib.node.js:87714:5
at ganache-cli/build/lib.node.js:88160:12
at ganache-cli/build/lib.node.js:87925:19
at ganache-cli/build/lib.node.js:92548:7
at _combinedTickCallback (internal/process/next_tick.js:67:7)
at process._tickCallback (internal/process/next_tick.js:98:9)
at Object.InvalidResponse (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:41484:16)
at node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:329530:36
at node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176186:11
at node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:325200:9
at XMLHttpRequest.request.onreadystatechange (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:328229:7)
at XMLHttpRequestEventTarget.dispatchEvent (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176415:18)
at XMLHttpRequest._setReadyState (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176705:12)
at XMLHttpRequest._onHttpResponseEnd (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176860:12)
at IncomingMessage.<anonymous> (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176820:24)
at IncomingMessage.emit (events.js:164:20)
Alright thanks for the information. I will start working trying to figure this out as well. Do you have any more information on how you caused this error?
@kammerdiener it's fairly easy to do this with trufflesuite/ganache, as it'll crash somewhat readily when running truffle test
on nearly any truffle project while the accounts page is visible. You can grab the metacoin truffle box if you don't have a project handy.
I recommend using the develop
branch Ganache as well, as that will let you see your logs on crash a bit more easily by hitting the "report issue to github" button (just do me a favor and don't submit the issue that it opens up).
Also I'd expected ganache-cli to have good sourcemap support by now. I'm looking into this and will commit a fix to the relevant places once that's done. Will comment back here as well, hopefully with a mapped stack trace rather than the useless one I just posted.
@benjamincburns I will be watching for it. Thanks for the tip. I will see what I can do to fix it and make it more stable!
Also be advised that the error I reported will be visible as an error response in your tests. Ganache doesn't crash in this case, but it's also useless for future tests (rerunning tests w/o restarting Ganache just spits out the same error)
@kammerdiener I managed to fix sourcemaps. I'll commit that to the develop
branch of ganache-cli
within the next few minutes.
In the mean time, here's the mapped stacktrace:
at ganache-cli/build/webpack:/ganache-core/lib/database/leveluparrayadapter.js:51:1
at ganache-cli/build/webpack:/ganache-core/lib/database/leveluparrayadapter.js:25:1
at ganache-cli/build/webpack:/ganache-core/node_modules/level-sublevel/shell.js:102:1
at ganache-cli/build/webpack:/ganache-core/node_modules/level-sublevel/nut.js:122:1
at ganache-cli/build/webpack:/ganache-core/node_modules/cachedown/index.js:53:1
at _combinedTickCallback (internal/process/next_tick.js:67:7)
at process._tickCallback (internal/process/next_tick.js:98:9)
at Object.InvalidResponse (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:41484:16)
at node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:329530:36
at node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176186:11
at node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:325200:9
at XMLHttpRequest.request.onreadystatechange (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:328229:7)
at XMLHttpRequestEventTarget.dispatchEvent (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176415:18)
at XMLHttpRequest._setReadyState (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176705:12)
at XMLHttpRequest._onHttpResponseEnd (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176860:12)
at IncomingMessage.<anonymous> (node/v9.3.0/lib/node_modules/truffle/build/cli.bundled.js:176820:24)
at IncomingMessage.emit (events.js:164:20)```
Actually now that I read that a bit more carefully, it seems that it's actually broken the truffle side of things. I'll dig in a bit further and see if I can't figure out what's gone amiss.
@kammerdiener I'm afraid I also just observed the original issue with your fix in place:
TypeError: Cannot read property 'pop' of undefined
at CheckpointTrie.Trie._updateNode (ganache-cli/build/webpack:/merkle-patricia-tree/baseTrie.js:359:1)
at ganache-cli/build/webpack:/merkle-patricia-tree/baseTrie.js:106:1
at ganache-cli/build/webpack:/merkle-patricia-tree/baseTrie.js:460:1
at processNode (ganache-cli/build/webpack:/merkle-patricia-tree/baseTrie.js:465:1)
at ganache-cli/build/webpack:/merkle-patricia-tree/baseTrie.js:503:1
at ganache-cli/build/webpack:/merkle-patricia-tree/baseTrie.js:179:1
at ganache-cli/build/webpack:/merkle-patricia-tree/util.js:75:1
at ganache-cli/build/webpack:/merkle-patricia-tree/node_modules/async/lib/async.js:52:1
at ganache-cli/build/webpack:/merkle-patricia-tree/node_modules/async/lib/async.js:269:1
at ganache-cli/build/webpack:/merkle-patricia-tree/node_modules/async/lib/async.js:44:1
@benjamincburns gladly!
@kammerdiener did you still want to claim the work here, or should i release it to someone else?
@owocki I'll work on it some more. I think I have an idea for the fix.
@owocki I have exhausted all of my potential fixes. If you would go ahead and release it in the interest of getting this fixed sooner. I will look for other issues to try and tackle.
@owocki are claims mutually exclusive, or can multiple people register a claim to the bounty at once? If only one claim can be registered at a time, I think it's best if we reject @kammerdiener's claim until he's resubmitted. I really appreciate your work, @kammerdiener, but I worry that people may see a claim for the bounty here and be put off from trying to solve the issue, when the truth is that the field is still wide open.
@benjamincburns totally understand and totally agree. I am out of ideas as to what to check (although I will be watching for the solution just because of my own curiosity).
Hi @benjamincburns -- only one person can claim work for an issue at a time. We do this so that the network is not an enabler of spec work.
I'm rejecting @kammerdiener 's claim now.. Thanks for the work and the solid communication.
This claim for the funding of 0.5 ETH (516.33 USD) attached to this issue has been rejected and can now be claimed by someone else.
The funding of 0.5 ETH (491.36 USD) attached has been claimed .
If you are the claimee, please leave a comment to let the funder (@owocki) and the other parties involved your implementation plan. If you don't leave a comment, the funder may expire your claim at their discretion.
I think the best (only?) way that this can be solved such that the band-aid fix of processing each request individually can be removed is if instances of merkle-patricia-tree
are short-lived and not shared between requests. Rather, each instance should share the same backing db
.
I had a local branch open a while ago that did this, and which also created instances of the EVM as needed, but I never managed to get it working with the forking feature.
The tests expose two different issues both in the https://github.com/ethereumjs/merkle-patricia-tree. There may be additional problems between the ganache doubles
I have both working, but need to cleanup code and will put up a patch
hey @clehene are you the claimee above?
@owocki hey, yes
That's exciting news, @clehene! I'm eager to see what you've come up with!
@clehene Just a tip - if you're going to test using Ganache like I described here, you'll want to be sure to disable the request queuing in ganache-core, found in lib/provider.js
.
@benjamincburns hey first sorry for the delay. Second, I'm afraid I'll need a bit more info on how to do that test or the one mentioned in #417 - i.e. concrete instructions
I tried a few things, but I'm otherwise doing many things for the first time, so likely missing some context.
lib/provider.js
in ganache-core
truffle unbox metacoin
truffle test
in metacoin it works by defaultnpm start
in ganache and then truffle test
in metacoin it fails with a complaint its' missing ConvertLib
I can't tell what would be request queueing (requestfunnel subprovider?)
Request queuing is here
Easiest way to disable it is to pass an options
object with asyncRequestProcessing = false
. You can hard code this in ganache
in src/chain.js
to make life a little easier.
if I do a truffle test in metacoin it works by default
Just a heads up that if you don't pass asyncRequestProcessing = false
then this will likely be the case, as the queuing mechanism I linked to above sidesteps the issue.
if I do npm start in ganache and then truffle test in metacoin it fails with a complaint its' missing ConvertLib
If I remember correctly, ConvertLib
is a contract which comes as part of the metacoin box. I'd recommend reinstalling the box.
Also @clehene if you're thinking it's working well in ganache-cli, I'm happy to help test it in the Ganache UI if you're having difficulties - just need you to point me at the branch(es) which contain(s) your fix(es).
@clehene actually, I can do one better than that. I've just pushed a branch to ganache-core called fix-forking-debug
. In this branch I add test/forked_debug.js
, which is essentially the same test as test/debug.js
except I'm debugging a transaction from a forked chain. The changes I've made in that branch should have been enough to make things work, but unfortunately I run into the missing stack in _updateNode
(aka Cannot read property 'pop' of undefined
) crash.
I have a feeling that if your changes make this test pass, plus 0xNPE's tests, we'll be in really good shape.
later edit: missed the last few comments after my previous
@benjamincburns I spent some time over the weekend: I had a bit of trouble figuring out how to link modules to run the cli with my version of ganache-core.
Guessing the two options you mentioned were _queueRequest
vs `_sendAsnyc.
I had my initial work based on just the unit tests provided, so had to re-introduced some scaffolding in the otherwise cleaned up code.
The two unit tests address 2 distinct problems. Both are fixed.
In short there's at least an additional one related to async ops under semaphores which will reproduce with truffle test
and sendAsync
but only under certain conditions, bot not captured by the unit tests.
In other words, I can probably spend some time later this evening to put up a patch to start the review and some validation, before a full assessment of the remaining stuff.
but unfortunately I run into the missing stack in _updateNode (aka Cannot read property 'pop' of undefined) crash.
@benjamincburns the current fix (which will need a lot of scaffolding cleanup so may take a bit of time this evening) fixes the missing stack and also a good degree of the get issues scenarios.
@clehene : I don't know how to say it, but thanks for the coming patch ! I've never been F5'ing an issue so much :p
This bug has been so annoying ! So thanks again a thousand times ! Also Thanks to the ganache-cli team for the tools you provide :3
I'll take a rare pleasure to read the patch code and try to improve my understanding of this awesome tool :D
P.S : however no rush, this was to thanks, not to rush anything, i'll wait the needed time to enjoy a clean fix :)
I fixed that last get
undefined issue (and added a few additional locks in places where it was clear it could have caused issues). However this last issue was trickier and is partially a break of encapsulation problem: getRaw
should probably be private but is called directly from Account.
To recap, when doing a checkpoint we end up with two dbs and all reads (getRaw->db.get
) happen on both asynchronously. When things are committed or reverted the second db goes away and the in-flight get will try the second instance (think db[0]
db[1]
) that's not there anymore and fail with undefined
.
This means that access needs to be synchronized around anything that would race for db instances.
Many of the operations were already using semaphores for this however some didn't (e.g. checkpoint
, get
). The problem with the last race was due to direct access to getRaw
from Account.prototype.getCode
. Again IMO this is mostly an encapsulation problem as get should probably be used instead (and perhaps that's the right fix, although I'd say it would also require making getRaw "private".
The alternative (which is what I'll include in the patch) for a merkle-patricia-tree only fix is to mark the under-lock state with a flag (param), based on which's absence (if flag is off) the semaphore can be acquired to avoid the race.
The pop issue, like I mentioned, was a logical one and happened due to propagation (or lack thereof) not found errors.
While most of this conversation has happened on ganache-cli
all the problems are actually in merkle-patricia-tree.
Lastly, the merkle-patricia-tree code is more complex than it perhaps should given the functionality (deep callback stack, mutual recursion, heavy async, etc.). In addition the stuff around it, there may be other issues lying around,
As a caveat - now with the get / pop issues out of the way I can sometimes see https://github.com/trufflesuite/truffle/issues/558 https://github.com/trufflesuite/ganache-cli/issues/433
Initially I was concerned that they were somehow introduced by these fixes and tried to see how but that wasn't the case (not sure how they occur though) however I've confirmed them without the fixes as well.
Great work @clehene! I've added a comment to the PR, however bear in mind that I don't maintain that repo, so I can't accept that PR. I'll flag down some of the ethereumjs folks tomorrow and see if we can get someone to review it.
Quick update.
summarizing the PR mention here as I closed it for now:
As I updated multiple methods to use the semaphore, there were two without callbacks and I missed fully fixing them.
As checkpoint
takes a semaphore and changed from sync to async, there are implications to call sites and wanted to revalidate after changing the callsites.
I also realized the second one was unnecessary after inspecting all possible ways to get to it and realizing it should be locked already.
As I'm blowing in yogurt ;) I'm double checking the two assumed unrelated issues to be sure.
I don't think the changes would be to big / hard (I've made them in the libraries involved with truffle) and seem to be ok.
Hi @clehene, how are things going? I can't help but feel like you were sooooo close with the changes you submitted last time. No pressure, but let me know if there's anything I can do to be of assistance!
@benjamincburns hey, I've integrated the changes in VM in order to use the async checkpoint. However I've started seeing the ou of gas errors after (The contract code couldn't be stored, please check your gas amount.
). I don't think my fix is causing it necessarily, but it may make it occur more frequently now (https://github.com/trufflesuite/truffle/issues/558).
Hence I've instrumented all libs to track things and went down the rathole tryin to fix that too. Found one or two more potential issues... Meanwhile had a hand surgery and a bit slower :). I can repost the patch with the latest changes and take it from there.
Meanwhile had a hand surgery and a bit slower :)
"... with one hand tied behind his back!" :joy:
No pressure - definitely take it easy if you're still in recovery.
I appreciate you trying to sort out trufflesuite/truffle#558, but I think if you're reasonably confident that the change you've made is necessary to fix #417, then you've just exposed some other existing flaw in the process, not created a new one. I'm overjoyed if you want to take that on, too - but don't feel that it's necessary to capture this bounty.
@benjamincburns , @owocki the bounty shows up as "open" with "expired funds". It looks that someone is working on this, pleas clarify the status.
As a result of #450, in trufflesuite/ganache-core#41, @0xNPE gave us some much needed tests which reproduce the underlying problems behind #417. This bounty is for a fix for the underlying problem which those tests identify.
Bounty acceptance criteria:
ganache-core
or any other impacted projects.Important note: This bounty will not be paid out until all required PRs are accepted and merged.
Tips:
ganache-core
you can usenpm link
to symlink your local copy of the dependency into thenode_modules
directory of theganache-core
project.standard
tool to verify that you have done so.