Closed dwalintukan closed 5 years ago
@cgewecke: Some more analysis and an exact pinpoint of the problem (a solution if you will):
I added a counter to test whether the number of open HTTP requests grows along with the number of waiting ports.
I increment this counter after `request.send(JSON.stringify(payload))` and decrement it in `request.onreadystatechange` (upon `request.readyState === 4`).
The counter is zero when the failure occurs, which means that there are no open requests at that point (in contrast with the number of waiting ports).
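For future readers, the instrumentation described above can be sketched as follows (the wrapper name and the counter variable are my own; in the experiment the two statements were patched directly into web3's provider code):

```javascript
// Hypothetical sketch: track the number of in-flight HTTP requests by
// incrementing on send() and decrementing when readyState reaches 4 (DONE).
let openRequests = 0;

function sendTracked(request, payload) {
  const userHandler = request.onreadystatechange;
  request.onreadystatechange = function () {
    if (request.readyState === 4) openRequests--; // response fully received
    if (userHandler) userHandler.call(request);
  };
  openRequests++; // one more request in flight
  request.send(JSON.stringify(payload));
}
```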
I found a post suggesting that the `TIME_WAIT` period is configurable in the OS, but that would be an OS-dependent solution, which I'd really hate.
I have done some reading on the `XMLHttpRequest` object, to see if I could somehow use it to signal the system that the request is done and that the port can be closed.
I haven't found any such option (in fact, I don't think "closing an HTTP connection" is really defined as such in the HTTP standard).
I did notice, however, that for asynchronous requests you are using `XHR2` instead of `XMLHttpRequest`.
I'm not sure about the difference between the two; I only understand that the former is a Node.js wrapping of the latter (which is a "JavaScript native type").
Nevertheless, when I change the code to use `XMLHttpRequest` instead of `XHR2`, the test runs to completion!!!
Oddly enough, when the test is done, there are still some 16,000 ports in `TIME_WAIT` state.
However, this time, in addition to (something like) this:
TCP 127.0.0.1:49155 127.0.0.1:8545 TIME_WAIT 0
TCP 127.0.0.1:49157 127.0.0.1:8545 TIME_WAIT 0
TCP 127.0.0.1:49165 127.0.0.1:8545 TIME_WAIT 0
...
TCP 127.0.0.1:65532 127.0.0.1:8545 TIME_WAIT 0
TCP 127.0.0.1:65533 127.0.0.1:8545 TIME_WAIT 0
TCP 127.0.0.1:65534 127.0.0.1:8545 TIME_WAIT 0
I also see (something like) this:
TCP 127.0.0.1:8545 127.0.0.1:49152 TIME_WAIT 0
TCP 127.0.0.1:8545 127.0.0.1:49153 TIME_WAIT 0
TCP 127.0.0.1:8545 127.0.0.1:49154 TIME_WAIT 0
...
TCP 127.0.0.1:8545 127.0.0.1:65531 TIME_WAIT 0
TCP 127.0.0.1:8545 127.0.0.1:65534 TIME_WAIT 0
TCP 127.0.0.1:8545 127.0.0.1:65535 TIME_WAIT 0
I'm not sure exactly why the test completes, or whether we can even consider replacing `XHR2` with `XMLHttpRequest` a solution (though it does seem like a good workaround at the very least).
But I think we should focus our investigation on the difference between these two.
Thanks.
@barakman Great work! So glad you got that suite running.
Was also googling around about this yesterday and saw a thread suggesting another possibility: pass a special header into the request telling it to close the connection when done, since the default behavior for HTTP is keep-alive. Example:
var options = {
    host: 'graph.facebook.com',
    port: 80,
    path: '/' + fb_id + '/picture',
    headers: { 'Connection': 'Close' }
};
The relevant web3 code is here.
Truffle invokes that constructor at `truffle-provider` here. If the problem can be addressed by adding headers there, we'd be able to fix this directly.
If not, it's quite a bit more complicated: `web3` is a library written and maintained by the Ethereum Foundation. We consume (rather than write) it, and it's non-trivial to get the code changed there (for good reason, since that code drives much of the Ethereum JS ecosystem).
If you're still investigating this and have a chance, could you see if setting the headers that way also resolves this?
@cgewecke: Thank you!
You may want to inform the `web3` authors / contributors of the `XHR2` findings; I am inclined to think that it might impact other open issues.
As for the header configuration: I don't use `web3` directly in my project (I suppose `truffle test` relies on it anyway). The `web3` class is globally available in all of my tests (not sure whether because of Mocha or because of Truffle), so I'm not quite sure how or where to add this configuration.
Is it possible to add it in the Truffle configuration file?
If not, how else can I go about applying it?
Writing it in every test seems like overkill.
Nevertheless, I will try it on the specific case at hand and let you know if it solves it.
Am I understanding you correctly, that you just want to see if it resolves the problem, so that you can fix Truffle accordingly?
I'm not entirely sure how to add it, either in my test or in Truffle's `cli.bundled.js`.
Would it be sufficient to change this:
provider = new Web3.providers.HttpProvider("http://" + options.host + ":" + options.port);
To this:
provider = new Web3.providers.HttpProvider("http://" + options.host + ":" + options.port, 0, '', '', [{name: 'Connection', value: 'Close'}]);
in file `cli.bundled.js`?
Update:
For the code fix above, I get a message from Truffle (or from the Ethereum client):
Refused to set unsafe header "Connection"
I Googled it, and found this StackOverflow answer and this Web3 GitHub thread.
Do you have another suggestion?
Thanks.
You can work around the `Refused to set unsafe header "Connection"` error as follows: in the `XMLHttpRequest.prototype._restrictedHeaders` object, remove the `connection` key or change its value from `true` to `false`.
However, the bottom-line result remains unchanged (i.e., the initial problem persists).
@barakman Ah no, sorry I don't - I guess that's a dead end. Hmmmm.
@cgewecke:
So the only option currently at hand is to hook into `package.json` a script which modifies the Truffle source code, and have that script run after `npm install` and before `npm test`?
@cgewecke:
BTW (and yet again), there seem to be several different versions of `HttpProvider.prototype.prepareRequest` "bundled together" in the same Truffle package.
One of them actually uses an `XMLHttpRequest` object for asynchronous requests, which is how we'd like it to be.
The way I see it, there are two options here:
1. Web3 introduced the use of `XHR2` some time ago.
2. Web3 revoked the use of `XHR2` some time ago.
The first case might make it easier to push towards reverting this change, which seems harmful. The second case is even better: simply move Truffle to use the newer version of Web3.
See below the various occurrences of `HttpProvider.prototype.prepareRequest` in the code.
Occurrence 1:
HttpProvider.prototype.prepareRequest = function (async) {
var request;
if (async) {
request = new XHR2();
request.timeout = this.timeout;
} else {
request = new XMLHttpRequest();
}
request.open('POST', this.host, async);
request.setRequestHeader('Content-Type','application/json');
return request;
};
Occurrence 2:
HttpProvider.prototype.prepareRequest = function (async) {
var request = new XMLHttpRequest();
request.open('POST', this.host, async);
request.setRequestHeader('Content-Type','application/json');
return request;
};
Occurrence 3:
HttpProvider.prototype.prepareRequest = function (async) {
var request;
if (async) {
request = new XHR2();
request.timeout = this.timeout;
} else {
request = new XMLHttpRequest();
}
request.open('POST', this.host, async);
request.setRequestHeader('Content-Type','application/json');
return request;
};
Occurrence 4:
HttpProvider.prototype.prepareRequest = function (async) {
var request;
if (async) {
request = new XHR2();
request.timeout = this.timeout;
} else {
request = new XMLHttpRequest();
}
request.open('POST', this.host, async);
if (this.user && this.password) {
var auth = 'Basic ' + new Buffer(this.user + ':' + this.password).toString('base64');
request.setRequestHeader('Authorization', auth);
}
request.setRequestHeader('Content-Type', 'application/json');
if (this.headers) {
this.headers.forEach(function(header) {
request.setRequestHeader(header.name, header.value);
});
}
return request;
};
Thanks
@barakman Which version of truffle are you using? I will track that down and if this can be fixed by normalizing web3 versions will do that ASAP.
@cgewecke:
At present, I am using Truffle v4.1.3, with my Solidity contracts under v0.4.18.
I am planning to move to Truffle 4.1.5 as soon as I have an idle slot, but that will force me to upgrade my Solidity contracts to v0.4.23, and due to the syntactical changes (namely `emit`, `constructor`, and the deprecation of `var`), that idle slot will have to be a little wider than what it would take to just change the Truffle version in `package.json`.
In short, I will be happy if this change (if indeed applicable) becomes available on Truffle v4.1.3, but Truffle v4.1.5 will also do just fine.
Thanks again for all your help!
@cgewecke:
Of course, it still needs to be asserted that this fix is not just some coincidental result due to the "timely nature" of the problem (i.e., we must be able to explain it based on the functional difference between `XHR2` and `XMLHttpRequest`).
@cgewecke:
A satisfactory proof:
In the `HttpProvider.prototype.sendAsync` function, I added `console.log(request.getAllResponseHeaders())` upon response (in the `onreadystatechange` callback function).
When the `HttpProvider.prototype.prepareRequest` function uses `XHR2`, the printout is of the form:
content-type: application/json
vary: Origin
date: ...
content-length: ...
When the `HttpProvider.prototype.prepareRequest` function uses `XMLHttpRequest`, the printout is of the form:
content-type: application/json
vary: Origin
date: ...
content-length: ...
connection: close
@barakman
- Web3 has introduced the use of XHR2 some time ago.
- Web3 has revoked the use of XHR2 some time ago.
Unfortunately it looks like case 1 is true. XHR2 is used in the latest web3 0.x as well as web3 1.0. I have also tried running your reproduction case using web3 1.0 over websockets, without luck.
This issue raises questions about whether web3 / truffle / ganache are really suited to running simulations with tens of thousands of calls. There might be significant value in building a tool that ran tests directly on top of ethereumjs-vm, or perhaps inside ganache, avoiding http overhead and other constraints.
@cgewecke:
I did a little reading, and it seems that connections are closed by default in HTTP 1.0 and kept alive by default in HTTP 1.1. I'm guessing that `XMLHttpRequest` supports HTTP 1.0 while `XHR2` supports HTTP 1.1, so it makes sense that Web3 switched from `XMLHttpRequest` to `XHR2` and not vice versa.
As regards the second part of your comment, please note that I experienced the same problem when using `solidity-coverage` along with `testrpc-sc`. And as far as I understand, those two are designed specifically for "running simulations with tens of thousands of calls" (how else would you achieve complete coverage of your contracts?).
For now, I have added the following workaround on my system:
1. Next to `package.json`, added file `fix-truffle.js`:

const FILE_NAME = "./node_modules/truffle/build/cli.bundled.js";
let fs = require("fs");
let oldData = fs.readFileSync(FILE_NAME, {encoding: "utf8"});
let newData = oldData.replace(/new XHR2/g, "new XMLHttpRequest");
fs.writeFileSync(FILE_NAME, newData, {encoding: "utf8"});

2. In file `package.json`, added:

"scripts": {
    "install": "node fix-truffle.js"
}
Thanks.
@cgewecke - just to finalize this issue (also for future readers):
The fix suggested above indeed seems to resolve the `Could not connect to your Ethereum client` problem discussed in this thread.
However, it exposes yet another problem:
Invalid JSON RPC response: "Error: socket hang up
at createHangUpError (_http_client.js:331:15)
at Socket.socketOnEnd (_http_client.js:423:23)
at emitNone (events.js:111:20)
at Socket.emit (events.js:208:7)
at endReadableNT (_stream_readable.js:1056:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)"
at ProviderError.ExtendableError (C:\Users\...\webpack:\~\truffle-error\index.js:10:1)
at new ProviderError (C:\Users\...\webpack:\~\truffle-provider\error.js:17:1)
at C:\Users\...\webpack:\~\truffle-provider\wrapper.js:71:1
at C:\Users\...\webpack:\~\truffle-provider\wrapper.js:129:1
at exports.XMLHttpRequest.request.onreadystatechange (C:\Users\...\webpack:\~\web3\lib\web3\httpprovider.js:128:1)
at exports.XMLHttpRequest.dispatchEvent (C:\Users\...\webpack:\~\xmlhttprequest\lib\XMLHttpRequest.js:591:1)
at setState (C:\Users\...\webpack:\~\xmlhttprequest\lib\XMLHttpRequest.js:610:1)
at exports.XMLHttpRequest.handleError (C:\Users\...\webpack:\~\xmlhttprequest\lib\XMLHttpRequest.js:532:1)
at ClientRequest.errorHandler (C:\Users\...\webpack:\~\xmlhttprequest\lib\XMLHttpRequest.js:459:1)
at Socket.socketOnEnd (_http_client.js:423:9)
at endReadableNT (_stream_readable.js:1056:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
This problem seems to be of the following nature: a "massive" test completes successfully, but only when it takes place does the next test emit this error (immediately when it begins). That should give some hints, though I'm not sure what. It seems that the "massive" test does not release a socket that has been held for a long period (cutting the test shorter resolves the problem).
I believe that a possible fix for this problem lies in the `XMLHttpRequest` code, around the area of:
request = doRequest(options, responseHandler).on("error", errorHandler);
Perhaps there's a missing handler for this request, for its socket, for its response, or for its response's socket.
In either case, I have not been able to resolve it. Most of my attempts focused on searching the Node.js HTTP API for functions and/or events which might be applicable here.
A simple workaround for this problem is to execute `truffle test` separately for each test file.
In other words, closing and reopening Truffle solves the problem, which implies that some resource (a socket?) is not released until Truffle is closed.
Unfortunately, this workaround is insufficient for `solidity-coverage` users (myself among them), since that utility cannot be executed separately for each test file.
If someone can find a way to apply this ("close and reopen after every test file") in Truffle source code itself, then it might be a good solution.
I tried that too (in the `Test.run` function, at the line `js_tests.forEach(function(file)...`), but couldn't quite get it to work.
@cgewecke:
I have managed to fix (or, if you will, find a workaround for) the `socket hang up` issue described above, which emerged after I had resolved the original issue (by replacing `XHR2` with `XMLHttpRequest`).
As mentioned before, this `socket hang up` error seems to be pretty consistent in that it happens only at the end of a massive test (or perhaps at the beginning of the test that follows).
A deeper investigation has shown that it always happens as a result of a request consisting of `payload.method === 'evm_revert'`, to which the response is an error message (and obviously invalid JSON).
A glimpse at the Ganache source code reveals that `evm_revert` is indeed executed at the end of each test (using `afterEach`).
Though I don't have any real evidence to support this, I think it is possibly because an `evm_revert` executed after a massive test takes a very long time to complete, during which the connection times out.
By the way, the status of this response is 0. I previously bumped into some GitHub thread discussing why you've decided not to ignore status 0 in Truffle (the reason being that a test might fail silently, if I remember correctly). I can't find that thread now, but you were in it, so you might find the remainder of this comment relevant.
In any case, in order to work around the `socket hang up` error, I simply changed the Truffle source code to ignore an error in the response if the request's `payload.method` is `evm_revert`.
Since `evm_revert` is not really part of any test which I could possibly run on Truffle, I am confident that this fix cannot do any harm, for example (yet again), allow a test to fail silently.
Here is the extended workaround (for both problems), for any future readers:
1. Next to `package.json`, add file `fix-truffle.js`:

let FILE_NAME = "./node_modules/truffle/build/cli.bundled.js";
let TOKENS = [
    {prev: "request = new XHR2", next: "request = new XMLHttpRequest"},
    {prev: "error = errors.InvalidResponse", next: "error = payload.method === 'evm_revert' ? null : errors.InvalidResponse"}
];
let fs = require("fs");
let data = fs.readFileSync(FILE_NAME, {encoding: "utf8"});
for (let token of TOKENS) {
    data = data.replace(new RegExp(token.prev, "g"), token.next);
    console.log(`replaced "${token.prev}" with "${token.next}"`);
}
fs.writeFileSync(FILE_NAME, data, {encoding: "utf8"});
2. In file `package.json`, add:
"scripts": { "install": "node fix-truffle.js" }
Thanks
**UPDATE:**
It seems that even if a `socket hang up` error which occurs as a result of an `evm_revert` request at the end of a test is resolved (by ignoring it), a similar error may then occur as a result of an `evm_snapshot` request at the end of the next test.
We can slightly extend the workaround above to handle both cases, by changing this:
payload.method === 'evm_revert'
To this:
payload.method.startsWith('evm')
As `evm` requests are not something likely to be invoked directly from a testing script, I think that this extension is quite safe (i.e., will not cast away "real" errors in a given test).
However, generally speaking, I get the feeling that while Ganache takes a very long time to complete these requests in some cases (more specifically, after a massive test is conducted), the connection is simply (and abruptly) terminated.
The fact that restarting `truffle test` resolves this issue, implies that even if it is "Ganache's fault" (for taking so long to complete), it is "Truffle's fault" in handling it.
I am not very "happy" with the workaround proposed above, and I believe that a better approach would be to:
1. Investigate why Ganache takes so long to complete `evm_revert` and `evm_snapshot`.
2. Investigate why Truffle "has a problem" with the fact that Ganache takes so long to do it.
**UPDATE 2:**
For safety, extend this:
payload.method.startsWith('evm')
To this:
typeof payload.method === 'string' && payload.method.startsWith('evm')
Or even to this:
payload.method === 'evm_revert' || payload.method === 'evm_snapshot'
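The guard variants above can be expressed as a small predicate (the function name is hypothetical; in the actual workaround the check is inlined into `cli.bundled.js`). This sketch uses the safest variant, matching only the two explicit methods:

```javascript
// Decide whether a failed JSON-RPC response should be ignored, per the
// safest variant above: only explicit evm_revert / evm_snapshot requests.
function shouldIgnoreError(payload) {
  return payload.method === 'evm_revert' || payload.method === 'evm_snapshot';
}
```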
@barakman Thanks so much. The workaround you've proposed seems reasonable to me. There might be some kind of connection timeout at the HTTP layer - I've also seen this disconnection when running long Solidity loops that validate bytecode in a `call`.
@barakman Out of curiosity, would making `revert` and `snapshot` optional help with your use case?
@cgewecke:
Thank you.
I assume that the purpose of these two functions is to reset the EVM emulation back to an initial state, so that each one of the tests executed by Truffle starts under the exact same conditions, regardless of the order in which the tests are executed (and of course, the exact same conditions will continue to apply every time you invoke `truffle test`).
All of this is intended to ensure deterministic execution, I assume, so making these functions optional probably runs contrary to correct testing methodology.
That said, since it would be optional, I guess there's no harm done (i.e., Truffle users can choose it at their own risk).
Furthermore, I've already added an npm post-install script to fix the Truffle source code, so I'm not in any dire need of this feature (though I suppose I'll have to do some maintenance work on that script every time I update the Truffle version, so perhaps it WILL help me in the future).
It would help for sure if you could check with the Ganache developers what might cause the execution of `evm_revert` and `evm_snapshot` to be so lengthy.
Thank you for your help.
@barakman
It would help for sure if you could check with the Ganache developers what might cause the execution of evm_revert and evm_snapshot to be so lengthy.
I will. In your current suite, approximately how many blocks are being snapshotted / reverted?
@cgewecke:
I have a total of 27 tests, so each one of these functions is invoked 27 times if that's what you mean.
Otherwise, can you please elaborate on what you mean by "how many blocks"?
Should I use web3
in order to get the block-number at the beginning and end of my longest test, and calculate the difference?
Apologies @barakman - yes you could do that or estimate the number of transactions that occur in the suite, since ganache executes a single tx per block.
I'd just like to give the ganache engineers some guidance about what magnitude of tests triggers this.
@cgewecke: Just by looking at the code, I estimate that:
- The run in which the `evm_revert` request fails executes approximately 16,943 RPCs.
- The run in which the `evm_revert` request fails and the `evm_snapshot` request of the following test also fails executes approximately 28,954 RPCs.
I could give you more accurate figures by getting the block number before and after, but that would take a while (each of these runs lasts about 15-20 minutes or so).
Thanks
That's perfect, thanks @barakman.
Will a universal fix be available any time soon?
I get random `Error: CONNECTION ERROR: Couldn't connect to node http://127.0.0.1:7545/` errors when I run `truffle test` too (I have 39 tests, 3 of which fail for that reason).
In May everything was still okay; today it's not :(
@vicnaum Could you provide more detail about your suite or a link to project? At the moment we think this error is limited to very large suites. The principal reporter above has a battery of 50,000 tests.
Do the same 3 tests fail each time?
@cgewecke it's always different tests. Can be only one test failing, but can be at most five. Usually near three. I'm using Windows 10.
The sources are here: https://github.com/vicnaum/hourlyPay
@cgewecke: The error specified by vicnaum (connection error) does not seem to have any relation whatsoever to the issue described in this thread, which appears to be the result of limited resources (more precisely, the system runs out of HTTP connections).
@vicnaum I think @barakman is correct - I looked through the `hourlyPay` code a bit and see you're using a lot of methods to move time around on the chain. Would you like to open a separate issue so we can investigate further?
`ganache-cli` shouldn't disconnect from Truffle under any circumstances, so this is likely a bug. Could you post the entire contents of your error and stack trace as well?
@cgewecke & @barakman & others having this issue: I haven't dug into this too deeply, but my guess is that either Truffle or the tests in question are creating new instances of `provider` very frequently.
Optimal resource management would take advantage of HTTP keep-alive by reusing provider instances between tests rather than recreating them.
I can say from experience that sending `Connection: close` in the request, or explicitly closing the client socket, only kicks the can down the road: you'll still exhaust the local address space due to ports sitting in `FIN_WAIT`.
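A minimal sketch of the provider-reuse suggestion above (the function names are mine; the actual provider construction depends on your setup):

```javascript
// Memoize the provider so every test shares one instance (and therefore one
// keep-alive socket pool) instead of opening fresh connections per test.
let sharedProvider = null;

function getProvider(createProvider) {
  if (sharedProvider === null) sharedProvider = createProvider();
  return sharedProvider;
}
```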
@benjamincburns Yes, it turns out this originates in web3, and they're fixing it in `beta.36`.
(It was keep-alive; see the change.)
Closing this since it seems to have been addressed as a duplicate of the issue above. Let us know if it's still a problem. Thank you!
@gnidan:
AFAIK, this is still a problem in Truffle 4.1.15 (which still uses `XHR2` instead of `XMLHttpRequest`).
In Truffle 5.x this is possibly fixed, since this part of the code has changed, though I haven't verified that, as it requires a bit of work on both my contracts and my tests.
To my understanding, you released 4.1.15 specifically for this reason (i.e., for those who aren't rushing to upgrade their Solc and Web3 major versions).
So you might want to keep this issue open until it is fixed in the Truffle 4 branch (or at least leave a note somewhere mentioning that this problem is as alive as ever).
Thanks
Hey, FYI: I'm having this issue with Truffle v5.3.3 (core: 5.3.3); web3 in my project is ^1.2.6. It only popped up recently, with 431 tests and a good density of calls per test.
It seems like it might be connected to a recently added assert helper function: this is the first time that tests have been expected to perform any logic after the return of this (awaited, of course) function in the middle of the test functions.
async function assertReverts(promise, errorMessage = "") {
    try {
        await promise;
    } catch (error) {
        // indexOf() returns -1 when the substring is absent, so the result
        // must be compared explicitly (a bare indexOf() result is truthy
        // for every position except 0).
        assert(error.toString().indexOf("VM Exception while processing transaction: revert") !== -1, "Expected VM revert error");
        assert(error.toString().indexOf(errorMessage) !== -1, `Expected error: "${errorMessage}", actual error: "${error}"`);
        return;
    }
    assert.fail('Expected VM revert :: ' + errorMessage);
}
Is anyone aware of anything recently changed that could cause this to reappear?
Issue
On an Ubuntu Linux environment (Trusty), tests randomly fail with this `ExtendableError`:
Specifically, I have a Travis-CI (continuous integration) setup, and this is where the tests are failing. My local Mac OSX environment passes these tests with no problem. Every once in a while they will fail with the same error, but I just run the tests again and they pass.
I'd say it happens about 10-15% of the time on Mac OSX, but about 60-80% of the time on the Travis-CI Linux env.
It feels like this error occurred less often on earlier Truffle versions. I just updated to 4.0.4 and it seems much more frequent now.
Steps to Reproduce
Expected Behavior
Tests should pass like they do on Mac OSX env.
Actual Results
I test this on my local machine (Mac OSX); when all tests pass (which they do), I push up to GitHub. Then Travis-CI fires off a test run on the Linux env, and it fails pretty much every time.
Environment
Travis-CI Env (fails)
Mac OSX Env (passes)
$ gcc --version: