openethereum / parity-ethereum

The fast, light, and robust client for Ethereum-like networks.

Warp sync keeps failing #11071

Closed 3esmit closed 4 years ago

3esmit commented 5 years ago

The snapshot download seems to work fine until it stops progressing. After a restart, it attempts to download a different snapshot and starts from the beginning again. This keeps happening: I have been trying to sync for days, stuck with this problem. I haven't tried without warp sync; after making this report I will try again with warp sync on a fresh db, and if I run into the same problem I will try syncing without warp sync. I also just tried the beta version, as recommended by parity.io. My computer and network aren't the problem, as I am able to sync using geth fast sync.

[image]

This certainly isn't going any further: [image]

Here is the exact moment it happened; it just stopped progressing: [image]

dvdplm commented 5 years ago

Can you run the sync with logging enabled?

parity -l sync=trace

3esmit commented 5 years ago

Ok, doing it now. I just tried again after removing the db and it happened again. [image] One hour later... [image]

Now I have cleared the db again and I'm running with -l sync=trace. Let's wait another hour or so to see whether it works or hangs at some point, and whether we can spot some anomaly.

3esmit commented 5 years ago

I'm being spammed with this and can't see what's happening on the node!

2019-09-19 19:00:13  IO Worker #0 TRACE sync  8 Ignoring transactions while syncing
2019-09-19 19:00:13  IO Worker #0 DEBUG sync  8 -> Dispatching packet: 2

[image]

Can I stop it from printing this warning over and over?
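One way to cut through this noise without touching the node's logging configuration is to filter the output after the fact. The following is only a minimal sketch, not a parity feature: the `NOISE` tuple and the `filter_log` name are made up for illustration, and it assumes the two messages quoted above are the ones to drop. Note that dropping every "Dispatching packet" line may also hide information that is useful for debugging.

```python
# Substrings of the noisy messages quoted above; extend or trim as needed.
NOISE = (
    "Ignoring transactions while syncing",
    "Dispatching packet",
)


def filter_log(lines, noise=NOISE):
    """Yield only the lines that contain none of the noisy substrings."""
    for line in lines:
        if not any(fragment in line for fragment in noise):
            yield line
```

To use it on a saved log, something like `sys.stdout.writelines(filter_log(open("parity.log")))` would print the remaining lines; the file name here is hypothetical.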

3esmit commented 5 years ago

https://www.sendtransfer.com/download.php?id=e53dc54a90a5cde2183907a2175d8561&email=378610 Here is the full log. This file will be available for 7 days. I'll be notified of all downloads through SendTransfer tooling (let's hope this works). In the logs you can see all my past attempts, including the ones made over WiFi (that's why the LAN IP changed to 192.168.1.4 when I switched to cable). The WiFi shouldn't be the problem, as it is fast and the router is not far away, but I started using cable to rule out this variable.

The computer running it is powerful enough: it is a desktop machine with an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz and only SSD storage.

Files Remain Available Until: 27. 9. 2019.

3esmit commented 5 years ago

Related conversation from Gitter:

@3esmit said:

I was only using testnets and geth recently, but now I wanted mainnet for some tests and deploys, and I always had a good experience with Parity in the past (2016/2017), so I downloaded the Parity beta, as the website suggested it as the preferred option. First of all, I got really mad at the state snapshot size and started cursing MiniMeTokens for bloating the state with 120MB of junk. Anyway, it was going well, an hour and a half downloaded, great! But then it stopped progressing forever. I tried multiple times, with and without killing the db. I opened an issue at paritytech/parity-ethereum#11071 and posted the debug trace logs there (but they don't seem to say anything different when warp sync stops working). Parity always kept the CPU very busy, used maximum speed, and overflowed the router with connections. Parity was running on a powerful computer with plenty of internet bandwidth (cable, 50-25MBit/s).

In the past (2017), Parity was indeed the best Ethereum client, as your website advertises, but that's not the reality now. I understand the team is busy with innovation for eth 2.0 and interoperability (which is awesome); however, I would like to give the team a heads-up to make sure that the main function of your flagship software works properly. I know warp sync can be deactivated and Parity would work, but warp sync needs to be fixed to work reliably since it runs by default, or else it should not be the default; otherwise you will be burning your reputation with newcomers.

This experience would also make me rethink the use of Parity for development. For comparison, geth uses less CPU and works reliably on the first try, without overwhelming the network (it used 1MB/s and 25 connections), so it is a much smoother experience, and the development tools and other decentralized systems use geth (e.g. SWARM). Parity might still work better for some other things, such as block explorers and miners, and is a very valuable asset for the Ethereum network nonetheless.

@joshua-mir said:

@3esmit Unfortunately there's a catch-22 here - we really do want our Ethereum client to be as good as it can be, but there's simply less people in the community willing to (and capable of) working on it (rust is much, much, less popular than go, and if you haven't noticed, people in the Ethereum space have had declining opinions toward Parity, the company, for a while now) - the less people who work on it, the less people are willing to use it (and vice versa). We can't fix this issue just by putting more internal resources on parity-ethereum because there's little/no incentive to do so, other than maintaining goodwill, honestly. It is not Parity's flagship project, that's Polkadot, and unfortunately there are no resources for us to make drastic changes like "fixing/replacing warp" (which is unreliable only because parity is unpopular and since it's unincentivized, everyone runs with selfish options like --no-periodic-snapshots which makes it even more unreliable). 🤷‍♂️ I honestly can't address your concerns in any other way.

@3esmit said:

@joshua-mir I see. Perhaps disabling the warp sync by default would be a way to go if the snapshots are not available.

@joshua-mir said:

No-warp sync takes close to a month currently. Warp sync takes maybe a few hours of wrestling the client and trying different --warp-barrier settings.

@3esmit said:

Ok, then if there is something that can be done to make warp sync work, it should be done automatically. Also, if there is no warp sync, fast sync would be nice to have. In geth I can get fully synced with fast sync in 1 day and 8 hours (default run settings).

@joshua-mir said:

Yes, I totally agree with you that we should work to unify things like warp and fast sync between geth and parity, because currently those two protocols are completely separate between the two clients.

@3esmit said:

Regarding the problems Parity had with the Foundation, only "ethereum purists" would care about it. So the issue I reported in #11071 is already known? I will be looking forward to having parity sync on the first try ;)

@joshua-mir said:

Yes. Generally I respond with "try setting warp-barrier manually", but as you point out we should probably work towards making it work by default. But that will probably not be high on the queue of things that need to be done.

tayvano commented 5 years ago

we really do want our Ethereum client to be as good as it can be, but there's simply less people in the community willing to (and capable of) working on it (rust is much, much, less popular than go, and if you haven't noticed, people in the Ethereum space have had declining opinions toward Parity, the company, for a while now) - the less people who work on it, the less people are willing to use it (and vice versa). We can't fix this issue just by putting more internal resources on parity-ethereum because there's little/no incentive to do so, other than maintaining goodwill, honestly. It is not Parity's flagship project, that's Polkadot,

Responding here bc fuck gitter.

Have you considered your collective attitude and comments like these may be turning more people off than Rust?

Or that your failure to properly prioritize internally and communicate those priorities properly externally is perhaps the cause of the declining opinion of Parity?

Or that your opinion that "goodwill" is not valuable while simultaneously using "people have a declining opinion of Parity" as an excuse for not maintaining your flagship product is the real catch-22?

But sure...continue to pretend there isn't a rotting rat stinking up the place. At least your dot allotment won't be adjusted if you're compliant. That's the real "incentive structure" dominating here, right? Not, I don't know, maintaining your integrity and reputation by improving your product and rewarding, not degrading, those who spend their time reporting bugs and sharing extensive logs so that the thing that made Parity valuable in the first place can continue to be valuable. 🙄

tkstanczak commented 5 years ago

Just leaving it here: https://github.com/NethermindEth/nethermind

andresilva commented 5 years ago

I hope everyone in this issue realizes that the feature being discussed here is not part of the standard Ethereum protocol and is not supported by any of the other Ethereum clients.

I don't really understand what the issue is with deciding not to spend more time on a feature that is not core to the product and has not been working properly for some time now (it doesn't really scale as state size grows). In fact, there were even ideas for Eth1x to come up with a new cross-client protocol designed to overcome some of these issues.

Before joining the bandwagon making blanket and uninformed statements I'd invite you all to take a look at the actual development that's been happening on this project (stats for the past month https://github.com/paritytech/parity-ethereum/pulse/monthly mostly related to the upcoming Istanbul HF) as well as other ETH-related projects from parity. There are developers working on this every day and your comments and suspicions are disrespectful towards their work and commitment.

I would also like to remind that this issue tracker is meant for technical discussions and not as a general soapbox, there are plenty of other avenues for that.

dvdplm commented 5 years ago

I would also like to remind that this issue tracker is meant for technical discussions and not as a general soapbox, there are plenty of other avenues for that.

This.

From here on out, this thread is going back to being about the original bug report. Anyone with thoughts to share about anything else is kindly requested to find a different forum for it.

dtran320 commented 5 years ago

@3esmit In the interest of getting back to the original bug report, just wanted to say that we've run into this issue with warp sync getting stuck. We believe it happens when the source node you're downloading the snapshot from is no longer available as a snapshot peer and no other Parity nodes on the network have that snapshot, so this isn't necessarily an easy bug to track down/fix. What may make matters worse is that we've found that disabling snapshotting on our own nodes improves performance, so it's possible many other nodes on the network are also disabling snapshots, which makes it tough to reliably stay peered to nodes with the snapshot you want.

The workaround we've come up with when we need to warp sync is a simple script to just tail the parity log, and if we don't see it making progress for many snapshot log statements in a row, the script automatically restarts parity so it finds a new snapshot peer. @dvdplm would know best, but I don't believe it's necessary to kill the database before restarting parity. We've successfully brought up tens of warp-sync'ed nodes without any manual intervention using this script + smart warp barriers.
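The workaround described above could be sketched roughly as follows. This is not the actual script mentioned in the comment, only a minimal illustration under stated assumptions: the log line format matched by `SNAPSHOT_RE` ("Syncing snapshot N/M"), the threshold of 20 unchanged lines, and all function and class names are hypothetical and would need adjusting to the real log output.

```python
import re
import subprocess

# Assumed log format, e.g. "... Syncing snapshot 1234/5000 ..."; adjust as needed.
SNAPSHOT_RE = re.compile(r"Syncing snapshot (\d+)/(\d+)")


def parse_snapshot_progress(line):
    """Return (done, total) chunk counts from a log line, or None if absent."""
    match = SNAPSHOT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


class StallDetector:
    """Flags a stall after `max_repeats` consecutive snapshot lines without progress."""

    def __init__(self, max_repeats=20):  # the threshold is an arbitrary assumption
        self.max_repeats = max_repeats
        self.last_done = None
        self.repeats = 0

    def feed(self, line):
        progress = parse_snapshot_progress(line)
        if progress is None:
            return False  # not a snapshot line; ignore it
        done, _total = progress
        if done == self.last_done:
            self.repeats += 1
        else:
            self.last_done = done
            self.repeats = 0
        return self.repeats >= self.max_repeats


def run_with_watchdog(command, max_repeats=20):
    """Run `command`, restarting it whenever snapshot progress stalls."""
    while True:
        detector = StallDetector(max_repeats)
        proc = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        stalled = False
        for line in proc.stdout:
            if detector.feed(line):
                stalled = True
                break
        if not stalled:
            return proc.wait()  # the process exited on its own; stop supervising
        proc.terminate()  # stalled: stop the node so the next loop restarts it
        proc.wait()
```

A call such as `run_with_watchdog(["parity"])`, with whatever flags the node normally runs with, would supervise the process. The database is left untouched between restarts, in line with the comment above.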

3esmit commented 5 years ago

Suggestions:

roninkaizen commented 5 years ago

Encountered almost the same, with a slight difference on AMD systems; it had to do with the mem-limits I set up recently after testing around (we had this behaviour before and will always have it sometimes as long as we use parity, I think). I need to look up how I fixed it, but personally I came to a productive solution; nowadays it takes much longer to get "updated" to the latest block. And yes, "I am still around" with parity, even on main-net; ethernodes.org proves that if you use the search box with "roninkaizen". It was something with mem; gitter me if interested. Regards,

ronin

3esmit commented 5 years ago

An indication of progress for large chunks in warp sync would be very helpful.

More information on the warp sync, to help users decide whether they should restart or not, would be a great start!

dvdplm commented 5 years ago

An indication of progress for large chunks in warp sync would be very helpful.

I definitely agree this would be a great first step. When I've run into this problem myself I've been frustrated by not knowing exactly what is going on. I think we should both improve logging but also provide a way to toss away a partially downloaded/imported snapshot so that stuff like what @dtran320 describes is no longer necessary (or much easier to perform).

Another area of investigation would be to make the snapshotting less "invasive" to the normal operation of a node, and spread out the CPU/memory usage over a longer period of time. Turning off snapshotting is detrimental to the whole network but beneficial to the single synced node and that's a tension we need to lessen.

3esmit commented 4 years ago

Was this fixed?