gridcoin-community / Gridcoin-Research

Gridcoin-Research
MIT License
588 stars 173 forks source link

Bad CPID errors in new .8 client affected syncing -- Debug info included. #262

Closed iFoggz closed 7 years ago

iFoggz commented 7 years ago

Run into a sync issue and after reviewing the debug.log it appears the BAD CPID issue is still around. I had to restart my wallet for it to fix and sync up correctly however the problem started over again.

Issue started at block 861100 with CPID 0b5ef259411ec18e8dac2be0b732fd23. According to https://gridcoinstats.eu/cpid.php?a=view&id=0b5ef259411ec18e8dac2be0b732fd23 the CPID was renewed recently.

https://pastebin.com/cS7msHP4

is one entry thou it keeps going till i restarted. once i restarted the problem was gone. after awhile of staking again i had bad block problems again and hashes for blocks didnt match up. but after a lil while all the hashes lined up and my client continued without an issue thou a second bad cpid came up. the previous bad hashes had changed to correct hashes.

https://pastebin.com/AuKTdN3H

CPID: 1495c81924713c9803efd73b2ecd47f9 was rewnewed hours ago according to https://gridcoinstats.eu/cpid.php?a=view&id=1495c81924713c9803efd73b2ecd47f9 and seems like a new/renew beacon issue?

Other information that could be helpful possibly:

I have noticed a dozen of these errors over 2 hours

2017-04-03 21:37:25 ERROR: AcceptToMemoryPool : Unable to Connect Inputs 53fcb8def168db2f4ed4963947623166c9fce1c1a01ecfb3861dc6022ff269e6

Also earlier i had a 2017-04-03 17:57:48 Received block a86337aef5479db412093e262cc12a5bf8381a6244a132ed988fea6674809b49; -ERROR: ConnectBlock[] : Superblock hash does not match consensus hash; SuperblockHash: dd7ed15fdf1e7cca0f3b9f7f9c868702, Consensus Hash: ec6cf0f1e24a490319edfdb96a1bed86

So far everything seems to be moving along fine now, but still getting the bad cpid errors in wallet.

thought i'd post what information i scavanged.

Superblock issue? New beacon issue? Seems odd only new cpids that are renewed in the errors -- maybe old issues fixed and new one popped out?

grctest commented 7 years ago

2017-04-03 21:37:25 ERROR: AcceptToMemoryPool : Unable to Connect Inputs 53fcb8def168db2f4ed4963947623166c9fce1c1a01ecfb3861dc6022ff269e6

If you do 'getrawmempool' and there aren't stuck transactions then that's not a problem.

2017-04-03 17:57:48 Received block a86337aef5479db412093e262cc12a5bf8381a6244a132ed988fea6674809b49; -ERROR: ConnectBlock[] : Superblock hash does not match consensus hash; SuperblockHash: dd7ed15fdf1e7cca0f3b9f7f9c868702, Consensus Hash: ec6cf0f1e24a490319edfdb96a1bed86

Sounds like someone tried to stake a superblock with a minority neuralhash weight and was rejected, no problem there.

xXUnRealXx commented 7 years ago

I've had same problem yesterday after successful re-sync, thank to *8.8 Easy fix for me atleast: re send becon. execute adversticebeacon
It fixed it for me.

that's my 5c worth.

xXUnRealXx commented 7 years ago

execute advertisebeacon

*yes, dumbazz cant spell.

iFoggz commented 7 years ago

done a little more poking around and it shows the hash in the bad cpid and that been matching up to the correct hash however my client keeps saying bad cpid and takes a restart to get it to go right. also noticed this only happens and stalls when starsdust gets a DPoR payment and hes had 9 in past 8ish hours and each time he gets one i run into a problem. some stalls i get past and all the blocks are accepted correctly after awhile. so for most part it moves ahead at some point but the stalls defiantly dont help with my client trying to stake.

denravonska commented 7 years ago

@Foggyx420 I pulled all the beacons from starsdust when loading the blocks and I can only find one:

04/04/17 07:40:58 860695, beacon: A, MGI1ZWYyNTk0MTFlYzE4ZThkYWMyYmUwYjczMmZkMjM5YWM4Yzc5YTk4MzY5ODM1Yzc2ZDY5MzljOGM0YzgzYTNjM2EzZjMzNjc5N2NiNjczZjY0OTgzODNhM2IzNjNkNzQ3NTYyNzM3NDY1NzY3NDc1NDE2YTZmNjM3MDc5MmY2ZDc3Ozk3M2MzZjgyOWU4MGRjN2Q0OTAyMDg5YzkzYTUwZmIzZTFkOWFjYWVmOTE3NTQxYTNiNDgyNGNkODliZDM0MTE7UzgzRmRGUG9ISjlnQVhla25LUEwzOWd5SFhkanVQeXQzYjswNGEzYmNmODU3ZTY2OTA4OGQ3ZDMyZGViN2JiYTZlNDI3ZmYzYTVkODQyMmZjMjIxNThiNjA4NGMzNDFmOTBhNjNkNWRlMTgwZTNhZjYzZTljZjFhZTZiNjhhYjg0ZjYzOGQ2ZTQ0Y2RhMDZhYjhkZTk5YjQxNTY5MGI0MTcxM2Yx

I wonder if your node rejected the beacon when it received it but accepted it when restarting. Does it happen every time starsdust DPoRs? Even after a restart?

Edit: I'm also getting a lot of CPID errors for starsdust.

iFoggz commented 7 years ago

happened most the time when he got a dpor i started to fork but on occasion i did not. and i only posted about his beacon being renewed recently according to gridcoinstats.eu. the second beacon cpid i mentioned is another one that hasnt forked me but has bad cpids in log that was also recently renewed according to gridcoinstats.eu

iFoggz commented 7 years ago

yes also after retstart it corrects itself then when he staked a por it wud happen again.

edit : yes after restart it may of accepted it i do have the logs for the restarts and its behaviour

grctest commented 7 years ago

Client debug.log when on 'chain b' showed repeating checkblock failures:

Clearing mapAlreadyAskedFor. 04/04/17 13:06:07 Received block 1cc0fd066fc6f882b39ad4263bdbf6d0dbab348aa2df5776ca4e34e99893bfe7; Received block 90e4e50a203257d49b3ab7cdd4ff633414b0368826dba2b304054ec5c690e899; Recei$ 04/04/17 13:06:08 ERROR: ProcessBlock() : CheckBlock FAILED 04/04/17 13:06:08 Received block 72d9ed74a14bc441456db3e412f3ceb708fd03196e18b449ccfe9e28fa054a0c; Received block 8058276011b20e39c1c3d62b6b9ecd0324430c2b720152211b5dc83ba736e92d; Recei$ 04/04/17 13:07:10 ERROR: ProcessBlock() : CheckBlock FAILED AppInit04/04/17 13:07:12 Received block 42800c624aabcaa218519974ea5be7b46f2d2cdb9544bcc9072f00bc7637b075; AppInit Received block a4cbff0565a2314b6b7d9e93a8bdb5e5c4d412b0966896ff3516f0bc97$ 04/04/17 13:07:41 ERROR: ProcessBlock() : CheckBlock FAILED 04/04/17 13:07:41 Received block 8fba542b25e272138a46188b70dbf2d60b54bda98ac5726fe9b8d5d35a8f9729; Received block 3329b186d379fd9f3127e69a4f0bd65f53c3a9fcc07547cd9c7a9f70663606cc; Recei$ 04/04/17 13:07:43 ERROR: ProcessBlock() : CheckBlock FAILED 04/04/17 13:07:43 Received block 0a3863511d893a99e6832833b25287150cb74545e17962672dbea570abe4d20b; Received block aac2239626eed8f2254606afc61256b293d470e0e839ef9839d608ffcd4b05bf; Recei$ Clearing mapAlreadyAskedFor.

It seems that my client got stuck with checkblock failures, but after a reset it reorganized and got onto the right chain:

04/04/17 13:46:00 REORGANIZE 04/04/17 13:46:00 REORGANIZE: Disconnect 194 blocks; 490b6a185b596fea81e7..f8a7fa8d33265bc42a13 04/04/17 13:46:00 REORGANIZE: Connect 55 blocks; 490b6a185b596fea81e7..975767ee96c922df43eb 04/04/17 13:46:00 REORGANIZE Disc Size 194.000000

194 blocks is pretty high for a reorg

iFoggz commented 7 years ago

Anyway I wanted to add to the subject further from somethings I noticed today regarding this.

First i'm no expert but i been spending time on this today figuring out things from a outside kind of source and went even further so bare with me and take the information in and decide for yourself or comment accordingly.

we will refer to starsdust as stardusts but his CPID is 0b5ef259411ec18e8dac2be0b732fd23

now on-word:

First off on April 3, 2017 I had this in my logs

2017-04-03 17:57:48 Received block a86337aef5479db412093e262cc12a5bf8381a6244a132ed988fea6674809b49; -ERROR: ConnectBlock[] : Superblock hash does not match consensus hash; SuperblockHash: dd7ed15fdf1e7cca0f3b9f7f9c868702, Consensus Hash: ec6cf0f1e24a490319edfdb96a1bed86 2017-04-03 17:57:49 InvalidChainFound: invalid block=a86337aef5479db41209 height=861080 trust=3275838333972824660711873320762369734724920483558585417587751942133710632388 blocktrust=26336445264 date=2017-04-03 17:57:04 2017-04-03 17:57:49 InvalidChainFound: current best=0ac476b344efb6a03330 height=861079 trust=3275838333972824660711873320762369734724920483558585417587751942107374187124 blocktrust=22399274979 date=2017-04-03 17:54:40 2017-04-03 17:57:49 ERROR: SetBestChain() : SetBestChainInner failed 2017-04-03 17:57:49 ERROR: AcceptBlock() : AddToBlockIndex failed 2017-04-03 17:57:49 ERROR: ProcessBlock() : AcceptBlock FAILED

Following this I had:

2017-04-03 18:04:36 {SBC} new best=b76b5bfb9ac6c5d2aff7c2b03b0eb7d880ae7b9a64c35b2bf300ab78df086d16 height=861083 ; {PB}: ACC;

on April 03, 2017 Superblock 861083 was created at 18:04.

Shortly after I started receiving BAD CPID errors from stardusts CPID at 18:30 (#861101) which was his first staked PoR block since his last one on March 26th. Also gridcoinstats had originally said his beacon was newly advertised. From there forward my wallet had issues with staying sync with each PoR starsdust has staked. On Occasion it had gone of course and a restart of my wallet seemed to correct it each time (reorganize/regression). Each time his client PoR i'd have bad CPID issues with his CPID again and client would fall out of line and I'd have to repeat the steps. then sometime after 02:30 to 03:30 the occurrences had stopped in my logs and I didn't think much of it from there on. After further view (today) of the PoR of starsdust he didn't stake a PoR after 02:25 (#861399)

Then in morning I got up to the panic of the fork as there was forkists. I woke up and came on between 8am (-7:00) my time which is 15:00 on April 4th, 2017. During the days restarts generally always got me back away from a fork. However I later had looked up starsdust's CPID on gridcoinstats again and each time we had more issues start again he had PoR payments in the time frames. I remember erkan had asked the following:

Apr 04 10:24:35 anyone did something with his full nodes about 1.5h ago ? Apr 04 10:24:46 something at around that time must have caused the fork to happen

Time convert is 17:24

I looked up starsdust CPID again and noticed he started getting more PoR at 16:11 (#861932), 16:30 (#861943), 18:30 (#862024). Coincidence? (it appeared a lot more issues with activity at this time and of course more BAD CPID in my logs)

At 18:13 Block #862012 there was the new superblock and i had gotten:

2017-04-04 18:13:47 Received block 867763db7df0dbf366e3571366160263e8ea4193951e8f1a2611c29555ffceb7; -ERROR: ConnectBlock[] : Superblock hash does not match consensus hash; SuperblockHash: aa3920a997aa31f9dd0175711b7f241b, Consensus Hash: 2017-04-04 18:13:48 InvalidChainFound: invalid block=867763db7df0dbf366e3 height=862012 trust=3275838333972824660711873320762369734724920483558585417587751960956132392308 blocktrust=24484002312 date=2017-04-04 18:13:36 2017-04-04 18:13:48 InvalidChainFound: current best=96894542a4e640e96632 height=862011 trust=3275838333972824660711873320762369734724920483558585417587751960931648389996 blocktrust=27946381969 date=2017-04-04 18:11:12 2017-04-04 18:13:48 ERROR: SetBestChain() : SetBestChainInner failed 2017-04-04 18:13:48 ERROR: AcceptBlock() : AddToBlockIndex failed 2017-04-04 18:13:48 ERROR: ProcessBlock() : AcceptBlock FAILED

Then a new superblock at 18:20 on #862017 I had gotten this:

2017-04-04 18:20:08 Received block 08a5f9b76d6e1279f6b2428ed9fc4d52678ee25a8a16bd185dc60029d297ddc2; - [DoTallyRA_START] [DoTallyRA_END] 2017-04-04 18:20:09 {SBC} new best=08a5f9b76d6e1279f6b2428ed9fc4d52678ee25a8a16bd185dc60029d297ddc2 height=862017 ; {PB}: ACC;

So in my eyes it looks good.

Then at 18:30 #862024 starsdust had got another PoR payment at the past superblock magnitude he had and I got a different response in log:

2017-04-04 18:30:12 {SBC} new best=27c626dae2b5262389b25f7efceb126181ac4dc2b60b80785973347c593300d9 height=862024 ; {PB}: ACC;

Then at 18:43 #862031 starsdust had gotten another PoR payment but at his new magnitude and my log got:

2017-04-04 18:43:33 {SBC} new best=30738b5e5f5b9bce189a478a9124c35dc93b87a8b50ed419e55121b60640af10 height=862031 ; {PB}: ACC;

Since then I've had no BAD CPIDs at all and the Network after that time started to increase at a steady pace. Also At this point I checked gridcoinstats.eu again and noticed starsdust beacon advertised time had changed completely to a lower time.


From all this lengthly following and going through the logs and yes it took me quite some time I've come up with somethings.

Hypothetically speaking as I don't know a lot about gridcoin yet but I'm trying to learn most of it as I go:

1) Can a superblock affect a CPIDs advertised beacon time? 2) I know we fixed the old beacon issue but is this a new one arising from the fix? 3) Is this a one time thing because of maybe the timing of the mandatory upgrade to .8?? (We all can hope) 4) Starsdust CPID went forward like it was correct it seems without issue at all which makes me believe his client detected it as right and maybe there isn't a verification or checkpoint for local clients to verify their advertised beacon in the superblock is correct before sending out to the network?. 5) Why did a restart get things back on track for most (did the handling of the CPID different from starting the wallet compared to receiving from the network? along with the reorganize and regressions?) 6) If a superblock can have this much affect on the chain how can we possibly put in place checkpoints or checks to make sure this error could not occur. 7) Do the superblocks need to be verified further an extra step?

I don't think this would of been as bad of a problem if starsdust didn't have a lot of GRC to stake with as well. (700K+) but he earned it and hopefully we can learn from this experience and build from it.

I wanted to be more detailed in my report but lets face it a I'm a single father of 2 and only 3 hours of sleep last night and figured I'd should post my findings before i try to call it a night.

Cheers everyone.

Thanks @denravonska for the private message brainstorm and listening in. cheer man.

edit: i added superblock minority hash pick ups because it happened twice and it helped with timeline as well.

philipswift commented 7 years ago

@denravonska I'm just catching up, blown away by your grit and tenacity. You do all this and you are a single parent of two! Very moving dude. The only reason Gridcoin will succeed is down to people like you. Don't blow a valve though!

denravonska commented 7 years ago

@philipswift Thanks! But to be fair, I'm married. It's just that my wife happened to get a new job at the exact same week as the two kids got the flu, so I've been home doing random, sporadic GRC work. If I didn't do it someone else would, so the coin would be fine :)

philipswift commented 7 years ago

Sounds Hellish even if you are a temporary single parent, four snotty nostrils!. I'm not sure about your last comment, GRC will always exist as long as crypto's are about but having guys with excellent tech skills, as well as interpersonal skills is key. Makes fine into finer. BTW I've been keeping an eye on Ethereum and their Ubuntu is coughing up errors. Serious Industry chatter and Twitter wars as well, citing the coin as BTC's Enterprise replacement. Getting back to family, what are 'uncles'? Any idea? ETH has them as a value.

iFoggz commented 7 years ago

well things seem to be running smooth and after the long report above im kinda of the fence about that whole episode. im thinking we may not run into this issue again. been smooth since his beacon advertised time had changed. to bad he isnt here to chime is what he may of done. wonder if he registered his beacon in .7 and then upgraded to .8 and that caused issues. still thinking long and hard about this one and how superblock solved it. you want this left open or should i seal her up.