openchia / web

Website for OpenChia.io
GNU Affero General Public License v3.0

INVALID_TOO_LATE errors #299

Open Jacek-ghub opened 2 years ago

Jacek-ghub commented 2 years ago

I am getting these errors once every other day or so (not that many). Recently, I made a few changes on my farm, tweaked my ChiaDog reports a bit, and today got lucky with one such error. Here is the data:

2022-06-03T07:20:49.203 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming e0e9d42b09... Found 1 proofs. Time: 0.67186 s. Total 1499 plots
2022-06-03T07:20:49.597 farmer chia.farmer.farmer         : INFO     Submitting partial for 123... to https://pool.openchia.io

2022-06-03T07:21:33.417 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming e0e9d42b09... Found 1 proofs. Time: 0.07810 s. Total 1499 plots
2022-06-03T07:21:33.798 farmer chia.farmer.farmer         : INFO     Submitting partial for 123... to https://pool.openchia.io

2022-06-03T07:21:34.095 farmer chia.farmer.farmer         : INFO     Pool response: {'error_code': 2, 'error_message': 'Received partial in 44.940791845321655. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue'}
2022-06-03T07:21:34.095 farmer chia.farmer.farmer         : ERROR    Error in pooling: (2, 'Received partial in 44.940791845321655. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue')

The above shows two partials found, where it looks like the first one is the offending one. In both cases, lookup times were well below 1 sec, and the submission followed within 300-400 msec.

Trying to nail down the issue is a bit tricky due to networking lags, but I would still say that 45 seconds is rather too long for a round trip (pinging pool.openchia.io gives me around 80 msec averages, assuming your server is East Coast located; I am on the West Coast). Although, we all know that if those round-trip results are not measured at the time of the problem, they are rather worthless as far as telling what might have happened at other times.

Here is where I got really lucky. I checked the top three farms in your pool, and all three have the same error at roughly the same time (15 sec spread). (Sure, it is possible that those three farms are also in California, but grasping at straws, this is a pattern that points to a potential problem on your side.)

Could you check the logs on your side to see whether there is something that could potentially be addressed? Maybe you could run some reports against those late responses (for the whole pool), and that could shed some light?

william-gr commented 2 years ago

The way this works is that the clock starts at the Signage Point origin and stops when the pool gets the partial. It's very unlikely that any of the too-late errors are related to the lookup times or the networking to the pool. They obviously could be, with very slow lookup times, but that is not the case here.
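To make that concrete, here is a minimal sketch of the check the error message describes (not the actual OpenChia server code; the cache and the names are assumptions): the clock starts when the pool's node first sees the signage point and stops when the partial arrives.

```python
import time
from typing import Dict, Optional

# Hypothetical sketch only -- not OpenChia's actual server code.
PARTIAL_TIMEOUT_SECONDS = 25.0  # the 25 s limit quoted in the error message

# Assumed cache: signage point hash -> timestamp at which the pool's own
# node first saw that signage point.
sp_first_seen: Dict[bytes, float] = {}

def partial_delay(sp_hash: bytes, received_at: float) -> Optional[float]:
    """Seconds between the Signage Point origin and the partial's arrival."""
    origin = sp_first_seen.get(sp_hash)
    if origin is None:
        return None  # the pool never saw this signage point
    return received_at - origin

def is_too_late(sp_hash: bytes) -> bool:
    """True if a partial arriving now would be rejected as INVALID_TOO_LATE."""
    delay = partial_delay(sp_hash, time.time())
    return delay is not None and delay > PARTIAL_TIMEOUT_SECONDS
```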

Previous investigations seem to show that it's the delay of the signage point propagating through the blockchain/nodes to your local node. It seems to happen when your local node has localized peers (not spread across the globe). It's interesting that others had the same issue around the same time, but I still don't have any evidence pointing to a pool issue.

Some pools increase the timeout above 25 seconds, but that also increases the chance that bad farmers (long lookup times) are not flagged with invalid partials.

Jacek-ghub commented 2 years ago

That looks like a great catch with those signage points! Although maybe not exactly as you described it. Here is the relevant signage point data:

2022-06-03T07:19:03.640 full_node chia.full_node.full_node: INFO       Finished signage point 48/64: CC: af629891be2f5081104841fc612454200bf27ba222c9a2f827e88530ebaa674a RC: 52a621a15111c3283a315edbf04fc3ea7a4790c742d8ec6f80bc7de4e54cba90
2022-06-03T07:19:16.728 full_node chia.full_node.full_node: INFO       Finished signage point 49/64: CC: f6cc022914228872d830e8d73a3abcd7f980d95d44fb3fb8d918f8b3cfb5bada RC: 5e10c42a88028897b47386eff682cc967775b116b158aa7ca0c850fe682df087
2022-06-03T07:19:26.005 full_node chia.full_node.full_node: INFO       Finished signage point 50/64: CC: 535ad9fd9c7deddd2617687d97136df1bb85551efa6af99976153432cb557446 RC: 4e2dc06129a59a9e86a209393e635b8117ea3ced412dd160ac14565da2e41e90
2022-06-03T07:19:35.983 full_node chia.full_node.full_node: INFO       Finished signage point 51/64: CC: 56563f5cba4a16356521a91ab83eb1c15c3f9444fdc92844caa8d8adfdee482c RC: 18c0cb5244b10c32bbc7628c7bb3dc0f7fdd117a4fd9f0617cc3233eb3d69d19
2022-06-03T07:19:44.921 full_node chia.full_node.full_node: INFO       Finished signage point 52/64: CC: 90e0d4b3dba2d7202a00f2ffe6c556343ac85369396e20aa04fc530f350dc3ff RC: b474f73bac29ad24356e522f7a2eae910ea0e32cefd4b8f6979eac7e8453a5c5
2022-06-03T07:19:53.816 full_node chia.full_node.full_node: INFO       Finished signage point 53/64: CC: 9cc6983cd3694d9c4151c1bc9a3170c9f9930e3d8e4fbe4199cad0775518128e RC: 74e577716756b2543d138406e0ebe3a886a9e0c33b177e4381cdf040cb10b8d1
2022-06-03T07:20:04.005 full_node chia.full_node.full_node: INFO       Finished signage point 54/64: CC: 4e0d4964e16643d8bea121fc4db18671824187efd09f2cc7954890e87abb7b48 RC: 5ee7be32928594c3eaa967d791c2625f8ad00eeae50e4d9d22d223c72b396ce7
2022-06-03T07:20:12.794 full_node chia.full_node.full_node: INFO       Finished signage point 55/64: CC: 00d9bc6ea54bd9d075d1ac588004a9d8fc0ed88c884f61d54a3898a97f03c624 RC: 9bb638915bb990dd1029f11d1a7c55bdf3ea26dabef3da23fb8a9afd3abc49e0
2022-06-03T07:20:21.818 full_node chia.full_node.full_node: INFO       Finished signage point 56/64: CC: 0121ca0556003da57fcd20938628af1ad3686652f3890180125c0964f2f3ed52 RC: bf50ba45e20231413e2ec87a112522c1ba2319cf129fef414ccbbb87a70ee4f6
2022-06-03T07:20:30.843 full_node chia.full_node.full_node: INFO       Finished signage point 57/64: CC: 743cdee553ab03ba4869da2851dfd0c1a8314523f78a86a8ada128a135521274 RC: 7335292e4aed7be7f2586a7b026c5c5a95b41b2057177cc43f847dcfedbbf745
2022-06-03T07:20:39.971 full_node chia.full_node.full_node: INFO       Finished signage point 58/64: CC: 5381e312ef553e0fab738381763c6b4fb20162ff0db512d3e9fdc6255ef8ff10 RC: d4b6fbe4d8e124c907155c8b39ec7b1997b20aad72dfb1bb6bf28690695262da
2022-06-03T07:20:48.879 full_node chia.full_node.full_node: INFO       Finished signage point 59/64: CC: ddff6bdf84d0cd3e005e92847f7f56b627555ddb7852d70d2cb46942a02094ac RC: ff03191635394491b51b690a2d420bc0cdb300281c5e6576d090790cf21cbffa
2022-06-03T07:20:58.167 full_node chia.full_node.full_node: INFO       Finished signage point 60/64: CC: af1b765a5847a7c5b417564d41fc273b5dc03bfb5cbee1f8dd9093c7dd75b664 RC: 1b78e78784bd56c3c1f02b338ed8b8362993dc162560969c58b2a98166d41fa2

2022-06-03T07:21:07.435 full_node chia.full_node.full_node: INFO       Finished signage point 56/64: CC: 0121ca0556003da57fcd20938628af1ad3686652f3890180125c0964f2f3ed52 RC: bc979b40e3b10d540bbae4a167d6f8bdeac07ca45711a668b44f73790ee8f32f
2022-06-03T07:21:16.062 full_node chia.full_node.full_node: INFO       Finished signage point 57/64: CC: 743cdee553ab03ba4869da2851dfd0c1a8314523f78a86a8ada128a135521274 RC: ef819120e3d33b2c1667f2134ba9fad9a478f76280c2364a13af8cf4306155f5
2022-06-03T07:21:25.094 full_node chia.full_node.full_node: INFO       Finished signage point 58/64: CC: 5381e312ef553e0fab738381763c6b4fb20162ff0db512d3e9fdc6255ef8ff10 RC: 1636874334bb66d724d7a4a18ed08a28b5d7cb5ddf2778c45e825e44f6bb0763
2022-06-03T07:21:33.689 full_node chia.full_node.full_node: INFO       Finished signage point 59/64: CC: ddff6bdf84d0cd3e005e92847f7f56b627555ddb7852d70d2cb46942a02094ac RC: ce809e2efbe2a604585e0462c7ac0aaff857aae73193b6339706035300de174b
2022-06-03T07:21:42.015 full_node chia.full_node.full_node: INFO       Finished signage point 60/64: CC: af1b765a5847a7c5b417564d41fc273b5dc03bfb5cbee1f8dd9093c7dd75b664 RC: d65a573cc7eb3da2724dbe20847f843087c1c3721e5dc563845ef443c11504e8
2022-06-03T07:21:50.254 full_node chia.full_node.full_node: INFO       Finished signage point 61/64: CC: d24c51eeef3283c8e30ddd3746fa02b553ed324c008d8e5d92a1d238121451b1 RC: 541281a35e047e644623d68b76d469362f5f32090c4294eb7cffdbe690e80ef8
2022-06-03T07:21:58.851 full_node chia.full_node.full_node: INFO       Finished signage point 62/64: CC: 13e888966dc0a661d5f4693456d0f3f0ac5c3fc997b0a237397d54eaccae1687 RC: 320ea1450eff8f44c2da88cfaea5984c1ddbaa39178f43d8ccc4f21df6b66496
2022-06-03T07:22:07.079 full_node chia.full_node.full_node: INFO       Finished signage point 63/64: CC: 6dbe89676f6d839a7a3cb5c6ad008aebdbf44ca73632e49717df4d4b96e0fff4 RC: 25118908690c96f7e1c51435ad0e5ded6acc07152dbabdae4b3fe4992768d978

2022-06-03T07:22:29.824 full_node chia.full_node.full_node: INFO       Finished signage point 1/64: CC: 92746b96e34ed8ad47346ef072458a71c3fcaa2b14bfe4ce45ac5b8cfbda6765 RC: 896e3fe8b49552a71ce4972631fdbb3be4ee468763a6fbe209741ee8f8873739
2022-06-03T07:22:38.294 full_node chia.full_node.full_node: INFO       Finished signage point 2/64: CC: dc75d1aab59bb55409ec41645e6bead62b7859c3c1b5f26d1361a51751fe9223 RC: 8003862063be657a96a0a07f1aeb401a6ce122ac31fcb6c529d56e86c7b636d7
2022-06-03T07:22:48.135 full_node chia.full_node.full_node: INFO       Finished signage point 3/64: CC: 97f448c2d4501b6fa879bff284c3f44036844207cf718fda752e4d668a919212 RC: 56c39c9d7f2816c0730affe8634066f1caf74d4528230f3587172729d8c7ec21
2022-06-03T07:23:01.870 full_node chia.full_node.full_node_store: INFO     Don't have rc hash 159d680e15b75a18b3d8b45a92e282a6400df63dfd4466c412b8b34ec202b8e6. caching signage point 4.
2022-06-03T07:23:04.245 full_node chia.full_node.full_node: INFO       Finished signage point 4/64: CC: 080c8beb8775332733745dbc5b8907a32887174da81a416bf2dc46291cf8f08f RC: 51a85b685a4e1da2400568a9850022422aba61e931515058c130900bb23a2b8d

2022-06-03T07:23:05.012 full_node chia.full_node.full_node: INFO       Finished signage point 1/64: CC: 92746b96e34ed8ad47346ef072458a71c3fcaa2b14bfe4ce45ac5b8cfbda6765 RC: 896e3fe8b49552a71ce4972631fdbb3be4ee468763a6fbe209741ee8f8873739
2022-06-03T07:23:05.012 full_node chia.full_node.full_node: INFO       Finished signage point 2/64: CC: dc75d1aab59bb55409ec41645e6bead62b7859c3c1b5f26d1361a51751fe9223 RC: 8003862063be657a96a0a07f1aeb401a6ce122ac31fcb6c529d56e86c7b636d7
2022-06-03T07:23:05.012 full_node chia.full_node.full_node: INFO       Finished signage point 3/64: CC: 97f448c2d4501b6fa879bff284c3f44036844207cf718fda752e4d668a919212 RC: 56c39c9d7f2816c0730affe8634066f1caf74d4528230f3587172729d8c7ec21

2022-06-03T07:23:08.464 full_node chia.full_node.full_node: INFO       Finished signage point 5/64: CC: 83a39d97e9a8fe64dd81414893a4f85b242c171d2acfec8e04fee037da06a5f3 RC: 6d2636594cd2c24b2a89070663ac71006d13d8381aa859bb4e6bbcd388f8db8e
2022-06-03T07:23:19.480 full_node chia.full_node.full_node: INFO       Finished signage point 6/64: CC: 953dd86de883ea8187f430a97969d809bab1eb6b1b3db51d78065b733142ebe5 RC: 37302e620196f19edacaf4391adef9df3e07e7ec894019d5e1c492a51ae3f8a7
2022-06-03T07:23:27.339 full_node chia.full_node.full_node: INFO       Finished signage point 7/64: CC: 1aba7a997ff2aa2de60451111bdffce63fe66feaa013c4d2fc8c76d8eef980c1 RC: 0ae2de0ca9708af6c751f061f7da6f9519aaf5573b7d7b558b13c379c3a2f792
2022-06-03T07:23:36.031 full_node chia.full_node.full_node: INFO       Finished signage point 8/64: CC: f972fc73d5f217aadc3b8c928b113d50d5026f5f917c9cc0eaeda8b1d3d09b42 RC: df62d77da0b3ce6742ddd51733b97a17f0ff3f5b3c8e1fa0dc3b0892b6e29ffb
2022-06-03T07:23:47.727 full_node chia.full_node.full_node: INFO       Finished signage point 9/64: CC: 03e9c5ab6ff048ffbca25b716ef341efcac1b328f13f4d4fc48ea0928810c478 RC: e89dee2c6d7eb71c07e821bb31f46d7cb94931963b49080312c147abf91e5626
2022-06-03T07:23:56.655 full_node chia.full_node.full_node: INFO       Finished signage point 10/64: CC: a20b9ea4b1c93e17f1253c55f1f30528542b38b9063239ca9eb955f2ee89a26b RC: 7299a54f01bb65ae1badfa957f424fa23d30ef5abe98453c5344107d9aa0aa83

The interesting part is that all those signage points across those three minutes are coming roughly 10 secs apart (which is as expected). However, I don't understand why there is such a repeat there (where I put those break lines).

So, there were basically no Chia network propagation delays, no double signage points, just some really odd behavior.

I have never looked at those signage points before, so I am not sure how often this happens, or whether it looks like some kind of bug.

william-gr commented 2 years ago

Interesting... the CC is the same, but the RC is different in some cases. I don't really know what that means.

Makes it sound like some sort of reorg, or peers disagreeing / stepping on each other...

Jacek-ghub commented 2 years ago

It looks kind of like a reorg, as from when the first repeat started (at 56) until the last one (63), all the timing is correct. My take is that if peers were disagreeing, the timing would be a mess.

Although, it took over 20 secs for the new batch (starting at 1) to begin. And when the second sequence (again starting at 1) showed up, it is a bit of a mess: the first three have the same timestamp, and #4 is missing. Then the timing goes back to normal.

By the way, who is generating those signage points (timelords)? It looks more like the problem is in the generation of those signage points, rather than in network / node propagation.

By the way, maybe you could update the error tip output ("Partial error details") to also mention that those errors may have nothing to do with the farmer, but rather with "messed up" signage points.

Jacek-ghub commented 2 years ago

I put the harvester logs together with the signage point logs, and in both cases it looks like the same signage point (59) was processed. Here is the output:

2022-06-03T07:20:48.879 full_node chia.full_node.full_node: INFO       Finished signage point 59/64: CC: ddff6bdf84d0cd3e005e92847f7f56b627555ddb7852d70d2cb46942a02094ac RC: ff03191635394491b51b690a2d420bc0cdb300281c5e6576d090790cf21cbffa

2022-06-03T07:20:49.203 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming e0e9d42b09... Found 1 proofs. Time: 0.67186 s. Total 1499 plots
2022-06-03T07:20:49.597 farmer chia.farmer.farmer         : INFO     Submitting partial for 123... to https://pool.openchia.io

2022-06-03T07:21:33.689 full_node chia.full_node.full_node: INFO       Finished signage point 59/64: CC: ddff6bdf84d0cd3e005e92847f7f56b627555ddb7852d70d2cb46942a02094ac RC: ce809e2efbe2a604585e0462c7ac0aaff857aae73193b6339706035300de174b

2022-06-03T07:21:33.417 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming e0e9d42b09... Found 1 proofs. Time: 0.07810 s. Total 1499 plots
2022-06-03T07:21:33.798 farmer chia.farmer.farmer         : INFO     Submitting partial for 123... to https://pool.openchia.io

2022-06-03T07:21:34.095 farmer chia.farmer.farmer         : INFO     Pool response: {'error_code': 2, 'error_message': 'Received partial in 44.940791845321655. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue'}
2022-06-03T07:21:34.095 farmer chia.farmer.farmer         : ERROR    Error in pooling: (2, 'Received partial in 44.940791845321655. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue')

Knowing that the second submission was against a duplicate signage point, all the timing makes sense now. Counting from the first SP 59 at 07:20:48.879, the second submission at 07:21:33.798 comes about 44.9 s later, which matches the 44.94 s the pool measured. The pool responded with an error to the second submission, and it took only about 300 msec to get that response. So, we know that there is nothing wrong with the timings.

That said, do you know why the pool took the base time for the second partial from the first SP 59 issued, and not from the second one (the duplicate)? Maybe there is something in the submitted info that could identify the exact signage point, so the pool could squash such reports? (I really don't know how that part of the protocol works.)

"The way this works is that the time counts starting from the Signage Point origin and stops when the pool gets a partial."

How is "the Signage Point origin" defined? Is it based on the start of the SP batch (i.e., SP 0 plus 10 s times the current index), or on the current SP?

william-gr commented 2 years ago

Honestly, I don't know. I will need to research that.

I noticed the CC on that is the same, but the RC is different. I don't know yet what the RC means... We do cache the signage points in the pool the first time they are seen, so I do wonder why it's not being flagged as DOUBLE_SIGNAGE. It makes me think they look the same, but in reality they are not (different RC).
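A rough illustration of how the cache keying could matter here (an assumption about the server, not its actual code): if signage points are cached by their challenge-chain hash only, the repeated SP 59 resolves to the entry first seen at 07:20:48, so a partial for the repeat is timed against the original occurrence and lands at ~45 s instead of being treated as a separate (double) signage point.

```python
from typing import Dict

# Hypothetical cache keyed by the challenge-chain SP hash only (an assumption).
first_seen: Dict[bytes, float] = {}

def record_signage_point(cc_hash: bytes, seen_at: float) -> None:
    # setdefault keeps the *first* timestamp; a repeat with the same CC hash
    # (even with a different RC) does not create a new entry, so a partial for
    # the repeat is measured against the original time.
    first_seen.setdefault(cc_hash, seen_at)

# With the timestamps from the logs above (seconds after 07:20:48.879):
record_signage_point(b"ddff6bdf...", 0.0)    # SP 59/64 first seen
record_signage_point(b"ddff6bdf...", 44.81)  # repeat at 07:21:33.689 -- ignored
# A partial submitted at 07:21:33.798 is then measured as ~44.9 s after the
# cached origin, hence INVALID_TOO_LATE rather than DOUBLE_SIGNAGE_POINT.
```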

Jacek-ghub commented 2 years ago

Chia's GitHub?

I guess we see two problems here that maybe they can explain. The first is the logic behind those duplicate SPs. The second is how the pool should react to them.

If you want, I can annotate those logs better and post them on their GH, and maybe you could chime in to narrow down what we expect them to explain. Or you can take my logs and post them there. I have no preference.

UPDATE: I searched for a DOUBLE_SIGN... error, and what I found also has the same CC but a different RC, so it basically looks more or less the same as this one.

So, I guess I am at my limits with this problem right now and will let you work on it. If you need some additional info, let me know.

william-gr commented 2 years ago

Thanks, very helpful so far. I will see what I can find, but it may take a while since I am on vacation.

Jacek-ghub commented 2 years ago

Thinking about it more and trying to put it all together, I think we really have two cases here:

  1. Chain corrects itself
  2. Burps by some peers

The first one (chain self-correction) is when at some point we have SPs coming on schedule but with indexes that were already processed before. Those have a different RC (in the logs above, the CC stayed the same) and are flagged by the pool server as INVALID_TOO_LATE. However, maybe those SPs/proofs still need to be processed, as we don't know what caused the chain reorganization, and whether the chain considers the first (not likely) or rather the second (there has to be a reason to push the second) as valid; plus, those come on schedule. So, I would assume that those should be processed as a normal SP (pushed to the network, and not locally marked as TOO_LATE). Although, the big question is what happens if the chain already recognized the previously submitted proof(s) as the winning ones, and this new batch also has winning proofs. Still, I would say it is better to submit those extra proofs and let the chain figure out what to do, rather than miss an opportunity.

On the other hand, if those are peer burps (some spurious extra SP), they will not preserve the SP timings. Therefore, depending on whether the server stores who submitted partials previously, those should not be marked as DOUBLE_SIGNAGE_POINTS. Sure, this is an error condition on the farmer side, as the farmer should not be processing those burps. However, since they only indicate that the farmer processed the same thing once more, there is no reason for the pool server to count it as a farmer error; it could rather quietly sweep them under the rug. Although, this is a bit tricky. The reason I am saying that those should not be marked as doubles is that those are responses to challenges, and if the farm has duplicated plots, it should respond with the dups in a single submission. On the other hand, if the farm replies with two found proofs in the same submission, that is a clear indication that there are duplicated plots.

Again, I don't know much about how those are processed by the code, so all that is only based on what the logs show.
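To make the distinction concrete, here is a rough heuristic sketch (the function name and thresholds are mine, purely illustrative): the "chain corrects itself" case keeps the normal ~10 s cadence among the repeated signage points, as SPs 56-63 do in the logs above, while a one-off peer burp would not.

```python
from typing import List

def repeats_keep_cadence(timestamps: List[float],
                         interval: float = 10.0,
                         tolerance: float = 2.0) -> bool:
    """True if a run of repeated signage points keeps roughly the normal ~10 s
    spacing among themselves (the 'chain corrects itself' pattern); False for
    an out-of-cadence one-off (the 'peer burp' pattern)."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return all(abs(gap - interval) <= tolerance for gap in gaps)

# Repeated SPs 56..59 from the logs above, in seconds since 07:21:07.435:
print(repeats_keep_cadence([0.0, 8.6, 17.7, 26.3]))  # True -> reorg-like repeat
```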

william-gr commented 2 years ago

There is no such thing as two found proofs per submission. It's one request per proof. There is no way to tell from the pool what is going on, as far as I can see; if anything, these can/should be handled on the farmer client side.

Jacek-ghub commented 2 years ago

There is no such thing as two found proofs per submission. It's one request per proof.

Sorry, maybe that should be partials? What is triggering DOUBLE_SIGNAGE_POINT errors, then?

william-gr commented 2 years ago

Two different requests/partials for the same Signage Point?

Jacek-ghub commented 2 years ago

Assuming the harvester found two proofs while processing a single challenge, e.g.: harvester chia.harvester.harvester: INFO 6 plots were eligible for farming ab2fb7a936... Found 2 proofs. Assuming those are not coming from duplicates, will it report them to the pool in one message or two?

And assuming a duplicate plot was hit, will it report both (identical) proofs, or will it internally squash the duplicate?

william-gr commented 2 years ago

I think the pool will get two distinct partials.

Jacek-ghub commented 2 years ago

I guess this is the whole point of this discussion. Seeing the logs on the farmer, we have a better understanding of the patterns in each case. Can the logs on the server side also be scrutinized to eventually squash those errors? Or, put differently, is there enough info in the submitted data to classify those errors better?

william-gr commented 2 years ago

The request the pool receives only contains the partial proof and the signage point hash. https://github.com/Chia-Network/chia-blockchain/blob/main/chia/protocols/pool_protocol.py#L68

I don't see what else we could gather to squash anything, but I will keep thinking.
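For reference, the payload in that file looks roughly like the sketch below (a paraphrase of the linked pool_protocol.py, not a verbatim copy; the real class uses chia's Streamable types, and field names may drift between versions):

```python
from dataclasses import dataclass

# Simplified paraphrase of PostPartialPayload for illustration only; the real
# class uses bytes32, uint64 and ProofOfSpace rather than plain Python types.
@dataclass(frozen=True)
class PostPartialPayloadSketch:
    launcher_id: bytes         # bytes32 identifying the farmer (plot NFT launcher)
    authentication_token: int  # uint64
    proof_of_space: bytes      # actually a ProofOfSpace structure
    sp_hash: bytes             # bytes32: hash of the signage point the proof answers
    end_of_sub_slot: bool
    harvester_id: bytes        # bytes32
```

If sp_hash is the challenge-chain hash (which is identical for both occurrences of SP 59 above), then the payload alone would indeed not distinguish the original signage point from its repeat.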

Jacek-ghub commented 2 years ago

Yeah, that class is really well documented there. I would rather go to farmer.py or farmer_api.py to see how those members are filled.

I think more info is available on your pool side. Not just what you get from all the farmers and the blockchain, but also what you store in the db (without killing performance at the same time, though). It would be nice if you could provide a JSON string that comes from a submission POST (even better if it were already preprocessed by the server, so the structure would be more visible).

That said, I would consider the following cases for classifying proofs:

  1. Proof A - first proof(s) for a given SP idx, hopefully stored, at least for a few minutes or so
  2. Proof B - first proof for its SP idx, serving as the base for proof C
  3. Proof C - coming from a duplicated plot, a duplicate of proof B (PB)
  4. Proof D - first proof for its SP idx, serving as the base for proof E
  5. Proof E - burp from the network, the same SP idx as PD and a duplicate of PD; let's say the error should be suppressed if it can be (to not produce a duplicate)
  6. Proof F - first proof for its SP idx, serving as the base for proof G
  7. Proof G - chain reorg, the same SP idx as PF
  8. Proof H - a late submission

Of course, proofs A, B, D, F should normally be processed, as those are basically the first proofs for a given SP idx and just serve as the base for the next proof in the list.

Proof C, as it comes from a duplicated plot, should arrive right on the heels of PB (we can ignore farmer processing time, but expect some network jitter). Maybe if such a proof comes within 1-2 sec or so of the previous one (maybe just 100-200 ms would do), we could assume that this is a real plot duplicate. This is clearly an error on the harvester / farmer side that is not handled right by the Chia team. If there are duplicates on one harvester, then the harvester should be squashing them and flagging them as such. On the other hand, if different harvesters have duplicate plots, it would be the farmer's job to squash those, as the proofs should be identical.

Proof E is basically identical to proof C, except that the timing difference could be slightly bigger, so maybe if that difference is bigger than 2-3 secs, we could assume those are network burps. It looks like the farmer is sending the SP idx, although I am not sure whether those proofs would be identical (pool logs could help here).

Proof G. This one is kind of a wild card. Although, in the logs that I got for this case, those SPs were coming on schedule (exactly in their 10 sec intervals), plus the RC was different (the CC was the same). So, the first thing is that we would have the same SP idx, but everything else would (most likely) be different. Currently, those are marked as TOO_LATE. However, the pool should already store PF, so this cannot be too late; at worst it is clearly a duplicate for that PF SP idx. Also, depending on how deep the reorg is, those should be coming with 10/20/... sec latency compared to the original one. As mentioned, if such proofs could be identified, my understanding is that they should still be submitted, as we have no clue what the reason for the reorg was and, based on that, which proof will be validated. I guess that explanation is a bit simplistic. It may also be the case that there were no prior submissions for SPs with indexes equal to or higher than this one, in which case it will classify as a late one (if no chain reference is taken with respect to the reorg). Although, if there were any submissions for a higher index (not equal), maybe that indicates that the drive that holds a plot went to sleep, and this SP was processed late.

Proof H is clearly late if it is the very first one for that SP idx (this is the main difference between this and PG) and is coming late (based on the current server logic).

Agreed, all of the above is mostly based on timings and the previous proof, and possibly not that stable. Although, maybe if even some of those can be identified, that would be a good enough improvement? I have never seen what a submitted proof looks like (i.e., that JSON string), nor do I know how much data you store on your side, so I cannot comment on that part.
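For what it is worth, here is a very rough sketch of how that classification could look. Everything in it is hypothetical: the thresholds are just the ballpark numbers above, and I have no idea what data the server actually keeps per signage point.

```python
from typing import Optional

# Purely hypothetical sketch; thresholds are the ballpark numbers from above.
DUP_PLOT_WINDOW = 2.0    # case C: duplicate plot lands right behind its base proof
BURP_WINDOW = 3.0        # case E: network burp, slightly larger spread
PARTIAL_TIMEOUT = 25.0   # case H: the pool's current lateness limit

def classify_partial(sp_first_seen: float,
                     received_at: float,
                     prior_partial_at: Optional[float],
                     sp_reissued_on_schedule: bool) -> str:
    """Classify one partial for a given SP idx per the cases above."""
    delay = received_at - sp_first_seen
    if prior_partial_at is None:
        # First partial seen for this SP idx: a base case (A/B/D/F) or plain late (H).
        return "H: late" if delay > PARTIAL_TIMEOUT else "A/B/D/F: base proof"
    gap = received_at - prior_partial_at
    if gap <= DUP_PLOT_WINDOW:
        return "C: duplicate plot"
    if gap <= BURP_WINDOW:
        return "E: peer burp"
    if sp_reissued_on_schedule:
        return "G: chain reorg repeat"
    return "H: late"
```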

No need to comment. I think that I have all that off my mind now.