For the investigation I extracted all transactions from the bdjuno database and picked plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s
to show my findings:
The address participated 303 times in PoP:
$ grep MsgInitPop transaction_db.csv | grep plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s | wc -l
303
19 times a PoP result was reported:
$ grep MsgReportPopResult transaction_db.csv | grep plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s | wc -l
19
19 times it was unsuccessful:
$ grep MsgReportPopResult transaction_db.csv | grep plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s | grep '""success"": false' | wc -l
19
(and 17 times it did not finish, i.e. ""finished"": false)
The distribution happens once a day and seems to work fine (the PoP height window advances by 3600 blocks per run, which at ~24 s per block is roughly one day):
$ sudo journalctl -u planetmint-go.service |grep "staged claims successfully for provided PoP heights"
Oct 22 02:39:53 ip-10-22-22-254 planetmint-god[947243]: 2:39AM INF [app] staged claims successfully for provided PoP heights: 3168075 3171675 module=server
Oct 23 03:01:39 ip-10-22-22-254 planetmint-god[947243]: 3:01AM INF [app] staged claims successfully for provided PoP heights: 3171675 3175275 module=server
Oct 24 03:23:25 ip-10-22-22-254 planetmint-god[947243]: 3:23AM INF [app] staged claims successfully for provided PoP heights: 3175275 3178875 module=server
Oct 25 03:45:16 ip-10-22-22-254 planetmint-god[947243]: 3:45AM INF [app] staged claims successfully for provided PoP heights: 3178875 3182475 module=server
Oct 26 04:07:17 ip-10-22-22-254 planetmint-god[947243]: 4:07AM INF [app] staged claims successfully for provided PoP heights: 3182475 3186075 module=server
Oct 27 04:29:06 ip-10-22-22-254 planetmint-god[947243]: 4:29AM INF [app] staged claims successfully for provided PoP heights: 3186075 3189675 module=server
Oct 28 04:50:51 ip-10-22-22-254 planetmint-god[947243]: 4:50AM INF [app] staged claims successfully for provided PoP heights: 3189675 3193275 module=server
Oct 29 05:13:44 ip-10-22-22-254 planetmint-god[425]: 5:13AM INF [app] staged claims successfully for provided PoP heights: 3193275 3196875 module=server
This is the code handling burning STAGEDCRDDLs and minting CRDDLs: https://github.com/planetmint/planetmint-go/blob/v0.11.0/x/dao/keeper/msg_server_distribution_result.go#L40-L127
It does not produce any errors; otherwise we would see:
util.GetAppLogger().Error(ctx, "%s for provided PoP heights: %d %d", types.ErrResolvingStagedClaims.Error(), distribution.FirstPop, distribution.LastPop)
in our logging.
I created a custom planetmint-god binary with more logging and deployed it to api.plmnt.r3c.network. I will wait for the distribution to run tonight and check the logs tomorrow...
Just a thought: it might be connected to the downtime between the 18th and 21st of October (no space left on the validators). My gut feeling says that it stopped working around that time...
maybe related (reported by logcheck):
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.932170599Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785) for address plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4"
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.932218246Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qqvwwg28rpygn9skfkrhd6vu9gh5qqfclut4uz894auz6lj9t2lnl04lm0gyw8yg0dnn6xnl47zmexpthp34h5s2zk828xt2ys - error while minting 1094946 tokens (tx id df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785) for address plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4"
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.98306032Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id 6dbdcd718ca940aba37d89ee4886eb597f0468e5b97f88c1cb114729278815a1) for address plmnt128pw4xqn8hnagt98m0vllrxrlc8cvqduvqh2ju"
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.983107383Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qqtqnldxpsvywc7wc4d0qv4230zv6wefhkzrs8w7usjz0f2wl29p2sdpes03k5wrr7wgtx3equefquhyltfwdkx2qwakl2xesh - error while minting 1094946 tokens (tx id 6dbdcd718ca940aba37d89ee4886eb597f0468e5b97f88c1cb114729278815a1) for address plmnt128pw4xqn8hnagt98m0vllrxrlc8cvqduvqh2ju"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.032525462Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id dd69c6bcbcbf0196d53bb012eb2f6e0c924752185bd806dd74289d9033c52c71) for address plmnt1e33kn4u8yfktrp6gd4eq4xq5gqfcdm5fkyp73w"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.032638725Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qqdefv0u4rt3j4s2taf3fk7yurhjkktsyn7czsragc8mu5x035hsjnkd495m0h7zhh8v4up5k0yhcaz6qnu6y2mt3tw8xaunf3 - error while minting 1094946 tokens (tx id dd69c6bcbcbf0196d53bb012eb2f6e0c924752185bd806dd74289d9033c52c71) for address plmnt1e33kn4u8yfktrp6gd4eq4xq5gqfcdm5fkyp73w"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.070370778Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id ceef466cb47df7b95e28c6dae9bcc604202f05954e82c445cd7bd8b335c90dab) for address plmnt1p498zlxe4yhz5wqllgvk22ucnezgje5f9zexwz"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.070413349Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qq2zyum2hfjrsj3uzggl33znr9ydgga5jjjuczv6jyz0zc96833nl9mjqpme7zwl75p2qcwkf607uwymj69yusactjvwhn87zk - error while minting 1094946 tokens (tx id ceef466cb47df7b95e28c6dae9bcc604202f05954e82c445cd7bd8b335c90dab) for address plmnt1p498zlxe4yhz5wqllgvk22ucnezgje5f9zexwz"
Here is the log I captured with the custom binary mentioned above: js_distribution.log
This is the corresponding diff:
diff --git a/x/dao/keeper/msg_server_distribution_result.go b/x/dao/keeper/msg_server_distribution_result.go
index e82e392..d7036c6 100644
--- a/x/dao/keeper/msg_server_distribution_result.go
+++ b/x/dao/keeper/msg_server_distribution_result.go
@@ -43,10 +43,12 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
if err != nil {
return err
}
+ util.GetAppLogger().Info(ctx, "JS: len(challenges): %d", len(challenges))
popParticipants := make(map[string]uint64)
for _, challenge := range challenges {
+ util.GetAppLogger().Info(ctx, "JS: challenge: %v", challenge)
// if challenge not finished nobody has claims
if !challenge.GetFinished() {
continue
@@ -67,6 +69,7 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
popParticipants[initiatorAddr.String()] += validatorPopReward
}
+ util.GetAppLogger().Info(ctx, "JS: len(popParticipants): %d", len(popParticipants))
// second data structure because map iteration order is not guaranteed in GO
keys := make([]string, 0)
for p := range popParticipants {
@@ -88,9 +91,12 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou
if err != nil {
return err
}
+ util.GetAppLogger().Info(ctx, "JS: accAddr: %v", accAddr)
accStagedClaim := k.bankKeeper.GetBalance(ctx, accAddr, k.GetParams(ctx).StagedDenom)
+ util.GetAppLogger().Info(ctx, "JS: accStagedClaim: %v", accStagedClaim)
+
if accStagedClaim.Amount.GTE(sdk.NewIntFromUint64(amount)) {
burnCoins, mintCoins := k.getConvertCoins(ctx, amount)
err = k.bankKeeper.SendCoinsFromAccountToModule(ctx, accAddr, types.ModuleName, burnCoins)
@@ -108,6 +114,7 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou
return err
}
}
+ util.GetAppLogger().Info(ctx, "JS: convertAccountClaim successful")
return
}
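As an aside, the "map iteration order is not guaranteed in GO" comment in the diff is why the keeper collects the participant keys into a slice and sorts them before iterating: in a consensus context every node has to process the participants in the same order. A minimal self-contained sketch of that pattern (names and reward values are illustrative, not the keeper's actual types):
package main

import (
	"fmt"
	"sort"
)

func main() {
	// Rewards per participant address; map iteration order is randomized in Go.
	popParticipants := map[string]uint64{
		"plmnt1aaa": 7941,
		"plmnt1bbb": 3970,
		"plmnt1ccc": 3970,
	}

	// Collect the keys and sort them so every node iterates deterministically.
	keys := make([]string, 0, len(popParticipants))
	for p := range popParticipants {
		keys = append(keys, p)
	}
	sort.Strings(keys)

	for _, p := range keys {
		fmt.Printf("convert claim for %s: %d\n", p, popParticipants[p])
	}
}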
Unfortunately I did not log the most crucial part:
if accStagedClaim.Amount.GTE(sdk.NewIntFromUint64(amount)) {
...
}
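To make the suspicion concrete: as far as the excerpts here show, there is no else branch and no error return for the case where the staged balance does not cover the claimed amount, so the conversion would simply be skipped without any trace in the logs. A toy model of that comparison (plain big.Int instead of the SDK's coin types; names, values and the 1:1 conversion are my assumptions):
package main

import (
	"fmt"
	"math/big"
)

// Toy model of the balance check in convertAccountClaim: the conversion only
// happens when the staged balance covers the claimed amount; otherwise nothing
// is burned or minted and no error is returned.
func convertClaim(stagedBalance *big.Int, amount uint64) string {
	if stagedBalance.Cmp(new(big.Int).SetUint64(amount)) >= 0 {
		// Assuming a 1:1 conversion for illustration.
		return fmt.Sprintf("burn %d STAGEDCRDDL and mint %d CRDDL", amount, amount)
	}
	// This is the silent path: no log, no error.
	return fmt.Sprintf("staged balance %s < claim %d: nothing converted", stagedBalance, amount)
}

func main() {
	fmt.Println(convertClaim(big.NewInt(2_000_000), 1_094_946))
	fmt.Println(convertClaim(big.NewInt(500_000), 1_094_946))
}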
I observe two things:
1. Maybe related (reported by logcheck, quoted above): "error while minting 1094946 tokens (tx id df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785) for address plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4" ...
2. Unfortunately we are not logging the actual error message returned by r2p.pmClient.MintPLMNT, which contains the details about what is wrong...
Side note: during my investigation I often stumbled over the fact that our error logging is not ideal. We log some error, but we do not log the actually interesting message. We might want to do some refactoring and come up with an error logging/handling coding guideline.
I like this article very much (it is in German, though): https://blog.alexandergunkel.eu/error-handling-in-golang/
The gist, translated: each error should be handled in exactly one place. Handling errors ad hoc wherever they occur is often easy to implement and makes no demands on the architecture of the application, but it has the disadvantage of greater complexity, which often means that error handling is duplicated or unintentionally omitted. At that point there is a risk that another log message is written at a higher level, leading to redundancy and log spam. It is therefore better to find a single place where the error is handled and not to pass it on from there; at the lower levels, the error should instead be passed on to the calling function, enriched with context:
if err := fnt(param); err != nil {
	return fmt.Errorf("failed doing XYZ with param %s: %w", param, err)
}
It should be noted that enriching the error with information does not count as error handling, of course, nor does releasing resources that were allocated within the function - for which it is better to use a defer.
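Applied to the minting path above, that would mean wrapping the error from the client call instead of dropping it, so the top-level log line contains the root cause. A hedged sketch (mintPLMNT and mintForEntry are stand-ins of my own; the real call is r2p.pmClient.MintPLMNT, whose signature and error values are not shown in this thread):
package main

import (
	"errors"
	"fmt"
	"log"
)

// errFromChain stands in for whatever the chain actually returns; we never saw it.
var errFromChain = errors.New("<details the chain returned would appear here>")

// mintPLMNT is a hypothetical stand-in for r2p.pmClient.MintPLMNT.
func mintPLMNT(beneficiary string, amount uint64, txID string) error {
	return errFromChain
}

// mintForEntry wraps the underlying error with context instead of swallowing it.
func mintForEntry(beneficiary string, amount uint64, txID string) error {
	if err := mintPLMNT(beneficiary, amount, txID); err != nil {
		return fmt.Errorf("error while minting %d tokens (tx id %s) for address %s: %w",
			amount, txID, beneficiary, err)
	}
	return nil
}

func main() {
	err := mintForEntry("plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4", 1094946,
		"df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785")
	if err != nil {
		// Handle (log) the error in exactly one place; %v prints the whole chain.
		log.Printf("failed to convert entry: %v", err)
		// errors.Is still works on the wrapped chain if callers need to branch on it.
		fmt.Println("known chain error:", errors.Is(err, errFromChain))
	}
}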
Is it correct that we challenge every 8 minutes (20 blocks × 24 s = 480 s = 8 min)?
...
Oct 30 10:59:18 ip-10-22-26-19 planetmint-god[426]: 10:59AM INF [app] MQTT message successfully sent: initiator:"1adb868b2b9623510dbe3d7419e4d2b5e712e08d" challenger:"plmnt1z3f38elc3ycm54037csr2yfznv5pm5tlx6rw2z" challengee:"plmnt1fxxw0xkmwhanm6yhhq0tqa0fe388g095cc53ym" height:3201280 module=server
Oct 30 11:07:24 ip-10-22-26-19 planetmint-god[426]: 11:07AM INF [app] MQTT message successfully sent: initiator:"1adb868b2b9623510dbe3d7419e4d2b5e712e08d" challenger:"plmnt14uv2pfvyy4jm7f9unfaha7kd3lnt07lv7gr5t9" challengee:"plmnt1jpgdg9mzh9f6cuku47ugqrgr3tnmc09u9zlwz3" height:3201300 module=server
PoPs being unsuccessful quite often is nothing new. Going through all transactions: 37976 of 48511 were unsuccessful, but more were finished than in the example above: 36638 of 48511.
@jmastr Currently 4 validators are orchestrating PoP. A PoP occurs every 2 minutes, so each validator processes a PoP every 4 × 2 min = 8 minutes.
That makes sense. Thanks!
Here is a better log: js_distribution.log
Generated with this diff:
diff --git a/x/dao/keeper/msg_server_distribution_result.go b/x/dao/keeper/msg_server_distribution_result.go
index e82e392..40ce5fe 100644
--- a/x/dao/keeper/msg_server_distribution_result.go
+++ b/x/dao/keeper/msg_server_distribution_result.go
@@ -43,10 +43,12 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
if err != nil {
return err
}
+ util.GetAppLogger().Info(ctx, "JS: len(challenges): %d", len(challenges))
popParticipants := make(map[string]uint64)
for _, challenge := range challenges {
+ util.GetAppLogger().Info(ctx, "JS: challenge: %v", challenge)
// if challenge not finished nobody has claims
if !challenge.GetFinished() {
continue
@@ -67,6 +69,7 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
popParticipants[initiatorAddr.String()] += validatorPopReward
}
+ util.GetAppLogger().Info(ctx, "JS: len(popParticipants): %d", len(popParticipants))
// second data structure because map iteration order is not guaranteed in GO
keys := make([]string, 0)
for p := range popParticipants {
@@ -84,6 +87,8 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
// convert per account
func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amount uint64) (err error) {
+ util.GetAppLogger().Info(ctx, "JS: participant: %v", participant)
+
accAddr, err := sdk.AccAddressFromBech32(participant)
if err != nil {
return err
@@ -91,8 +96,13 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou
accStagedClaim := k.bankKeeper.GetBalance(ctx, accAddr, k.GetParams(ctx).StagedDenom)
+ util.GetAppLogger().Info(ctx, "JS: accStagedClaim.Amount: %v, amount: %v", accStagedClaim.Amount, sdk.NewIntFromUint64(amount))
+
if accStagedClaim.Amount.GTE(sdk.NewIntFromUint64(amount)) {
burnCoins, mintCoins := k.getConvertCoins(ctx, amount)
+
+ util.GetAppLogger().Info(ctx, "JS: burnCoins: %v, mintCoins: %v", burnCoins, mintCoins)
+
err = k.bankKeeper.SendCoinsFromAccountToModule(ctx, accAddr, types.ModuleName, burnCoins)
if err != nil {
return err
@@ -108,6 +118,7 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou
return err
}
}
+ util.GetAppLogger().Info(ctx, "JS: convertAccountClaim successful")
return
}
So the conclusion is:
@jmastr thank you for the research. I think the PoPs are primarily failing when interacting with Tasmota devices. We could research that and make a decision to ban them from the network.
@LaurentMontBlanc @eckelj One thing I noticed that might be helpful: I connected one ShellyPlug S to my EA via MQTT and it runs Tasmota software (it was already flashed, I just took it as-is from the office). I successfully receive STAGEDCRDDLs and eventually CRDDLs with this setup. Out of interest I resolved one of my CIDs and got this:
{ "cid": "bafkreib5nxwilwve5ddrbdzj4xihjj5mpqcwnxnnuphmhqraf7eoxltrle", "data": { "Tasmota": "{\"Time\":\"2024-11-05T07:31:59\",\"ENERGY\":{\"TotalStartTime\":\"2024-06-25T09:54:01\",\"Total\":41.546,\"Yesterday\":0.637,\"Today\":0.088,\"Period\": 0,\"Power\":13,\"ApparentPower\":32,\"ReactivePower\":30,\"Factor\":0.39,\"Voltage\":309,\"Current\":0.105}}" } }
We talked over the last couple of days about Tasmotas maybe not being able to receive (STAGED)CRDDLs, and I thought this might be helpful information.
@jmastr thank you for looking that up and verifying it. In general, the Tasmota firmware works. I just suspect they might stop working after a longer period of operation. Why? Because the Tasmota firmware discards older CIDs.
That means that they will not be able to run the PoP on older CIDs successfully.
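A toy illustration of that effect (purely a hypothetical storage model; the actual firmware behavior is only known to the extent described above): if the device keeps only the most recent N CIDs, any PoP challenge that references an older CID can no longer be answered.
package main

import "fmt"

// Toy model: a device that keeps only the last `capacity` CIDs.
type cidStore struct {
	capacity int
	cids     []string
}

func (s *cidStore) add(cid string) {
	s.cids = append(s.cids, cid)
	if len(s.cids) > s.capacity {
		// Oldest entries are discarded once the store is full.
		s.cids = s.cids[len(s.cids)-s.capacity:]
	}
}

func (s *cidStore) has(cid string) bool {
	for _, c := range s.cids {
		if c == cid {
			return true
		}
	}
	return false
}

func main() {
	store := &cidStore{capacity: 3}
	for i := 1; i <= 5; i++ {
		store.add(fmt.Sprintf("cid-%d", i))
	}
	// A PoP challenge on a recent CID can be answered, one on an old CID cannot.
	fmt.Println("challenge on cid-5:", store.has("cid-5")) // true
	fmt.Println("challenge on cid-1:", store.has("cid-1")) // false -> PoP unsuccessful
}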
we have to distinguish between:
That makes total sense. Thanks for the detailed explanation!
Just stumbled upon a recent PoP result: https://explorer.rddl.io/planetmint/transactions/613176D63DCD9A86E399E7E87978773F4B054FDEB34B25F4B3BDFB8A996BAFA0
{
"@type": "/planetmintgo.dao.MsgReportPopResult",
"creator": "plmnt1ltvh6468d48zew9gu4njzzcjp557dxavevkhv5",
"challenge": {
"height": "3222285",
"success": true,
"finished": false,
"initiator": "2fb1c9bad331cf708c4f42906c761defeb12677c",
"challengee": "plmnt1z3f38elc3ycm54037csr2yfznv5pm5tlx6rw2z",
"challenger": "plmnt1ltvh6468d48zew9gu4njzzcjp557dxavevkhv5"
}
}
Successful, but unfinished. Does that make sense?
Not really, but we should add a sanity check if one is not already in place.
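A sketch of what such a sanity check could look like, purely illustrative: a plain validation function over the reported challenge fields (the real Challenge type lives in the planetmintgo.dao proto, and whether success without finished is ever legal mid-protocol is exactly the open question above):
package main

import (
	"errors"
	"fmt"
)

// Challenge mirrors the fields visible in the PoP result above.
type Challenge struct {
	Height     int64
	Success    bool
	Finished   bool
	Initiator  string
	Challengee string
	Challenger string
}

// validate rejects the combination observed in the linked transaction:
// a challenge reported as successful but not finished.
func validate(c Challenge) error {
	if c.Success && !c.Finished {
		return errors.New("inconsistent PoP result: success=true but finished=false")
	}
	return nil
}

func main() {
	c := Challenge{
		Height:     3222285,
		Success:    true,
		Finished:   false,
		Initiator:  "2fb1c9bad331cf708c4f42906c761defeb12677c",
		Challengee: "plmnt1z3f38elc3ycm54037csr2yfznv5pm5tlx6rw2z",
		Challenger: "plmnt1ltvh6468d48zew9gu4njzzcjp557dxavevkhv5",
	}
	fmt.Println(validate(c))
}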
The STAGEDCRDDL to CRDDL issue was fixed, and here is a follow-up for why PoPs are being unsuccessful: https://github.com/rddl-network/issues/issues/119