planetmint / planetmint-go

Go implementation of Planetmint
2 stars 0 forks source link

Staged RDDL's are not converted to C RDDL's #459

Closed GR3G0RYO closed 1 day ago

GR3G0RYO commented 4 weeks ago

Tools or monitoring here needed

Tasks to be done

jmastr commented 3 weeks ago

not getting STAGEDCRDDL

For investigation I extracted all transactions from the bdjuno database and I take plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s to show my findings:

The address participated 303 times in PoP:

$ grep MsgInitPop transaction_db.csv | grep plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s | wc -l
303

19 times a PoP result was reported:

$ grep MsgReportPopResult transaction_db.csv | grep plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s | wc -l
19

19 times it was unsuccessful:

$ grep MsgReportPopResult transaction_db.csv | grep plmnt1wca0fwyu5nl2h4sf7svwp647est4kmf6h7tq8s | grep '""success"": false' | wc -l
19

(and 17 times it did not finish ""finished"": false)

jmastr commented 3 weeks ago

sending around STAGEDCRDDL

happens once a day. seems to work fine:

$ sudo journalctl -u planetmint-go.service |grep "staged claims successfully for provided PoP heights"
Oct 22 02:39:53 ip-10-22-22-254 planetmint-god[947243]: 2:39AM INF [app] staged claims successfully for provided PoP heights: 3168075 3171675 module=server
Oct 23 03:01:39 ip-10-22-22-254 planetmint-god[947243]: 3:01AM INF [app] staged claims successfully for provided PoP heights: 3171675 3175275 module=server
Oct 24 03:23:25 ip-10-22-22-254 planetmint-god[947243]: 3:23AM INF [app] staged claims successfully for provided PoP heights: 3175275 3178875 module=server
Oct 25 03:45:16 ip-10-22-22-254 planetmint-god[947243]: 3:45AM INF [app] staged claims successfully for provided PoP heights: 3178875 3182475 module=server
Oct 26 04:07:17 ip-10-22-22-254 planetmint-god[947243]: 4:07AM INF [app] staged claims successfully for provided PoP heights: 3182475 3186075 module=server
Oct 27 04:29:06 ip-10-22-22-254 planetmint-god[947243]: 4:29AM INF [app] staged claims successfully for provided PoP heights: 3186075 3189675 module=server
Oct 28 04:50:51 ip-10-22-22-254 planetmint-god[947243]: 4:50AM INF [app] staged claims successfully for provided PoP heights: 3189675 3193275 module=server
Oct 29 05:13:44 ip-10-22-22-254 planetmint-god[425]: 5:13AM INF [app] staged claims successfully for provided PoP heights: 3193275 3196875 module=server
jmastr commented 3 weeks ago

This is the code handling burning STAGEDCRDDLs and minting CRDDLs: https://github.com/planetmint/planetmint-go/blob/v0.11.0/x/dao/keeper/msg_server_distribution_result.go#L40-L127

It does not produce any errors, otherwise we would see:

util.GetAppLogger().Error(ctx, "%s for provided PoP heights: %d %d", types.ErrResolvingStagedClaims.Error(), distribution.FirstPop, distribution.LastPop)

in our logging.

jmastr commented 3 weeks ago

I created a custom planetmint-god binary with more logging and deployed it to api.plmnt.r3c.network. I wait for the distribution to run tonight and tomorrow i am gonna check the logs...

jmastr commented 3 weeks ago

Just a thought: It might be connected to the down time between 18th and 21st of October (no space left of validators). My gut feeling says that it stopped working around that time...

jmastr commented 3 weeks ago

maybe related (reported by logcheck):

Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.932170599Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785) for address plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4"
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.932218246Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qqvwwg28rpygn9skfkrhd6vu9gh5qqfclut4uz894auz6lj9t2lnl04lm0gyw8yg0dnn6xnl47zmexpthp34h5s2zk828xt2ys - error while minting 1094946 tokens (tx id df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785) for address plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4"
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.98306032Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id 6dbdcd718ca940aba37d89ee4886eb597f0468e5b97f88c1cb114729278815a1) for address plmnt128pw4xqn8hnagt98m0vllrxrlc8cvqduvqh2ju"
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.983107383Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qqtqnldxpsvywc7wc4d0qv4230zv6wefhkzrs8w7usjz0f2wl29p2sdpes03k5wrr7wgtx3equefquhyltfwdkx2qwakl2xesh - error while minting 1094946 tokens (tx id 6dbdcd718ca940aba37d89ee4886eb597f0468e5b97f88c1cb114729278815a1) for address plmnt128pw4xqn8hnagt98m0vllrxrlc8cvqduvqh2ju"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.032525462Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id dd69c6bcbcbf0196d53bb012eb2f6e0c924752185bd806dd74289d9033c52c71) for address plmnt1e33kn4u8yfktrp6gd4eq4xq5gqfcdm5fkyp73w"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.032638725Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qqdefv0u4rt3j4s2taf3fk7yurhjkktsyn7czsragc8mu5x035hsjnkd495m0h7zhh8v4up5k0yhcaz6qnu6y2mt3tw8xaunf3 - error while minting 1094946 tokens (tx id dd69c6bcbcbf0196d53bb012eb2f6e0c924752185bd806dd74289d9033c52c71) for address plmnt1e33kn4u8yfktrp6gd4eq4xq5gqfcdm5fkyp73w"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.070370778Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id ceef466cb47df7b95e28c6dae9bcc604202f05954e82c445cd7bd8b335c90dab) for address plmnt1p498zlxe4yhz5wqllgvk22ucnezgje5f9zexwz"
Oct 29 15:39:39 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:39.070413349Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qq2zyum2hfjrsj3uzggl33znr9ydgga5jjjuczv6jyz0zc96833nl9mjqpme7zwl75p2qcwkf607uwymj69yusactjvwhn87zk - error while minting 1094946 tokens (tx id ceef466cb47df7b95e28c6dae9bcc604202f05954e82c445cd7bd8b335c90dab) for address plmnt1p498zlxe4yhz5wqllgvk22ucnezgje5f9zexwz"
jmastr commented 3 weeks ago

here is the custom log I took with the custom binary mentioned above: js_distribution.log

This is the corresponding diff:

diff --git a/x/dao/keeper/msg_server_distribution_result.go b/x/dao/keeper/msg_server_distribution_result.go
index e82e392..d7036c6 100644
--- a/x/dao/keeper/msg_server_distribution_result.go
+++ b/x/dao/keeper/msg_server_distribution_result.go
@@ -43,10 +43,12 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
        if err != nil {
                return err
        }
+       util.GetAppLogger().Info(ctx, "JS: len(challenges): %d", len(challenges))

        popParticipants := make(map[string]uint64)

        for _, challenge := range challenges {
+               util.GetAppLogger().Info(ctx, "JS: challenge: %v", challenge)
                // if challenge not finished nobody has claims
                if !challenge.GetFinished() {
                        continue
@@ -67,6 +69,7 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
                popParticipants[initiatorAddr.String()] += validatorPopReward
        }

+       util.GetAppLogger().Info(ctx, "JS: len(popParticipants): %d", len(popParticipants))
        // second data structure because map iteration order is not guaranteed in GO
        keys := make([]string, 0)
        for p := range popParticipants {
@@ -88,9 +91,12 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou
        if err != nil {
                return err
        }
+       util.GetAppLogger().Info(ctx, "JS: accAddr: %v", accAddr)

        accStagedClaim := k.bankKeeper.GetBalance(ctx, accAddr, k.GetParams(ctx).StagedDenom)

+       util.GetAppLogger().Info(ctx, "JS: accStagedClaim: %v", accStagedClaim)
+
        if accStagedClaim.Amount.GTE(sdk.NewIntFromUint64(amount)) {
                burnCoins, mintCoins := k.getConvertCoins(ctx, amount)
                err = k.bankKeeper.SendCoinsFromAccountToModule(ctx, accAddr, types.ModuleName, burnCoins)
@@ -108,6 +114,7 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou
                        return err
                }
        }
+       util.GetAppLogger().Info(ctx, "JS: convertAccountClaim successful")

        return
 }
jmastr commented 3 weeks ago

Unfortunately I did not log the most crucial part:

if accStagedClaim.Amount.GTE(sdk.NewIntFromUint64(amount)) {
...
}
jmastr commented 3 weeks ago

I observe two things:

  1. most of the PoP challenges are unsuccessful or unfinished (685 of 721 unsuccessful and 525 of 721 unfinished. there is an overlap of course. fyi: if unfinished no one gets rewards, if finished but unsuccessful at least the initiator gets rewards)
  2. people have stagedcrddl in their accounts and they don't seem to be converted to crddl anymore (here i don't know why that is. was it the outage? is one of our supporting services misbehaving?)
jmastr commented 3 weeks ago

maybe related (reported by logcheck):


Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.932170599Z caller=periodic_tasks.go:78 level=error msg="error while minting 1094946 tokens (tx id df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785) for address plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4"
Oct 29 15:39:38 ip-10-22-26-19 rddl-2-plmnt[398]: ts=2024-10-29T15:39:38.932218246Z caller=backend.go:98 level=error error="Failed to convert entry: lq1qqvwwg28rpygn9skfkrhd6vu9gh5qqfclut4uz894auz6lj9t2lnl04lm0gyw8yg0dnn6xnl47zmexpthp34h5s2zk828xt2ys - error while minting 1094946 tokens (tx id df73c93b8493d40fc77242f2df27dcb5f5bb0ad95c7cd3e4a8a5c3c1d741d785) for address plmnt12t05rud2q3n9plscehf3s0yyan42pp8asf9wu4"
...

unfortunately we are not logging the actual error message returned by r2p.pmClient.MintPLMNT containing the details about what is wrong...

jmastr commented 3 weeks ago

Side note: During my investigation i often stumble over the fact, that our error logging is not ideal. we log some error, but the actual interesting message we do not log. We might to do some refactoring/coming up with a error logging/handling coding guide line.

jmastr commented 3 weeks ago

I like this article very much (it is in German though) https://blog.alexandergunkel.eu/error-handling-in-golang/

Each error is handled in exactly one place ... Such an approach is often easy to implement and makes no demands on the architecture of the application, but has the disadvantage of greater complexity, which often means that error handling is duplicated or unintentionally omitted. At this point, it is to be feared that another log message will be written at a higher level, leading to redundancies and log spam. It is therefore better to find a single place where the error is addressed and then not to pass the error on. It would be better to pass the error on to the calling function:

 ...
 if err := fnt(param); err != nil {
    return fmt.Errorf("failed doing XYZ with param %s: %w", param, err)
 }
 ...

It should be noted that the enrichment of the error with information does not count as error handling, of course, nor does the release of resources that have been allocated within the function - for which it is better to use a defer.

jmastr commented 3 weeks ago

Is it correct, that we challenge every 8 minutes (20 * 24s/60s)?

...
Oct 30 10:59:18 ip-10-22-26-19 planetmint-god[426]: 10:59AM INF [app] MQTT message successfully sent: initiator:"1adb868b2b9623510dbe3d7419e4d2b5e712e08d" challenger:"plmnt1z3f38elc3ycm54037csr2yfznv5pm5tlx6rw2z" challengee:"plmnt1fxxw0xkmwhanm6yhhq0tqa0fe388g095cc53ym" height:3201280  module=server
Oct 30 11:07:24 ip-10-22-26-19 planetmint-god[426]: 11:07AM INF [app] MQTT message successfully sent: initiator:"1adb868b2b9623510dbe3d7419e4d2b5e712e08d" challenger:"plmnt14uv2pfvyy4jm7f9unfaha7kd3lnt07lv7gr5t9" challengee:"plmnt1jpgdg9mzh9f6cuku47ugqrgr3tnmc09u9zlwz3" height:3201300  module=server
jmastr commented 3 weeks ago

PoPs being unsuccessful quite often is nothing new. Going through all transactions: 37976 of 48511 were unsuccessful, but more were finished then above: 36638 of 48511.

eckelj commented 3 weeks ago

@jmastr Currently 4 validators are orchestrating PoP. A PoP occurs every 2 min. so each validator processes a pop every 8 minutes.

jmastr commented 3 weeks ago

@jmastr Currently 4 validators are orchestrating PoP. A PoP occurs every 2 min. so each validator processes a pop every 8 minutes.

That makes sense. Thanks!

jmastr commented 3 weeks ago

Here is a better log: js_distribution.log

Generated with this diff:

diff --git a/x/dao/keeper/msg_server_distribution_result.go b/x/dao/keeper/msg_server_distribution_result.go
index e82e392..40ce5fe 100644
--- a/x/dao/keeper/msg_server_distribution_result.go
+++ b/x/dao/keeper/msg_server_distribution_result.go
@@ -43,10 +43,12 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
        if err != nil {
                return err
        }
+       util.GetAppLogger().Info(ctx, "JS: len(challenges): %d", len(challenges))

        popParticipants := make(map[string]uint64)

        for _, challenge := range challenges {
+               util.GetAppLogger().Info(ctx, "JS: challenge: %v", challenge)
                // if challenge not finished nobody has claims
                if !challenge.GetFinished() {
                        continue
@@ -67,6 +69,7 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)
                popParticipants[initiatorAddr.String()] += validatorPopReward
        }

+       util.GetAppLogger().Info(ctx, "JS: len(popParticipants): %d", len(popParticipants))
        // second data structure because map iteration order is not guaranteed in GO
        keys := make([]string, 0)
        for p := range popParticipants {
@@ -84,6 +87,8 @@ func (k msgServer) resolveStagedClaims(ctx sdk.Context, start int64, end int64)

 // convert per account
 func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amount uint64) (err error) {
+       util.GetAppLogger().Info(ctx, "JS: participant: %v", participant)
+
        accAddr, err := sdk.AccAddressFromBech32(participant)
        if err != nil {
                return err
@@ -91,8 +96,13 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou

        accStagedClaim := k.bankKeeper.GetBalance(ctx, accAddr, k.GetParams(ctx).StagedDenom)

+       util.GetAppLogger().Info(ctx, "JS: accStagedClaim.Amount: %v, amount: %v", accStagedClaim.Amount, sdk.NewIntFromUint64(amount))
+
        if accStagedClaim.Amount.GTE(sdk.NewIntFromUint64(amount)) {
                burnCoins, mintCoins := k.getConvertCoins(ctx, amount)
+
+               util.GetAppLogger().Info(ctx, "JS: burnCoins: %v, mintCoins: %v", burnCoins, mintCoins)
+
                err = k.bankKeeper.SendCoinsFromAccountToModule(ctx, accAddr, types.ModuleName, burnCoins)
                if err != nil {
                        return err
@@ -108,6 +118,7 @@ func (k msgServer) convertAccountClaim(ctx sdk.Context, participant string, amou
                        return err
                }
        }
+       util.GetAppLogger().Info(ctx, "JS: convertAccountClaim successful")

        return
 }
jmastr commented 3 weeks ago

So the conclusion is:

eckelj commented 3 weeks ago

@jmastr thank you for the research. I think the PoPs are primarily failing when interacting with Tasmota Devices. We could research that and make a decision to bann them from the network.

jmastr commented 2 weeks ago

@LaurentMontBlanc @eckelj one thing that i noticed that might be helpful: I connected one ShellyPlug S to my EA via MQTT and it runs Tasmota software (it was already flashed, I just took it as is from the office). I successfully receive STAGEDCRDDLs and eventually CRDDLs with this setup. Out of interest I resolved one of my CIDs and got this:

{ "cid": "bafkreib5nxwilwve5ddrbdzj4xihjj5mpqcwnxnnuphmhqraf7eoxltrle", "data": { "Tasmota": "{\"Time\":\"2024-11-05T07:31:59\",\"ENERGY\":{\"TotalStartTime\":\"2024-06-25T09:54:01\",\"Total\":41.546,\"Yesterday\":0.637,\"Today\":0.088,\"Period\": 0,\"Power\":13,\"ApparentPower\":32,\"ReactivePower\":30,\"Factor\":0.39,\"Voltage\":309,\"Current\":0.105}}" } }

we talked about Tasmotas maybe not able the receive (STAGED)CRDDLs the last couple of days and I thought this might be helpful information

eckelj commented 2 weeks ago

@jmastr thank you for looking that up and verifying that. In general: the tasmota firmware works. I just suspect they might stop working on a longer period of operation. why? because the tasmota firmware disards older CIDs.

That means that they will not be able to run the PoP on older CIDs successfully.

we have to distinguish between:

jmastr commented 2 weeks ago

That makes totally sense. Thanks for the detailed explanation!

jmastr commented 2 weeks ago

Just stumbled upon a recent PoP result: https://explorer.rddl.io/planetmint/transactions/613176D63DCD9A86E399E7E87978773F4B054FDEB34B25F4B3BDFB8A996BAFA0


{
    "@type": "/planetmintgo.dao.MsgReportPopResult",
    "creator": "plmnt1ltvh6468d48zew9gu4njzzcjp557dxavevkhv5",
    "challenge": {
        "height": "3222285",
        "success": true,
        "finished": false,
        "initiator": "2fb1c9bad331cf708c4f42906c761defeb12677c",
        "challengee": "plmnt1z3f38elc3ycm54037csr2yfznv5pm5tlx6rw2z",
        "challenger": "plmnt1ltvh6468d48zew9gu4njzzcjp557dxavevkhv5"
    }
}

successful, but unfinished. does that make sense?

LaurentMontBlanc commented 2 weeks ago

Not really, but we should add a sanity check if not already in place

jmastr commented 1 day ago

the STAGEDCRDDL to CRDDL issue was fixed and here is a follow up for why PoPs are being unsuccessful: https://github.com/rddl-network/issues/issues/119