vocdoni / vocdoni-node

A set of libraries and tools for the Vocdoni decentralized backend infrastructure, the foundation of our universally verifiable, privacy-centric, and scalable digital voting protocol.
GNU Affero General Public License v3.0

bug: transaction information comes base64 encoded #1354

Status: Closed (elboletaire closed this issue 1 month ago)

elboletaire commented 3 months ago

Describe the bug

The transaction information endpoint v2/chain/transactions/{blockHeight}/{transactionIndex} is returning most information base64 encoded.

Due to this base64 encoding, the explorer was trying to decode everything, and that was causing the error in vocdoni/explorer/issues/62

To Reproduce

curl -s https://api-stg.vocdoni.net/v2/chain/transactions/1176690/0 | jq

Calls where we found base64 data instead of base16 (expected):

Current behavior

The returned JSON has many base64 values:

{
  "tx": {
    "setProcess": {
      "txtype": "SET_PROCESS_CENSUS",
      "nonce": 1,
      "processId": "sx3/YYFNq5DWw6m2XWyQgwSA5Lda0y50eUICAAAAAAA=",
      "censusRoot": "zUU9BcTLBCnuXuGu/tAW9VO4AmtM7VsMNSkFv6U8foE=",
      "censusURI": "ipfs://bafybeicyfukarcryrvy5oe37ligmxwf55sbfiojori4t25wencma4ymxfa",
      "censusSize": "1000"
    }
  },
  "txInfo": {
    "transactionNumber": 21778,
    "transactionHash": "5fe6780f87d24e990e653552f03861ba50d09857261299606ae0d272b171f843",
    "blockHeight": 1176690,
    "transactionIndex": 0,
    "transactionType": "setProcess"
  },
  "signature": "6ae51220aec68b503ff5dea3c2af6b375b8c6a41816c94c1bdae70617face7796a8fdbb78128279aa71bd892363e51883925595ee92f3977953a6bf44a4e8e671c"
}

Expected behavior

To receive all these base64 values as base16.

p4u commented 2 months ago

ping @mvdan

mvdan commented 2 months ago

Sorry for the slowness. I looked into this recently but got stuck: these are Go types generated as part of our protobuf models, so the field types are hard-coded and we cannot swap in our own HexBytes type, which overrides the JSON marshaling to use base16 instead of base64.

Copying the types to replace those field types would be a nightmare, since the tx type is a oneof, so we would need to copy over a dozen types.

Protobuf themselves do not provide any way to configure how the JSON marshaling happens, because the spec says that bytes MUST marshal as base64 in JSON: https://protobuf.dev/programming-guides/proto3/#json

Trying to do some patching via encoding/json and any types is going to be super unreliable, because it's basically impossible to tell for sure if a string like "deadbeef" is meant to be base64 or not. So that's not an option.

The only realistic solution I can think of would be to use Go reflection to recurse the models.Tx value, find all []byte fields, encode as both JSON base64 and base16, and after doing the protojson marshal, do a search-and-replace of all base64 strings from before with their base16 counterparts. This would be very hacky, but I think it would work.

The only alternative I can think of is to keep base64, and very clearly document that the tx field in this endpoint will contain the protojson encoding of the models protobuf types, and so the encoding will be slightly different than the rest of our API. For instance, with bytes being base64 rather than base16. This would save us from horrible hacks like the above in the backend, but I'm not sure if it would lead to other hacks in the frontend. I hope it would be easy enough for the frontend to remember that this endpoint uses base64 rather than base16.

In the meantime, here are two commits to clean up the code a bit and add a test: https://github.com/vocdoni/vocdoni-node/pull/1374

mvdan commented 2 months ago

A third option would be for the REST endpoint to provide a JSON field containing the original protobuf-encoded bytes, in binary format encoded as base64. Then it would be up to the frontend to use the protobuf library to decode those bytes into the models.Tx type. I think this would be clean for both the backend and frontend, although it would make the REST endpoint rather ugly, as we would end up with the tx field having a relatively long blob of base64.
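As an illustration only (the field shape here is hypothetical, not an agreed API), the response under this option might look something like:

```json
{
  "tx": "<base64 of the protobuf-encoded models.Tx bytes>",
  "txInfo": {
    "transactionNumber": 21778,
    "transactionHash": "5fe6780f87d24e990e653552f03861ba50d09857261299606ae0d272b171f843",
    "blockHeight": 1176690,
    "transactionIndex": 0,
    "transactionType": "setProcess"
  }
}
```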

p4u commented 2 months ago

ping @altergui

mvdan commented 1 month ago

Let me know what the resolution here is, I'm happy to assist on the Go side if any changes are needed.

altergui commented 1 month ago

The only alternative I can think of is to keep base64, and very clearly document that the tx field in this endpoint will contain the protojson encoding of the models protobuf types, and so the encoding will be slightly different than the rest of our API. For instance, with bytes being base64 rather than base16. This would save us from horrible hacks like the above in the backend, but I'm not sure if it would lead to other hacks in the frontend. I hope it would be easy enough for the frontend to remember that this endpoint uses base64 rather than base16.

that's not an option because it's basically what we've been doing until now, the problem is that not ALL fields are base64 encoded, some are simply integers (like 1000) that nonetheless can be parsed as base64, but they shouldn't. so it's not simply saying tx field has base64 encoding, but exactly specifying which fields are (processID and censusRoot in this case), for EACH transaction type. doable, but cumbersome and hard to maintain.

altergui commented 1 month ago

The only realistic solution I can think of would be to use Go reflection to recurse the models.Tx value, find all []byte fields, encode as both JSON base64 and base16, and after doing the protojson marshal, do a search-and-replace of all base64 strings from before with their base16 counterparts. This would be very hacky, but I think it would work.

that's clever! i tried using reflection to modify the types of the original object (before passing to protojson marshal) but that didn't go well. didn't think about leaving the object untouched, love it.

mvdan commented 1 month ago

Then I think the best remaining option is to do some reflect magic on the Go side to find all []byte values and re-encode them from base64 to hex.

that's not an option because it's basically what we've been doing until now, the problem is that not ALL fields are base64 encoded, some are simply integers (like 1000) that nonetheless can be parsed as base64, but they shouldn't. so it's not simply saying tx field has base64 encoding, but exactly specifying which fields are (processID and censusRoot in this case), for EACH transaction type. doable, but cumbersome and hard to maintain.

To be clear, that is a problem with either base64 or hex encoding. {"foo": "123"} can be a string, or base64-encoded bytes, or hex-encoded bytes, and any consumer of the JSON cannot tell them apart unless they know which fields are meant to be bytes. So I'm actually not sure why switching from base64 to hex helps here, aside from consistency with the other JSON endpoints.

p4u commented 1 month ago

ping @altergui @selankon

altergui commented 1 month ago

So I'm actually not sure why switching from base64 to hex helps here, aside from consistency with the other JSON endpoints.

precisely because of that consistency. yes, please do the reflect magic :pray:

fwiw, i agree "123" can be a string or hex-encoded bytes, you're correct. but explorer doesn't care, the hex-encoded bytes are useful, they can be treated as a string. problem is with base64, it's useless unless converted to base16, and that conversion is problematic.

mvdan commented 1 month ago

Done: https://github.com/vocdoni/vocdoni-node/pull/1393