Serialization (Was: Do we have a consensus about the wallet migration strategy?)

DangerousFreedom1984 commented 1 year ago

Hello everyone,

I would like to ask you some questions:

1) How do you see the structure of a transaction when we look at a block explorer? Would you say that it would be like in the figure below?

where

2)

-In the Seraphis-0-0-16 draft page 15, you say:

Linking tags will have the form (k b,recipient /(k a,sender + k a,recipient )) ∗ J. Even if a transaction author sends many e-notes to the same spend key K s , and all of those e-notes are spent, the author cannot use linear algebra on the resulting linking tags to associate those linking tags with the e-notes they sent out. This avoids the ‘linearity’ problem for fixed-base-point linking tag constructions noted by the CryptoNote whitepaper [37], which breaks sender-receiver anonymity. [[[formalize better? proof?]]]

Seraphis depends a lot on division operations, up to now, we did not do it that frequently so are there any known issues or special care when massively dealing with these operations? Did you start formalizing this proof?

3) I would consider a very very simple wallet with the functions below:

Open/Save wallet file with private keys.
Scan the blockchain for e-notes (already done)
Show amounts
Make a transaction (already done)

At this point, it should not be too hard (in terms of creativity) to create this simple wallet that would read/write from the blockchain and have these basic functionalities (as you have already written 10k+ LOC with the core of it).

Although, we cannot read/write from the blockchain as we dont have the structure to save the data on it yet! Can we start working on a very very primitive blockchain (just a reference file or so) with the blocks containing the reference transactions so we have a standard to work on? Like scanning the txs, showing the amounts, adding new transactions to this file and so on?

4) Do we have a consensus about the desired approach ?

I'm in favor of this double wallet. We start creating a new wallet_seraphis (which would be the similar of wallet2) and we add the necessary functions (that koe is working on) to make a legacy/seraphis transaction on wallet2.

What do you think?

5) I think it is also important to work in parallel in implementing the necessary structures to store the data in an actual blockchain style so we can have a testnet as soon as possible. If we agree with that structure shown in figures 1 and 2, we have to basically modify the file so the DB schema would account for that, right?

UkoeHB commented 1 year ago

SpTxSupplementV1 also includes the enote ephemeral pubkeys.

Seraphis depends a lot on division operations, up to now, we did not do it that frequently so are there any known issues or special care when massively dealing with these operations? Did you start formalizing this proof?

Not that I know of, and no I haven't started. Up to this point I have just assumed that inversion works without issue. You could look at the Lelantus-Spark paper to see if they have any special remarks, since their linking tag structure is very similar.

I would consider a very very simple wallet with the functions below:

Keep in mind that all the stuff I have written is not persistent state - no saving to file, just pure RAM mockups.

Can we start working on a very very primitive blockchain (just a reference file or so) with the blocks containing the reference transactions so we have a standard to work on?

You could fairly easily adapt/upgrade the MockLedgerContext to do this.

I'm in favor of this double wallet. We start creating a new wallet_seraphis (which would be the similar of wallet2) and we add the necessary functions (that koe is working on) to make a legacy/seraphis transaction on wallet2.

A seraphis wallet should be able to exist completely independent of wallet2 (i.e. users shouldn't need wallet2 at all), since I have built up legacy balance recovery into the seraphis library. There may need to be a new release where tons of old code touching wallet2 is deleted. An intermediate step would be refactoring out parts of wallet2 that need to be reused in a seraphis wallet.

I think it is also important to work in parallel in implementing the necessary structures to store the data in an actual blockchain style so we can have a testnet as soon as possible.

I agree that blockchain/db stuff is largely independent of wallet stuff, so those tasks can be worked on in parallel.

rbrunner7 commented 1 year ago

About the first part of your issue, with the diagram about a possible structure of a Seraphis transaction as shown e.g. in a block explorer: As far as I understand, that's basically the question how to serialize Seraphis transactions to store them in places like the blockchain and wallet cache files, so I will comment on that:

Maybe you, @DangerousFreedom1984 , know all what I write in the following already, or maybe not, and other readers might not as well, so there:

I worked a fair bit with object serialization and deserialization as it's implemented in the existing Monero code base. E.g. for the MMS I defined the structure for a fourth wallet file, "walletname.mms" from the ground up, and I watched jberman over the shoulder when he extended the transaction serialization to include viewtags.

Thus I can say that the way it's usually done if you have data in RAM that you have to make permanent i.e. store into files so you can re-create it later is "going forward". I mean with this that first you don't care at all how things will look when they are stored; your starting point is all the classes that describe that the data you want to store, so no "going backwards". You add code to serialize and deserialize, and whatever kind of structure that will create in the files will be ok.

Let's get concrete to illustrate. @UkoeHB 's Seraphis library currently defines a SpTxSquashedV1 struct as the object to manage Seraphis transactions in RAM. (I removed things like comments and methods to simplify.)

struct SpTxSquashedV1 final
{
    std::vector<LegacyEnoteImageV2> m_legacy_input_images;
    std::vector<SpEnoteImageV1> m_sp_input_images;
    std::vector<SpEnoteV1> m_outputs;
    SpBalanceProofV1 m_balance_proof;
    std::vector<LegacyRingSignatureV3> m_legacy_ring_signatures;
    std::vector<SpImageProofV1> m_sp_image_proofs;
    std::vector<SpMembershipProofV1> m_sp_membership_proofs;
    SpTxSupplementV1 m_tx_supplement;
    DiscretizedFee m_tx_fee;
    SemanticRulesVersion m_tx_semantic_rules_version;
}

Le'ts assume that the order of the members of this structure makes some sense, so we will just use the same order to write them into the file. This is how that structure might look with the necessary code added to use the same epee binary serialization system that is used e.g. for the existing Monero transactions:

struct SpTxSquashedV1 final
{
    std::vector<LegacyEnoteImageV2> m_legacy_input_images;
    std::vector<SpEnoteImageV1> m_sp_input_images;
    std::vector<SpEnoteV1> m_outputs;
    SpBalanceProofV1 m_balance_proof;
    std::vector<LegacyRingSignatureV3> m_legacy_ring_signatures;
    std::vector<SpImageProofV1> m_sp_image_proofs;
    std::vector<SpMembershipProofV1> m_sp_membership_proofs;
    SpTxSupplementV1 m_tx_supplement;
    DiscretizedFee m_tx_fee;
    SemanticRulesVersion m_tx_semantic_rules_version;

    BEGIN_SERIALIZE_OBJECT()
      VERSION_FIELD(0)
      FIELD(m_legacy_input_images)
      FIELD(m_sp_input_images)
      FIELD(m_outputs)
      FIELD(m_balance_proof)
      FIELD(m_legacy_ring_signatures)
      FIELD(m_sp_image_proofs)
      FIELD(m_sp_membership_proofs)
      FIELD(m_tx_supplement)
      FIELD(m_tx_fee)
      FIELD(m_tx_semantic_rules_version)
    END_SERIALIZE()
}

(Really only for illustrations. There are other macros like e.g. VARINT_FIELD that we may have to use for some members, and maybe some things like the fee are only "cached" in RAM for easier handling and don't belong stored permanently into any file.)

Most of these members are itself complex objects with again a number of members. They will all get such code sections to serialize them as well, and the recursive execution of all this will then decide what gets written to file and "how it looks" there.

rbrunner7 commented 1 year ago

I opened a new issue #11 about question no 5, the question about a blockchain file that can hold Seraphis transactions.

UkoeHB commented 1 year ago

@rbrunner7 I'd like to actively discourage making changes to any of the existing files in src/seraphis. For serialization, it would be better to see separate ObjName_serializable objects, and separate functions serialize(ObjName, [serialization config], ObjName_serializable &out) deserialize(ObjName_serializable, [deserialization config], [values to cache], &ObjName_out). I realize an intermediate step like this adds some allocation overhead, but imo it is very worth it to maintain clean code at scale, especially for objects with custom serializations.

rbrunner7 commented 1 year ago

For serialization, it would be better to see separate ObjName_serializable objects

I certainly see the value of this idea. If only from a project-management point of view because tight coupling between Seraphis library development and wallet development could be avoided.

One thing that I don't see clearly yet however is how to "go down" with a reasonable effort with this approach. I mean, some of the objects owned by a SpTxSquashedV1 are itself quite complex objects, that in turn contain yet other objects that still are not yet "elementary" like single strings and ints.

I think that there is no easy way to visualize, but a SpTxSquashedV1 object must be a tree of a considerable size and complexity. Wouldn't even serializing a single tx lead to construction of possibly dozens of intermediate serialization objects for that "going down"?

A second question would be this: With standard epee based macros the generated code is able to both serialize and deserialize. Would that be achievable as well using your proposed separate serializing classes, or would the code basically have to double?

Anyway, I don't see yet clearly before "my inner eye" how deserialization would work in detail ...

rbrunner7 commented 1 year ago

Beside epee serialization there is a second system of serialization provided by the Boost library that is - or at least was - also used in the Monero codebase.

Quite some time ago a number of files, e.g. ".mms" files, were switched from using Boost serialization to use epee serialization, if I remember correctly because there had been some bugs and exploits with Boost serialization, and epee serialization was considered more robust and more trustworthy.

But, and that's why I mention this here, with Boost serialization you do not write the serialization code directly into the classes, but you write separate templated classes.

You can see this in my message_store.h file with declarations that support both systems. The Boost classes start here.

So maybe for Seraphis and the Seraphis wallet a re-evaluation of the pros and contras of those two serialization systems would be in order, where Boost would have a big "pro" on its side to seamlessly support your proposal to keep any serialization code out of the Seraphis library and the advantages that this can bring.

UkoeHB commented 1 year ago

Wouldn't even serializing a single tx lead to construction of possibly dozens of intermediate serialization objects for that "going down"?

It shouldn't be that bad...

With standard epee based macros the generated code is able to both serialize and deserialize. Would that be achievable as well using your proposed separate serializing classes, or would the code basically have to double?

The ObjName_serializable would be direct serialize/deserialize like normal. On top there would be serialize/deserialize glue code to translate from/into the actual objects. We are already doing that for e.g. Bulletproofs, except it is a huge pain to actually find the code that does that.

Since I already know how this should work, maybe I should just add it to my todo.

rbrunner7 commented 1 year ago

I just realized that my "meta transaction" approach to serialize and deserialize transactions of any kind (see issue #7) probably won't work if legacy transactions use the epee system and Seraphis transactions use Boost, which could develoop into a serious problem.

UkoeHB commented 1 year ago

I just realized that my "meta transaction" approach to serialize and deserialize transactions of any kind (see issue https://github.com/seraphis-migration/wallet3/issues/7) probably won't work if legacy transactions use the epee system and Seraphis transactions use Boost, which could develoop into a serious problem.

If epee serialization ever gets replaced (there are now three proposals: perfect-daemon's, vtnerd's, and jeffro256's), we will probably end up using that.

rbrunner7 commented 1 year ago

We are already doing that for e.g. Bulletproofs

If the linked code implements the special serialization classes, where are the "true" Bulletproof classes then?

rbrunner7 commented 1 year ago

If epee serialization ever gets replaced

I did not follow developments closely: Do those systems allow for the desired de-coupling of class declarations and serialization code?

And well, it really depends when any replacement will actually happen. Seems to me that serious implementation work on the Seraphis wallet could already start in 2 months or so, with considerable manpower, and ideally the serialization question would get decided early.

UkoeHB commented 1 year ago

If the linked code implements the special serialization classes, where are the "true" Bulletproof classes then?

This is an example of serialization spaghetti. The Bulletproof struct has C++ members for its C++ representation, and serialization members for its serialization representation, and those are not 1:1.

I did not follow developments closely: Do those systems allow for the desired de-coupling of class declarations and serialization code?

No idea, maybe @j-berman or @vtnerd can comment.

And well, it really depends when any replacement will actually happen. Seems to me that serious implementation work on the Seraphis wallet could already start in 2 months or so, with considerable manpower, and ideally the serialization question would get decided early.

I can do a proof-of-concept for serialization within 2 weeks (as soon as I get the demo for legacy input integration working - getting closer).

rbrunner7 commented 1 year ago

This is an example of serialization spaghetti.

I think you coined a nice new term here :)

I can do a proof-of-concept for serialization within 2 weeks

That's a very attractive offer; you are probably by far the best person to try to prove that this approach works, as we probably should hope, or to find out that it turns into spaghetti code as well, just in a different way - which would be unfortunate of course, a "back to the drawing board" moment so to say.

UkoeHB commented 1 year ago

If epee serialization ever gets replaced (there are now three proposals: perfect-daemon's, vtnerd's, and jeffro256's), we will probably end up using that.

Follow up for anyone reading: these proposed serialization updates are for the p2p network, not for the blockchain, so they aren't relevant here.

DangerousFreedom1984 commented 1 year ago

Thank you for the discussion. I would suggest the serialization work package to be part of the wallet development as we could replace the blockchain stuff by only a file (only for simplification) if we know the correct formats to read/write the bytes from the bc (file) initially.

rbrunner7 commented 1 year ago

Today I was investigating the subject of serialization a bit more.

I showed here earlier in this issue that in the simplest possible case you can get classes to serialize by adding some quite simple lines, starting with BEGIN_SERIALIZE_OBJECT(), then having FIELD for every class member, and ending with END_SERIALIZE().

I think that if (and that's probably a big "if") things could be arranged in a way that makes it possible that serialization code stays so simple, not much would speak against having it directly in the classes as shown, and thus as part of the Seraphis library, which would be a simple and easy configuration.

So I went and tried to find out what's up with all that terrible "spaghetti" serialization code for cryptonote::transaction, rctSigBase and so on. See in the code e.g. here or here. What is at the core of the complexity of that code, what are the reasons it's complex? And would there have been a better way to do it? E.g. would it have helped to extend the functionality of that binary_archive serialization mechanism?

My (somewhat naive) hope was that after reaching understanding I would indeed find out that there are better ways to do it, or at least ways to prevent code degeneration to such an extent, and keep everything so simple that we could afford to keep the code right in the Seraphis library after all.

Unfortunately, I did not get yet very far. Biggest problem are things I don't understand yet. There is for example PR #1067 from @moneromooo-monero titled rct: rework serialization to avoid storing vector sizes. It complicated things considerably, but sadly no idea what was the problem with storing vector sizes ...

If neither @DangerousFreedom1984 nor @j-berman want to pick up at this point and investigate further, I will probably wait how @UkoeHB 's approach looks that mirrors every relevant Seraphis library object with a serializable object. I confess I am quite sceptical that turns out pretty and performant, but let's see.

DangerousFreedom1984 commented 1 year ago

@rbrunner7 I didn't really understand your question. Are you asking about the file structure where the serialization will happen? Or are you asking about the tasks to do? I believe we could have a similar approach, as you stated, in the corresponding file in Seraphis. Maybe this one ? But we may want to discuss it in another thread.

rbrunner7 commented 1 year ago

@DangerousFreedom1984 , it should become clear if you read the whole discussion between me and @UkoeHB. Most important and consequential statement, as I see it, is this one: If you want to avoid any code regarding serialization in the Seraphis library, things could get tricky.

DangerousFreedom1984 commented 1 year ago

@rbrunner7 Thanks for pointing. Yeah, I agree with that approach from @UkoeHB and I believe that his answer was not to touch the actual files or at least not change them drastically. But I guess we can insert other files inside the src/seraphis folder to make the serialization and have the proper objects as suggested. But sorry, I might have not understood still.

rbrunner7 commented 1 year ago

@UkoeHB recently completed a demo / proof-of-concept for the serialization of Seraphis transactions. The principal commit where you can see how it works is this one.

I read the code which is quite simple, clean, not hard to understand, and indeed smaller and less complicated than I myself would have predicted when I first had the general idea explained: Don't make the seraphis_lib objects themselves serializable, but make a new, second set of objects which are, and provide ways to convert back and forth between those two "families" of objects.

Still, I am also more confused than ever about the motivation for this exercise: What does this want to achieve, at its core, on a fundamental level? Which "bad scenarios" does this protect the seraphis_lib against? Which potential problems does it want to prevent, now and in the future?

If you drill down and look beyond those two families of objects, the 50 or so methods needed to convert between their members, and all the other "infrastructure" and go to the place where the actual serialization happens, in the file serialization_demo_types.h, you see code parts like the following one:

BEGIN_SERIALIZE()
    FIELD(A)
    FIELD(A1)
    FIELD(B)
    FIELD(r1)
    FIELD(s1)
    FIELD(d1)
    FIELD(L)
    FIELD(R)
END_SERIALIZE()

And, well, this code looks like I claimed it would look: Trivial.

I am really looking forward to clarification and explanation from @UkoeHB because it must be that I overlook something important. Currently this all looks to me like one pretty elaborate exercise to keep absolutely trivial code parts like the one showed out of the core of the Seraphis library. That can't be all there is because if it was the whole exercise would be pretty stupid IMHO, and I am sure @UkoeHB does not propose stupid things.

UkoeHB commented 1 year ago

The goal is to make trivial what otherwise would not be. Take the deserialization of CLSAGs and bulletproofs as the canonical example of a mess.

All of that mess is, in seraphis_lib, is replaced with: 1) deserialize to serializable tx 2) convert serializable tx to tx 3) check the m_tx_semantic_rules_version field then call validate_tx_semantics(tx)

There is a fair bit of serialization/deserialization that is completely trivial and could be moved directly in-line, but I chose to implement a consistent and isolated interface with the goal of keeping serialization completely out of the C++ world. Doing this also makes designing the serialized blob more flexible, if desired (e.g. for prunable txs, which currently look like this).

rbrunner7 commented 1 year ago

I have a question that may clear things up for me: Do you think that 2 or 3 years after the initial Seraphis hardfork, with maybe 2 or 3 semantic tweaks in place, with pruning fully implemented, and maybe other things I can't come up with right now, the full serialization code will also look quite complicated? Maybe not a complete "mess" like the examples from the existing code that you linked, but still quite complex?

UkoeHB commented 1 year ago

Do you think that 2 or 3 years after the initial Seraphis hardfork, with maybe 2 or 3 semantic tweaks in place, with pruning fully implemented, and maybe other things I can't come up with right now, the full serialization code will also look quite complicated?

It should not be necessary to make the serialization more complex. More lines - yes, more complex - no.

1) semantic rule changes: use a version number that informs post-deserialization semantics checkers 2) structure changes: use new structs to represent new structures 3) overloading a space with multiple potential structures: use a variant of types

For example, for pruning txs one option might be (not actually sure how boost::variants are serialized or if that's supported right now):

/// serializable: prunable components of an SpTxSquashedV1
struct ser_SpTxSquashedV1_PRUNABLE final
{
    /// balance proof (balance proof and range proofs)
    ser_SpBalanceProofV1_PARTIAL m_balance_proof;
    /// ring signature proofs: membership and ownership/key-image-legitimacy for each legacy input
    std::vector<ser_LegacyRingSignatureV3_PARTIAL> m_legacy_ring_signatures;
    /// composition proofs: ownership/key-image-legitimacy for each seraphis input
    std::vector<ser_SpImageProofV1> m_sp_image_proofs;
    /// Grootle proofs on squashed enotes: membership for each seraphis input
    std::vector<ser_SpMembershipProofV1_PARTIAL> m_sp_membership_proofs;

    BEGIN_SERIALIZE()
        FIELD(m_balance_proof)
        FIELD(m_legacy_ring_signatures)
        FIELD(m_sp_image_proofs)
        FIELD(m_sp_membership_proofs)
    END_SERIALIZE()
};

/// serializable SpTxSquashedV1
struct ser_SpTxSquashedV1 final
{
    /// semantic rules version
    SpTxSquashedV1::SemanticRulesVersion m_tx_semantic_rules_version;

    /// legacy tx input images (spent legacy enotes)
    std::vector<ser_LegacyEnoteImageV2> m_legacy_input_images;
    /// seraphis tx input images (spent seraphis enotes)
    std::vector<ser_SpEnoteImageV1> m_sp_input_images;
    /// tx outputs (new enotes)
    std::vector<ser_SpEnoteV1> m_outputs;
    /// supplemental data for tx
    ser_SpTxSupplementV1 m_tx_supplement;
    /// the transaction fee (discretized representation)
    unsigned char m_tx_fee;

    /// prunable components
    boost::optional<ser_SpTxSquashedV1_PRUNABLE> m_prunable;

    BEGIN_SERIALIZE()
        VARINT_FIELD(m_tx_semantic_rules_version)
        FIELD(m_legacy_input_images)
        FIELD(m_sp_input_images)
        FIELD(m_outputs)
        FIELD(m_tx_supplement)
        VARINT_FIELD(m_tx_fee) static_assert(sizeof(m_tx_fee) == sizeof(DiscretizedFee), "");
        VARIANT_FIELD(m_prunable)
    END_SERIALIZE()
};

This solution would introduce a little bit of spaghetti: calls to is_pruned(ser_SpTxSquashedV1). There may be better options depending how you want the data flow around pruned txs to go.

rbrunner7 commented 1 year ago

This solution would introduce a little bit of spaghetti: calls to is_pruned(ser_SpTxSquashedV1)

Maybe all that is needed to avoid that is to include an is_pruned boolean into the serialized object, written into the stream early, so you can read it early later on and react accordingly?

And I am thankful for that pruning code sketch, because it shows how the introduction of pruning forces a split into two classes, and the "upper" class, the full serializable transaction, gets a sub-object. You probably would not want this to happen to the "real" transaction class, you want to keep that "flat".

UkoeHB commented 1 year ago

Maybe all that is needed to avoid that is to include an is_pruned boolean into the serialized object, written into the stream early, so you can read it early later on and react accordingly?

I suppose that's another option. Tbh I don't know much about how pruned txs are handled, so it's hard for me to come up with the best solution. All I can say is: there must be solutions that minimize spaghet!

seraphis-migration / wallet3

Serialization (Was: Do we have a consensus about the wallet migration strategy?) #10