Seeking (multiple) synthetic blockchain generation technique suggestions

Mitchellpkt commented 6 years ago

[Note: Naturally, any tentative ideas for output selection and churn methods will be tested against the actual blockchain prior to recommendation, to make sure that some deanonymization anomalies don't pop up from weird quirks like the early XWallet triple outputs or the entity that was churning with a counterproductively high number of mix-ins.]

However...

For initial training, we want pure adversarial evolution between the generator and the discriminator, without the discriminator having a practical advantage from types of quirks named above. Thus, we'll want to generate one (or more, preferably) types of synthetic transaction trees to serve as the battlefield.

For each version being tested, we'll impose different spending constraints to ensure good performance across a range of conditions (since proposed methods must protect both short-term, high-pace traders and long-term holders, etc.).

I hacked together a very rudimentary transaction tree available in this repository. The idea was mostly to test out the format; its initialization of coinbases en masse is not remotely realistic. I'm looking for suggestions on multiple other ways to generate initial transaction trees in realistic ways.

Why multiple methods instead of only the best?

With multiple synthetic blockchain styles, we can see whether or not the initialization plays a large role in the outcome (this has huge implications for the transferability of the technique onto the real blockchain).
Consider the likely situation that the synthetic blockchain generation technique impacts the early-round results (when both models perform poorly anyway) in a way that gets washed out as the evolutionary war moves on to mostly using outputs generated by the GAN. It will be useful to see how long this initial-state dependence lasts. Whether it affects 600 blocks or 60000 blocks (especially on the real blockchain) will be important for guiding practical decisions.

For each method, a large number of trees will be generated (10^3 - 10^5? per method) and used as the GAN training grounds, to ensure sufficient sample size for understanding homogeneity and/or divergence of GAN results for each initialization method.

Comment below with ideas, or make a pull request with code and an output synthetic tree. I have some notions, but I want to see what people suggest before biasing with my limited thoughts.

Gingeropolous commented 6 years ago

I think it's been mentioned elsewhere, but I think the idea of using the bitcoin blockchain as a ground truth, and then overlaying synthetic rings on top of that, will make for the best synthetic blockchain. In this way, you capture real chain usage.

samleegithub commented 6 years ago

I agree with Gingeropolous. Using bitcoin's public blockchain, we can measure quantitative characteristics of real-world transactions (frequency, amount, etc.). With this information, we could create a model that synthetically creates transactions with similar characteristics.
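One rough way to sketch that kind of model is plain bootstrap resampling: measure (gap between transactions, amount) pairs on the transparent chain, then draw from them with replacement to lay out a synthetic sequence. The function name, the tuple layout, and the idea of treating gap and amount as a joint pair are all assumptions made for illustration; nothing here comes from the repository.

```python
import random

def synthesize_transactions(observed, n, seed=0):
    """Bootstrap-resample synthetic transactions from measured ones.

    observed: list of (gap_in_blocks, amount) pairs measured on a
              transparent chain (hypothetical input format).
    Returns n dicts with a cumulative block height and an amount.
    """
    if not observed:
        raise ValueError("need at least one observed transaction")
    rng = random.Random(seed)
    height = 0
    synthetic = []
    for _ in range(n):
        # Draw one measured (gap, amount) pair with replacement.
        gap, amount = rng.choice(observed)
        height += gap
        synthetic.append({"height": height, "amount": amount})
    return synthetic
```

Resampling pairs keeps any correlation between pace and amount that exists in the measurements; a parametric fit could replace it if smoother tails are needed.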

Mitchellpkt commented 6 years ago

I like these ideas! We'll use a transparent (e.g. Bitcoin) blockchain as ground truth for the sender, then apply the Monero decoy selection algorithm to create rings representative of what the wallet would have chosen (if it was Monero).
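A minimal sketch of that overlay, under loud assumptions: the true spend comes from the transparent chain, and decoys are picked by drawing an age from a gamma distribution over log-seconds and taking the earlier output nearest that age. This is a simplified stand-in, not the wallet's actual decoy selection code; `build_ring`, the ring size of 11, the 120-second block time, and the gamma shape and rate are placeholder values to be replaced with whatever the wallet version under test really uses.

```python
import math
import random

def build_ring(true_index, output_heights, spend_height, ring_size=11,
               rng=None, shape=19.28, rate=1.61, block_time=120):
    """Return sorted output indices: the true spend plus sampled decoys.

    output_heights[i] is the block height at which output i was created.
    shape/rate/block_time/ring_size are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Decoys must already exist when the spend happens.
    candidates = [i for i, h in enumerate(output_heights)
                  if h < spend_height and i != true_index]
    if len(candidates) < ring_size - 1:
        raise ValueError("not enough earlier outputs to fill the ring")
    ring = {true_index}
    while len(ring) < ring_size:
        # Age in seconds is exp(Gamma(shape, scale=1/rate)); convert to blocks.
        age_blocks = math.exp(rng.gammavariate(shape, 1.0 / rate)) / block_time
        target = spend_height - age_blocks
        # Take the unused candidate created closest to the target height.
        pick = min(candidates, key=lambda i: abs(output_heights[i] - target))
        candidates.remove(pick)  # guarantees distinct decoys and termination
        ring.add(pick)
    return sorted(ring)
```

The discriminator would see only the returned ring, while the ground-truth `true_index` stays available for scoring its guesses.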

Your comments made me realize something else:

It would be a questionable assumption to say that Monero ground truth spending is statistically similar to any particular (e.g. Bitcoin) currency's spend patterns. So results learned from the Bitcoin chain may or may not be representative/effective for Monero.

To address this, we could use these methods, applied to transparent blockchains from several different cryptocurrencies, and see how much the results/statistics/strategies change as a result.

If results are robust across our synthetic blockchains based on several different cryptocurrencies' patterns, then we can say that it is probably safe for Monero.

If the results are very different depending on which real blockchain we use to generate the synthetic blockchain, then it will be very important to note this and understand limits of representativeness.
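One possible way to put a number on "robust" versus "very different" is to compare the distribution of some per-run result (for example, discriminator accuracy) between every pair of source chains. The sketch below uses a hand-rolled two-sample Kolmogorov-Smirnov statistic; the choice of statistic, the function names, and the dictionary layout are assumptions for illustration, and any threshold for "safe" would still have to be chosen separately.

```python
from bisect import bisect_right
from itertools import combinations

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )

def pairwise_divergence(results_by_chain):
    """Map each pair of chain names to the KS statistic of their results.

    results_by_chain: dict of chain name -> list of per-run result values
                      (hypothetical layout).
    """
    return {
        (p, q): ks_statistic(results_by_chain[p], results_by_chain[q])
        for p, q in combinations(sorted(results_by_chain), 2)
    }
```

Values near 0 for every pair would suggest the results do not depend much on which chain seeded the synthetic data; values near 1 would flag the representativeness limits mentioned above.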

insight-decentralized-consensus-lab / CryptoNote-Blockchain-GAN
