ipfs / go-ipld-eth

Plugin of the Go IPFS Client for Ethereum Blockchain IPLD objects
MIT License

Next steps #1

Open whyrusleeping opened 7 years ago

whyrusleeping commented 7 years ago

To get this integrated more officially into go-ipfs, we will first need to clean up the code here a little (make sure it passes golint and the vetting tools) and then make it conform to the newer go-ipld-format plugin semantics (it needs a DecodeBlock method matching the signature at https://github.com/ipfs/go-ipld-format/blob/master/coding.go#L13). Then, in an init function, it needs to register itself in the decoders map.

Then, any build of go-ipfs that imports this package will automatically be able to handle ethereum types.

There are also a few changes from https://github.com/ipfs/go-ipfs/compare/feat/zcash that we will need to get merged (primarily the changes to the ipfs dag put command that allows hex input).

And finally, this package doesn't implement handling for all the different Ethereum object types. I only did block, transaction, and transaction trie parsing. Support for state trie processing would be nice.

cc @hermanjunge

whyrusleeping commented 7 years ago

also cc @kumavis

whyrusleeping commented 7 years ago

Once all the above is done, building ipfs with Ethereum support would still require running a custom binary. I really want to add plugin support to ipfs, and that would solve the issue nicely. You could just build this package as a plugin, put it in the ipfs/plugins directory, and bam! You can traverse Ethereum DAGs.

kumavis commented 7 years ago

yeah i agree plugins would be awesome - look forward to that

kumavis commented 7 years ago

> I only did block, transaction, and transaction trie parsing. Working on support for state trie processing will be nice.

Currently the eth-ipfs bridge only serves blocks, transactions (but not transaction tries), and state and storage tries; it serves neither tx receipts nor tx receipt tries. These are missing because Parity does not index them by hash (tx receipts) or doesn't store them at all (tx trie, tx receipt trie).

ghost commented 7 years ago

Done

Next Steps

[0x91] eth-tx (local data only)

[0x92] eth-tx-receipt (local data only)

[0x93] eth-account-snapshot (local data only)

[0x94] eth-block-list (rlp array)

[0x95] eth-tx-trie (merkle trie)

[0x96] eth-tx-receipt-trie (merkle trie) leaves: links to eth-tx-receipt

[0x97] eth-state-trie (secure merkle trie) leaves: links to eth-account-snapshot

[0x98] eth-storage-trie (secure merkle trie) leaves: links to raw binary
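For reference, the proposed codes can be kept in a small Go lookup table that also records what each trie's leaves link to. The values are copied from the list in this comment, which was a proposal at the time, so check the current multicodec table before relying on them; the type, map, and function names are hypothetical.

```go
package main

// ethCodec describes one proposed Ethereum IPLD codec from the list above.
type ethCodec struct {
	Name      string
	Structure string // how the object is encoded, as listed in the proposal
	LeafLink  string // what trie leaves link to; empty if none listed
}

// proposedEthCodecs maps the codes proposed in this comment to their
// descriptions. Values come from the proposal, not from a published table.
var proposedEthCodecs = map[uint64]ethCodec{
	0x91: {Name: "eth-tx", Structure: "local data only"},
	0x92: {Name: "eth-tx-receipt", Structure: "local data only"},
	0x93: {Name: "eth-account-snapshot", Structure: "local data only"},
	0x94: {Name: "eth-block-list", Structure: "rlp array"},
	0x95: {Name: "eth-tx-trie", Structure: "merkle trie"},
	0x96: {Name: "eth-tx-receipt-trie", Structure: "merkle trie", LeafLink: "eth-tx-receipt"},
	0x97: {Name: "eth-state-trie", Structure: "secure merkle trie", LeafLink: "eth-account-snapshot"},
	0x98: {Name: "eth-storage-trie", Structure: "secure merkle trie", LeafLink: "raw binary"},
}

// leafTarget returns what the leaves of a trie codec link to, or "" if the
// code is unknown or has no leaf links listed in the proposal.
func leafTarget(code uint64) string {
	return proposedEthCodecs[code].LeafLink
}
```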

ghost commented 7 years ago

Paging @Kubuxu and @Stebalien, as per @whyrusleeping's advice.

ghost commented 7 years ago

And linking to our evil world domination plan repo https://github.com/MetaMask/eth-ipfs-browser-client/issues/1


Stebalien commented 7 years ago

> Then in an init function, it needs to register itself in the decoders map.

FYI, we've stopped doing this. Instead, just call Register(codec, decoder) at some point before trying to decode an eth block (this makes it easier to register eth decoders from plugins).

ghost commented 7 years ago

@Stebalien It's already done by Why. See https://github.com/ipfs/go-ipld-eth/blob/master/plugin/eth.go#L36-L38

Stebalien commented 7 years ago

Ah, ok. Just wanted to make sure there wasn't out-of-date information floating around.

ghost commented 7 years ago

@Stebalien please, if you have some time, take a look at the PR that attempts to document each public function, so that golint stops nagging me 😉. Most of the comments are ~stolen~ borrowed from the go-ipld-format interface.

I am building on top of this first PR.

ghost commented 7 years ago

Moving forward with this. A big refactor has already been done in #5. Next, some time will be spent on an importer. We can use the material in the plugin directory's README to make a blog post in the future.

ghost commented 7 years ago

Got some interesting data on my first attempt to import

Import performance should improve with a truckload of cheap machines (or maybe Amazon's Lambda functions?) and a shared stack. Redis comes to mind.

herman
[07:15] 
Finally. One answer to a question

[07:15] 
2017/08/07 07:11:10 From the stack: 0xc7041743ad5152d8d13815ca6be379ff3b4c994069cc419867ab0d890d460b5f
2017/08/07 07:11:10     z45oqTS7yKVxeLJE8H1Q5o8nTusiARceKKt7hMkbED8PDeaCHQ2
2017/08/07 07:11:10         This is a leaf
2017/08/07 07:11:10             Node imported. Count = 12352
2017/08/07 07:11:10 From the stack: 0x84269463e5e9ecf08491d8745b98cec308498076c2cacbbe1c6e7adbe5d00438
2017/08/07 07:11:10     z45oqTS3UJgmLqbXEdANJGbbHKTHJcdZhvhkkrsoD6XL2A4dftb
2017/08/07 07:11:10         Adding 0xef6d2178835239b85ea68f9b3c2201ee49daf3744ebeb48901cc9374d9b97b9d (idx: b) to the stack
2017/08/07 07:11:10         Adding 0x06214d858b09063e9efe886d4f634348a7845a729807472bec1dbb26c40ac136 (idx: 5) to the stack
2017/08/07 07:11:10             Node imported. Count = 12353
2017/08/07 07:11:10 From the stack: 0x06214d858b09063e9efe886d4f634348a7845a729807472bec1dbb26c40ac136
2017/08/07 07:11:10     z45oqTRtzNeddu43X6Xvt8SBFmtVxukPrPZeBe4tiGNZYCeHf7K
2017/08/07 07:11:11         This is a leaf
2017/08/07 07:11:11             Node imported. Count = 12354
2017/08/07 07:11:11 From the stack: 0xef6d2178835239b85ea68f9b3c2201ee49daf3744ebeb48901cc9374d9b97b9d
2017/08/07 07:11:11     z45oqTSAh4htdRWX3DNXP1Ze2sQJ55UrukYNYoSMVitNXeY4P9n
2017/08/07 07:11:11         This is a leaf
2017/08/07 07:11:11             Node imported. Count = 12355
2017/08/07 07:11:11 From the stack: 0x1a202509db353cf86ea03dc0a9864a2c40af91e8bd28c1dc8ac56818824ed638
2017/08/07 07:11:11     z45oqTRvLRn9vW9u7VeEE7rqtrr4jz8ks5zzDSggNkF8BGgW9My
2017/08/07 07:11:11         This is a leaf
2017/08/07 07:11:11             Node imported. Count = 12356
Stack Empty. We are done here :D
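The log above shows a depth-first walk driven by an explicit stack: pop a hash, fetch the node, count it, and push any child hashes of a branch. A minimal sketch of that loop, assuming a hypothetical in-memory `store` keyed by hash in place of the Parity API and the IPFS node that the real importer talks to:

```go
package main

// trieNode is a hypothetical fetched trie node; leaves have no children.
type trieNode struct {
	Children []string // child hashes
}

// importTrie walks every node reachable from root using an explicit stack,
// mirroring the "From the stack" / "Adding ... to the stack" log lines.
// It returns the number of nodes imported and the first missing hash, if any.
func importTrie(store map[string]trieNode, root string) (count int, missing string) {
	stack := []string{root}
	for len(stack) > 0 {
		// Pop the most recently pushed hash.
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]

		n, ok := store[h]
		if !ok {
			return count, h // e.g. pruned by the source node
		}
		// A real importer would write the raw node to IPFS here.
		count++
		stack = append(stack, n.Children...)
	}
	return count, ""
}
```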
[07:15] 
Genesis Block has `8,892` accounts (see
https://github.com/ethereum/pyethsaletool/blob/master/genesis_block.json)

[07:16] 
And `12,356` state trie nodes (it took from `06:27:09` to `07:11:11` to traverse them all, tunneling to the source: me in Chile, `mantis` in `Azure East 2`).

[07:17] 
Etherscan says that the latest block (`#4127835`) has `5,270,884` accounts:
https://etherscan.io/accounts

herman
[07:31] 
So a naive download up to the latest block, at this rate, would take about `435` hours:
https://www.wolframalpha.com/input/?i=(06:27:09+to+07:11:11)+*+(5270327%2F8892)

[07:34] 
Now.
1) We will do the retrieval from a machine that is local to the Parity server.
2) As you get more blocks, the odds of "repeating" trie nodes increase
(that's the whole point of using a state trie).
3) We have to figure out a way to parallelize this process
(as stated above: several machines or lambdas, plus a common stack in Redis, for example), or
4) We can do the initial job by plugging directly into an inactive LevelDB for earlier blocks,
and then use the API for the latest blocks.
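Option 3 can be sketched in-process: several workers share one stack, and a pending counter tells them when the walk is finished. Here goroutines stand in for the separate machines or lambdas, and a mutex-protected slice stands in for the shared Redis stack. All names are hypothetical, and a real version would also need retries for failed fetches.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// sharedStack stands in for a stack shared between importer machines.
type sharedStack struct {
	mu      sync.Mutex
	cond    *sync.Cond
	items   []string
	pending int // hashes pushed but not yet fully processed
}

func newSharedStack() *sharedStack {
	s := &sharedStack{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// push adds hashes and counts them as pending work.
func (s *sharedStack) push(hashes ...string) {
	s.mu.Lock()
	s.items = append(s.items, hashes...)
	s.pending += len(hashes)
	s.mu.Unlock()
	s.cond.Broadcast()
}

// pop blocks until a hash is available. It returns false once the stack is
// empty and no worker still holds unfinished work.
func (s *sharedStack) pop() (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for len(s.items) == 0 && s.pending > 0 {
		s.cond.Wait()
	}
	if len(s.items) == 0 {
		return "", false
	}
	h := s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return h, true
}

// done marks one popped hash as fully processed (its children already pushed).
func (s *sharedStack) done() {
	s.mu.Lock()
	s.pending--
	finished := s.pending == 0
	s.mu.Unlock()
	if finished {
		s.cond.Broadcast()
	}
}

// parallelImport walks the trie rooted at root with the given number of
// workers, using fetch to get child hashes, and returns the nodes visited.
func parallelImport(fetch func(hash string) []string, root string, workers int) int64 {
	s := newSharedStack()
	s.push(root)
	var count int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				h, ok := s.pop()
				if !ok {
					return
				}
				children := fetch(h) // a real worker would also store the node
				atomic.AddInt64(&count, 1)
				s.push(children...) // push before done so pending never hits 0 early
				s.done()
			}
		}()
	}
	wg.Wait()
	return count
}
```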

[07:35] 
Anyway, we will figure something out, as always. At last we have numbers to start with!

Kubuxu commented 7 years ago

There should also be a major perf improvement if you switch your IPFS node from flatfs to badger (still WIP). Right now, though, what you can do is:

  1. disable the DHT for the initial add, using the daemon's --dht=none option
  2. enable the flatfs NoSync option in the config
ghost commented 7 years ago

OK. #7 is the second (and hopefully last) heavy overhaul. Now we can talk about organic growth, continuous improvements, and the like.

The current focus is making a fast and decent importer for the eth-state-trie elements (https://github.com/hermanjunge/go-ipld-eth-import, to be handed over to ipfs someday).

TODO

ghost commented 7 years ago

This one PR moves the needle to the right.

Docs

We have a pretty decent doc to make a huge blog post on this! Pinging @whyrusleeping as you requested this.

TODO

Operational

Remaining codecs to implement

Ideas

ghost commented 7 years ago

This is a write-up on IPFS/notes I made the other day.

ghost commented 7 years ago

@dryajov

Remaining codecs to implement

~EVM Code codec~

~The following PRs need to be approved to include the 0x99 codec here. Please follow them closely, as they involve a practical discussion of whether it makes sense to add a new codec or to stick with 0x55 (raw data), since EVM code has no structure.~

~https://github.com/multiformats/multicodec/pull/61~
~https://github.com/ipfs/go-cid/pull/37~

rmulhol commented 6 years ago

@hermanjunge How did you fetch the state trie rlp data that's in the test data directory? I see that you mentioned using the parity ipfs api - how did you determine the cid to pass in? Did you use this tool? If so, did you generate the eth-state-trie cid from a block hash, a state root hash, or something different?

ghost commented 6 years ago

> Did you use this tool?

That's correct: https://github.com/kumavis/eth-ipld-cli

> If so, did you generate the eth-state-trie cid from a block hash, a state root hash, or something different?

The state trie root can be obtained from the block header. Successive trie hashes are obtained when you retrieve this first element from a database (i.e. the ipfs-parity API) and then keep traversing. To know the traversal path, you need to hash (keccak-256) the value of the Ethereum address. There is a section in this repository documenting an example of performing this operation manually with the ipfs client and the plugin. You can even find code to create the hash in that section.
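In a secure trie, the path to an account is the keccak-256 hash of its address, read one hex nibble at a time; at a branch node, each nibble picks one of the 16 children (the `idx: b` and `idx: 5` entries in the importer log earlier in this thread appear to be such nibbles). A small Go sketch of that step, assuming the 32-byte digest has already been computed, for example with `sha3.NewLegacyKeccak256` from `golang.org/x/crypto/sha3` (left out here so the example stays stdlib-only). Extension and leaf nodes, which consume several nibbles at once, are not handled; the function names are hypothetical.

```go
package main

import "fmt"

// nibbles splits a hash into its hex digits, high nibble first. For a secure
// trie, the input is keccak256(address) and the result is the lookup path.
func nibbles(hash []byte) []byte {
	out := make([]byte, 0, len(hash)*2)
	for _, b := range hash {
		out = append(out, b>>4, b&0x0f)
	}
	return out
}

// branchChild returns the child key a branch node uses at the given depth,
// written as one hex digit ("0".."f"), like the children of a decoded branch.
func branchChild(path []byte, depth int) (string, error) {
	if depth < 0 || depth >= len(path) {
		return "", fmt.Errorf("depth %d outside path of length %d", depth, len(path))
	}
	return fmt.Sprintf("%x", path[depth]), nil
}
```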

Hope this answers your question.

rmulhol commented 6 years ago

Thanks for the quick reply, @hermanjunge! That example is really helpful, and I'm super excited to see where this project goes/potentially contribute.

Quick follow up - the example works for fetching the state root of the genesis block (and for traversing to accounts from there). Do you know whether it's possible to perform similar operations on subsequent blocks?

For example, with the genesis block, I know that the cid for the header is z43AaGF73rnZ14vjAkMQ8xoNfBShmq8qaiqFuELAx1vxSTzfGY2 and the cid for the root is z45oqTS97WG4WsMjquajJ8PB9Ubt3ks7rGmo14P5XWjnPL7LHDM, and I can traverse downward to learn information about accounts from there.

However, for block 5,000,000, it appears that the cid for the header is z43AaGF1A8G45wosbcDDkCMWyNt5FfWc1UMM3EzrdS9ZTGN419B and the cid for the root is z45oqTS15RnXKjQMUS4gtmpJJzeuKeYLE2yw1pdi98NUxCH6YZi, but the parity ipfs api call of http://localhost:5001/api/v0/block/get?arg=z45oqTS15RnXKjQMUS4gtmpJJzeuKeYLE2yw1pdi98NUxCH6YZi yields an error of State root not found (at least for me). Any idea what might be happening here?

ghost commented 6 years ago

It is highly probable that your Parity client has pruned that state from its database, or has not even obtained that element during synchronization. You may want to try a later block and state trie.

ghost commented 6 years ago

I checked with my running server, and the lookup failed for block 5,000,000. However, for a recent block (5,614,095), it succeeded. Here:

https://etherscan.io/block/5614095

gave me its hash 0x536c2a4cf78f03268dc7f2bac2e5ce541d13fad0179891c47cd6825cedcb5829

eth-ipld cid 0x536c2a4cf78f03268dc7f2bac2e5ce541d13fad0179891c47cd6825cedcb5829

# gives  "ethBlock": "z43AaGExLSyBxdzcVdwGtC4X3Ydf2ftKYzQsyr1W6MioA3cZT4c",

curl --output - http://localhost:5001/api/v0/block/get?arg=z43AaGExLSyBxdzcVdwGtC4X3Ydf2ftKYzQsyr1W6MioA3cZT4c | eth-ipld block

# and I am able to get the stateRoot
# "stateRoot": "0x9c3e5ae1dcdbfcde4d804a4e54e793c8ac6328151d2dcf95438df04d98fe9703",

# which I convert to cid
# eth-ipld cid 0x9c3e5ae1dcdbfcde4d804a4e54e793c8ac6328151d2dcf95438df04d98fe9703

# then

curl --output - http://localhost:5001/api/v0/block/get?arg=z45oqTS56MVrDkBQoQFs5mcxLHty2msTu2D3cJfdZfFsirxKnaN | eth-ipld rlp

# gives

[
  "0c416261069b8763ea27d6eafb97351e511c951ac8b6eeed5ecd02a59a85e080",
  "dd5f3eaef4a1aa058a7da097c28be246d812d0c921ee2eb4ced6a8088e34e723",
...
  "8771e32a2fabd77211b95bd12e1b67db6755601ae046da94a07dc983c7300bd4",
  ""
]
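`eth-ipld cid` turns a keccak-256 hash into CIDs like the ones used above. Below is a stdlib-only sketch of the standard CIDv1 layout, which is an assumption about what the tool produces: varint version 1, varint codec code, then a multihash (code 0x1b for keccak-256, length 32, digest), base58btc-encoded behind a `z` multibase prefix. The codec code is left as a parameter because it depends on the object type; look it up in the multicodec table. The function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"math/big"
)

const (
	keccak256Code  = 0x1b // multihash code for keccak-256
	base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
)

// appendUvarint appends v as an unsigned LEB128 varint.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// cidV1Bytes builds <version=1><codec><multihash(keccak-256, digest)>.
func cidV1Bytes(codec uint64, digest []byte) ([]byte, error) {
	if len(digest) != 32 {
		return nil, errors.New("keccak-256 digest must be 32 bytes")
	}
	b := appendUvarint(nil, 1)
	b = appendUvarint(b, codec)
	b = appendUvarint(b, keccak256Code)
	b = appendUvarint(b, uint64(len(digest)))
	return append(b, digest...), nil
}

// base58btc encodes b with the Bitcoin alphabet; leading zero bytes become '1'.
func base58btc(b []byte) string {
	n := new(big.Int).SetBytes(b)
	radix := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, mod)
		out = append(out, base58Alphabet[mod.Int64()])
	}
	for _, c := range b {
		if c != 0 {
			break
		}
		out = append(out, '1')
	}
	// Digits were produced least significant first, so reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

// hashToCID returns the base58btc string form with the "z" multibase prefix.
func hashToCID(codec uint64, digest []byte) (string, error) {
	raw, err := cidV1Bytes(codec, digest)
	if err != nil {
		return "", err
	}
	return "z" + base58btc(raw), nil
}
```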
AFDudley commented 6 years ago

Hi Herman!

I was able to run: curl --output - http://localhost:5001/api/v0/block/get?arg=z45oqTS56MVrDkBQoQFs5mcxLHty2msTu2D3cJfdZfFsirxKnaN | eth-ipld node

and get this result:

{ "type": "branch", "children": { "0": "0c416261069b8763ea27d6eafb97351e511c951ac8b6eeed5ecd02a59a85e080", "1": "dd5f3eaef4a1aa058a7da097c28be246d812d0c921ee2eb4ced6a8088e34e723", "2": "ecab2131db994982a26b540241fc4e7710b2aa1301383794a2ba12ff3200d5f4", "3": "cc486e899be905efdfbea3cd6b66d16e06e6c71759859d957eab69979eff875f", "4": "2834b562daf7e045516c2c85bd60a42ff4bcbc729efb240a6245e99a2c126f5f", "5": "d1f0e775c71bc99cc1db69ac4275283693bf7d701b8ea7db02c72e0a46b97405", "6": "8c5b2b89a8eb9488507c057a43a068cbda6ec937ecca32b29823e78d67dbe977", "7": "57d2db06cb923f043d1e16b170cb14a35b2efc96f272fd29c39e546d44b881ac", "8": "ebbbb6b1320a5ff55b2d8c35e52b810cd1eb4f90f0ef8f87d6fbaf0580ad950e", "9": "e659edc1f12beb959cf1405d6a4d8e669c77b109cfe4f4a56c7b748094c878ac", "a": "9c2bea51084f610a779574f0cf23f4f8e406766fe1d845078ce95e370b06aa02", "b": "01d04ace33310b608c1c751b6775c1ab91041efab45a5a91c817c29165c50bee", "c": "cf7d5c9c8b86721a18700afef55d316aed43635d955a631f490729237a27f168", "d": "fcf1c49c0585961b3e03f0fcecdb3f2cc23f3a9c796973f7b27a160841937500", "e": "71338d7803e20cf47d411e9ba0a6594492d62c656c5a53b16f825b934c6bcdce", "f": "8771e32a2fabd77211b95bd12e1b67db6755601ae046da94a07dc983c7300bd4" }, "value": "0x" }

But when I ran it with eth-ipld block I get this error: Error: wrong number of fields in data at Object.exports.defineProperties (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-util/dist/index.js:698:15) at new module.exports (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-block/header.js:79:9) at getStdin.buffer.then (/usr/local/lib/node_modules/eth-ipld/commands/block.js:31:20) at process._tickCallback (internal/process/next_tick.js:109:7)

Could you explain what's going on? Thanks!

ghost commented 6 years ago

`eth-ipld block` processes the RLP of a block header; the object you fetched is a state trie node, not a header.
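The "wrong number of fields" error most likely comes from the shape of the data: a state trie branch node is an RLP list of 17 items (16 child hashes plus a value), while a block header of that era is an RLP list of 15 fields. A stdlib-only Go sketch that counts the top-level items of an RLP list makes the mismatch visible. It follows the RLP prefix rules but skips the canonical-encoding checks a real decoder performs (such as rejecting non-minimal lengths); the function names are hypothetical.

```go
package main

import "errors"

var errTruncated = errors.New("rlp: input truncated")

// rlpItem returns the header length and payload length of the RLP item at
// the start of data, and whether that item is a list.
func rlpItem(data []byte) (hdr, size int, isList bool, err error) {
	if len(data) == 0 {
		return 0, 0, false, errTruncated
	}
	b := data[0]
	switch {
	case b < 0x80: // single byte that is its own payload
		return 0, 1, false, nil
	case b <= 0xb7: // short string
		hdr, size = 1, int(b-0x80)
	case b <= 0xbf: // long string: the next (b-0xb7) bytes hold the length
		hdr, size, err = longLen(data, int(b-0xb7))
	case b <= 0xf7: // short list
		hdr, size, isList = 1, int(b-0xc0), true
	default: // long list: the next (b-0xf7) bytes hold the length
		hdr, size, err = longLen(data, int(b-0xf7))
		isList = true
	}
	if err == nil && hdr+size > len(data) {
		err = errTruncated
	}
	return hdr, size, isList, err
}

// longLen decodes an n-byte big-endian length that follows the prefix byte.
func longLen(data []byte, n int) (hdr, size int, err error) {
	if len(data) < 1+n {
		return 0, 0, errTruncated
	}
	for _, c := range data[1 : 1+n] {
		size = size<<8 | int(c)
	}
	return 1 + n, size, nil
}

// countListItems returns how many top-level items the RLP list in data has.
func countListItems(data []byte) (int, error) {
	hdr, size, isList, err := rlpItem(data)
	if err != nil {
		return 0, err
	}
	if !isList {
		return 0, errors.New("rlp: not a list")
	}
	payload := data[hdr : hdr+size]
	count := 0
	for len(payload) > 0 {
		h, s, _, err := rlpItem(payload)
		if err != nil {
			return 0, err
		}
		payload = payload[h+s:]
		count++
	}
	return count, nil
}
```

For a branch with sixteen 32-byte child hashes and an empty value, the encoding is 3 + 16 × 33 + 1 = 532 bytes, which matches the 532 bytes curl reports in the next comment.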

AFDudley commented 6 years ago

Yes. Your line (with the typo fixed): curl --output - http://localhost:5001/api/v0/block/get?arg=z45oqTS56MVrDkBQoQFs5mcxLHty2msTu2D3cJfdZfFsirxKnaN | eth-ipld block

returns: % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 532 0 532 0 0 6897 0 --:--:-- --:--:-- --:--:-- 7000 Error: wrong number of fields in data at Object.exports.defineProperties (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-util/dist/index.js:698:15) at new module.exports (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-block/header.js:79:9) at getStdin.buffer.then (/usr/local/lib/node_modules/eth-ipld/commands/block.js:31:20) at process._tickCallback (internal/process/next_tick.js:109:7)

Not: [ "0c416261069b8763ea27d6eafb97351e511c951ac8b6eeed5ecd02a59a85e080", "dd5f3eaef4a1aa058a7da097c28be246d812d0c921ee2eb4ced6a8088e34e723", ... "8771e32a2fabd77211b95bd12e1b67db6755601ae046da94a07dc983c7300bd4", "" ]

Are we using different versions of eth-ipld block?

Thanks again.

ghost commented 6 years ago

You are right, @AFDudley. I checked my .bash_history: bad copy-pasta. I meant eth-ipld rlp. Apologies. Typo corrected above.

AFDudley commented 6 years ago

Thanks, I was able to replicate that with a more recent block.