manly / BlockChainParser

C# BlockChain parser

Missing loop in Block.cs #1

Open pekatete opened 7 years ago

pekatete commented 7 years ago

Hi, thanks for this project. Just a few questions.

  1. Are you still maintaining this project?
  2. In the Blocks.cs file, there is a missing loop, so it always returns one block (whereas a blk*.dat file can, and usually does, contain multiple blocks).
  3. Again with the Blocks.cs, the return value is a Block() whereas the calling ParseAll returns an IEnumerable(Of Block), thus the blocks read should be added to an IEnumerable (list?) before returning at the end of the (missing) loop mentioned in 2 above.

I've converted your code to VB and it works very well with the above additions, but I still have to polish off some things, like getting the block height and identifying orphaned blocks.

manly commented 7 years ago

Well, I wrote this code back when the Wikileaks folks believed Julian Assange might have been MIA. There was a belief that if he were gone, he would probably send data on the blockchain, since it's censorship-proof and tamper-proof. No worries, I'm not trying to get you into conspiracy theories; I'm just giving some context. The efforts I saw to this end were, let's just say, god-awful. I decided I might as well donate my time and knowledge and write a blockchain parser whose main goal was to be as easy to read as possible (no external dependencies, no fancy tricks, etc.). The reason was that I wanted anyone to be able to ascertain that the code did nothing foul. You can imagine it's a big deal when the people using this are big into conspiracy theories. Hence the rather simplistic nature of the code. It just does what it's meant to do.

As far as the missing loop is concerned, I have used this code to fully parse the blockchain. I would suspect the problem is in your rewrite of it. If I were to guess, you might have misinterpreted how `yield return` works, or you did not request to read past the first item of the returned IEnumerable.
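The lazy-iteration behaviour described above can be sketched in Python, whose generators behave much like C# `yield return` iterators. This is only an illustration, not the project's code: `iter_raw_blocks` and `make_record` are hypothetical names, and the sketch assumes the usual blk*.dat record layout (4-byte network magic, 4-byte little-endian length, then the block bytes) and simply stops at the first thing that does not look like a record.

```python
import io
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # Bitcoin mainnet network magic


def iter_raw_blocks(stream):
    """Lazily yield the raw bytes of each block record in a blk*.dat-style stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of file, or a trailing partial record
        magic = header[:4]
        (size,) = struct.unpack("<I", header[4:])
        if magic != MAGIC:
            return  # padding or unknown data: this sketch just stops
        payload = stream.read(size)
        if len(payload) < size:
            return  # truncated record
        yield payload  # the loop resumes here only when the caller asks for more


def make_record(payload):
    """Hypothetical helper: wrap fake block bytes in a record, for the demo only."""
    return MAGIC + struct.pack("<I", len(payload)) + payload


fake_file = io.BytesIO(
    make_record(b"block-a") + make_record(b"block-b") + make_record(b"block-c")
)

# Asking for a single item looks like "the parser only returns one block".
first_only = next(iter_raw_blocks(fake_file))

# Iterating the whole sequence drives the loop to the end of the file.
fake_file.seek(0)
all_blocks = list(iter_raw_blocks(fake_file))
```

The function body contains the loop, but it only advances when the caller pulls the next item, which is why dropping the `yield` in a port (or reading just the first element) makes the loop appear to be missing.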

I have stopped maintaining the code since, well, it's been plenty obvious that Julian Assange is alive and well. I might make it more concise in the future, since I plan to write a few reading bots for fun and could reuse that code. If I do, I'll update the code and remove the old comments that stemmed from my lack, back then, of deeper knowledge of how bitcoin works.

In any case I am glad if my efforts of old prove helpful to you.


pekatete commented 7 years ago

Your guess is spot on: it was during the conversion that I lost the Yield. I had added some more logic to my code that optimises reading ahead through the block-chain files, hence the loop in the Parse function.

Any insight into getting block height and identifying orphaned blocks?

Anyway, it's a good project that I am sure has helped / will help many who are not into C++ et al. It has actually inspired me to try writing my own SPV wallet ....

manly commented 7 years ago

Block heights. Well, here's the rub. There is no stored block height in the blocks themselves. You need to follow the chain starting from genesis and count manually. Keep in mind that the block files do not store blocks contiguously (since they are downloaded asynchronously), so you might see blocks in the order 4, 0, 3, 2, 1. To do this properly, you need to keep a cache of the blocks whose height is still unresolved as you parse, because a block's height cannot necessarily be set as soon as the block is read.
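The bookkeeping described above can be sketched as follows. This is a minimal Python sketch rather than the parser's own code: `assign_heights` is a hypothetical name, the input is assumed to be `(block_hash, prev_hash)` pairs in file order, and the genesis block is recognised by its all-zero previous-block hash. It does not try to pick a best chain among forks; it only counts links back to genesis.

```python
from collections import defaultdict

GENESIS_PREV = b"\x00" * 32  # the genesis block's "previous hash" is all zeros


def assign_heights(blocks):
    """Map block_hash -> height for (block_hash, prev_hash) pairs in any order."""
    heights = {}
    waiting = defaultdict(list)  # parent hash -> children seen before the parent

    for block_hash, prev_hash in blocks:
        if prev_hash == GENESIS_PREV:
            heights[block_hash] = 0
        elif prev_hash in heights:
            heights[block_hash] = heights[prev_hash] + 1
        else:
            # Parent not seen yet: park this block until the parent resolves.
            waiting[prev_hash].append(block_hash)
            continue

        # This block now has a height, so resolve everything that waited on it.
        stack = [block_hash]
        while stack:
            parent = stack.pop()
            for child in waiting.pop(parent, []):
                heights[child] = heights[parent] + 1
                stack.append(child)

    return heights


# Placeholder 32-byte "hashes" standing in for blocks 0..4.
h = [bytes([i + 1]) * 32 for i in range(5)]
parent_of = {h[0]: GENESIS_PREV, h[1]: h[0], h[2]: h[1], h[3]: h[2], h[4]: h[3]}

# File order 4, 0, 3, 2, 1, as in the example above.
file_order = [(h[i], parent_of[h[i]]) for i in (4, 0, 3, 2, 1)]
heights = assign_heights(file_order)
```

Anything still left in `waiting` after a full pass never connected back to genesis in the data that was read.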

You might want to take the opportunity to write the key-value pairs (block hash, block height) to a file somewhere, if your code needs them.
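One simple way to persist such pairs is a two-column CSV file, sketched here in Python. The file name, the CSV layout and the `save_heights` / `load_heights` helpers are all assumptions made for illustration; any key-value store would do.

```python
import csv
import os
import tempfile


def save_heights(path, heights):
    """Write one 'hex block hash, height' row per block."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for block_hash, height in heights.items():
            writer.writerow([block_hash.hex(), height])


def load_heights(path):
    """Read the rows back into a dict of bytes -> int."""
    with open(path, newline="") as f:
        return {bytes.fromhex(h): int(n) for h, n in csv.reader(f)}


# Placeholder hashes; real keys would be the 32-byte block hashes.
heights = {bytes([i + 1]) * 32: i for i in range(3)}
path = os.path.join(tempfile.mkdtemp(), "heights.csv")
save_heights(path, heights)
restored = load_heights(path)
```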

If I am not wrong, I believe they passed a BIP to include the block height as part of the first transaction of the block (the one that pays out the mined bitcoins; the coinbase, I think it's called). Needless to say, this is somewhat of a hack, and since it's a BIP (Bitcoin Improvement Proposal), it was added later, meaning that not all blocks will have that data, and you have no guarantee that every block after the one where the BIP was accepted will contain the block height.


pekatete commented 7 years ago

Thanks for the heads-up. FYI, it is BIP34 (https://github.com/bitcoin/bips/blob/master/bip-0034.mediawiki). It should be trivial to decode, and it does seem that all v2 blocks must have it.

The format of the height is "serialized CScript" -- first byte is number of bytes in the number (will be 0x03 on main net for the next 150 or so years with 2^23-1 blocks), following bytes are little-endian representation of the number (including a sign bit). Height is the height of the mined block in the block chain, where the genesis block is height zero (0).
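Decoding that format could look roughly like this in Python. It is a sketch under assumptions: `decode_bip34_height` is a hypothetical name, the input is taken to be the raw coinbase input script of a v2+ block, and only the length-prefixed form quoted above is handled. A set sign bit is treated as an error, since a height cannot be negative.

```python
def decode_bip34_height(coinbase_script: bytes) -> int:
    """Read the block height from the start of a coinbase input script."""
    if not coinbase_script:
        raise ValueError("empty coinbase script")
    n = coinbase_script[0]  # first byte: how many bytes encode the number
    raw = coinbase_script[1:1 + n]
    if n == 0 or len(raw) < n:
        raise ValueError("script too short for the declared height length")
    if raw[-1] & 0x80:
        # The top bit of the last byte is the sign bit.
        raise ValueError("negative height")
    return int.from_bytes(raw, "little")


# 500000 is 0x07A120, so the little-endian bytes are 20 a1 07, prefixed by 0x03.
example = bytes.fromhex("0320a107") + b"arbitrary miner data"
height = decode_bip34_height(example)

# 128 needs a second byte so that the sign bit stays clear: 80 00.
small = decode_bip34_height(bytes.fromhex("028000"))
```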

manly commented 7 years ago

There you have it. Unfortunately, for the blocks lacking that info, you still have to follow the chain from genesis, like I pointed out, to do it properly. You could assume that the blockchain will not get reversed all the way back to v1 blocks, pre-parse all blocks, and store the key-value pairs I pointed out earlier.

And if you want to be extra secure, you probably want to store the key-value pairs using the block hash plus a hash from another algorithm. Why? Well, in the highly unlikely case that SHA256*2 has a weakness and people rewrite blocks, the smart people rewriting blocks will probably use that very weakness to make sure the new blocks still match the old hashes.
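A sketch of that idea in Python, assuming the key is built from the 80-byte block header: the usual block hash is the double SHA-256 of the header (conventionally displayed byte-reversed), and a second, unrelated digest is stored alongside it. SHA3-256 is an arbitrary choice here, and `block_keys` is a hypothetical helper.

```python
import hashlib


def block_keys(header: bytes):
    """Return (block hash hex, independent digest hex) for an 80-byte header."""
    if len(header) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    double_sha = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    block_hash = double_sha[::-1].hex()  # displayed in reversed byte order
    second = hashlib.sha3_256(header).hexdigest()  # assumed second algorithm
    return block_hash, second


# Placeholder header bytes, not a real block.
header = bytes(range(80))
key = block_keys(header)

# A rewritten block would have to collide under both algorithms at once.
tampered = block_keys(header[:-1] + b"\xff")
```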

Furthermore, although this is very speculative on my part, it is safer to rely on the calculated block height than on the "declared" block height. I am pretty sure nodes do no checks on this value to make sure it is indeed incremental. It all boils down to your use case, I suppose.
