ivilata / pymultihash

Python implementation of the multihash specification

Doesn't compute the same hash as IPFS #3

Closed pjz closed 5 years ago

pjz commented 6 years ago

I'm trying to use this to predict the address of a file added to IPFS. I wrote a little test-wrapper:

import sys

import multihash
import base58

if __name__ == '__main__':
    multihash.CodecReg.register('base58', base58.b58encode, base58.b58decode)

    filename = sys.argv[1]
    with open(filename, 'rb') as infile:
        data = infile.read()

    mh = multihash.digest(data, 'sha2-256')
    print(mh.encode('base58'))

I then compared its output to that of ipfs add -n <file>, and they're not the same. What am I doing wrong?

My exact test was:

$ dd if=/dev/zero of=zeroes bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00251314 s, 417 MB/s

$ ./venv/bin/python ipfs_hash.py zeroes 
QmRdTXKPV8VPCuPaawjJZZaACsDYRfVtZtNZTDLXrAQPx3

$ ipfs add -n zeroes 
added QmVkbauSDEaMP4Tkq6Epm9uW75mWm136n81YH8fGtfwdHU zeroes

pjz commented 6 years ago

I forgot about chunking, so the large file was doomed from the start. But a small file should be fine:

$ cat foo
foo
$ ipfs add -n foo 
added QmYNmQKp6SuaVrpgWRsPTgCQCnpxUYGq76YEKBXuj2N4H6 foo
$ ./venv/bin/python ipfs_hash.py foo 
Qmaa4Rw81a3a1VEx4LxB7HADUAXvZFhCoRdBzsMZyZmqHD

ivilata commented 5 years ago

Hi @pjz, thanks for reporting and sorry for the very late answer.

The hash that ipfs add reports is not that of the data but that of the block that wraps it:

$ echo foo > foo.txt
$ ipfs add foo.txt 
added QmYNmQKp6SuaVrpgWRsPTgCQCnpxUYGq76YEKBXuj2N4H6 foo.txt
$ ipfs block get QmYNmQKp6SuaVrpgWRsPTgCQCnpxUYGq76YEKBXuj2N4H6 | python3 -c 'import sys; print(repr(sys.stdin.buffer.read()))'
b'\n\n\x08\x02\x12\x04foo\n\x18\x04'
$ python3
>>> import multihash as mh
>>> mh.digest(b'\n\n\x08\x02\x12\x04foo\n\x18\x04', 'sha2-256').encode('base58')
b'QmYNmQKp6SuaVrpgWRsPTgCQCnpxUYGq76YEKBXuj2N4H6'

So it isn't trivial to get the hash that IPFS shows. Yeah, I was also bitten by this! :wink:
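
For a small file, the wrapping block can be rebuilt by hand before hashing. The following is only a minimal sketch, not part of pymultihash: it assumes a non-empty file that fits in a single chunk, added with the default ipfs add settings, and it reads the layout of the block dumped above as a protobuf node whose first field holds a message with a type of 2, the file content, and the file size. The helper names (varint, unixfs_file_block, b58encode, ipfs_small_file_hash) are made up for illustration, and it uses only the standard library so it runs without base58 or multihash installed.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def unixfs_file_block(data: bytes) -> bytes:
    """Wrap data following the layout of the block dumped above (assumed)."""
    # Inner message: field 1 = type 2, field 2 = content, field 3 = size.
    inner = b"\x08\x02"
    inner += b"\x12" + varint(len(data)) + data
    inner += b"\x18" + varint(len(data))
    # Outer node: field 1 carries the inner message as bytes.
    return b"\x0a" + varint(len(inner)) + inner


def b58encode(raw: bytes) -> str:
    """Plain base58 encoding with the Bitcoin alphabet."""
    n = int.from_bytes(raw, "big")
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(B58_ALPHABET[rem])
    # Leading zero bytes map to leading '1' characters.
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))


def ipfs_small_file_hash(data: bytes) -> str:
    """Hash the wrapping block and return the base58 multihash."""
    digest = hashlib.sha256(unixfs_file_block(data)).digest()
    # Multihash prefix: 0x12 = sha2-256, 0x20 = 32-byte digest length.
    return b58encode(b"\x12\x20" + digest)


if __name__ == "__main__":
    print(ipfs_small_file_hash(b"foo\n"))
```

For b'foo\n' the rebuilt block is byte-for-byte the one shown by ipfs block get above. Larger files are split into several blocks linked from a root node, so this sketch does not cover them.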