ipfs / js-ipfs

IPFS implementation in JavaScript
https://js.ipfs.tech

How to get/cat a hash when you don't know the type of the hash? #1049

Closed mitra42 closed 1 year ago

mitra42 commented 6 years ago

The scenario is one where our archive.org gateway adds a file to IPFS and we want to retrieve it in a browser from its multihash. It's easy to do via the IPFS gateways, but it seems impossible in JS with the current APIs?

Let's take two cases:

A: 10.1001/jama.2009.1064 (paper about Alzheimer's), 262438 bytes; B: 10.1002/asjc.93 (paper about microscopes), 184324 bytes. I'm guessing the sharding size is 250k, which would account for the different behavior.

Both were submitted using the HTTP API, which returned these hashes:
A = Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB
B = QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa

Let's fetch them locally:
A: https://ipfs.dweb.me/ipfs/Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB
B: https://ipfs.dweb.me/ipfs/QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa
All good.

Or via ipfs.io:
A: https://ipfs.dweb.me/ipfs/Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB
B: https://ipfs.dweb.me/ipfs/QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa
Both also work.

If we try to retrieve the bytes via block.get, ipfs.block.get(new CID("Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB")) retrieves 102 bytes, which I presume is the IPLD node rather than the content we want, while ipfs.block.get(new CID("QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa")) retrieves 184324 bytes, which is the paper.
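To illustrate the presumption above: when a file spans several chunks, the returned hash names a root node that links to the chunk blocks, so fetching that one block yields a small list of links rather than the content, whereas a cat-style reader follows the links and concatenates the leaves. The sketch below models this with a hypothetical in-memory Map of plain objects; the keys, the { data, links } shape and the helper names are illustrative assumptions, not the real UnixFS/dag-pb encoding or any js-ipfs API.

```javascript
// Hypothetical, simplified model of a chunked file. NOT the real UnixFS/dag-pb
// encoding and NOT a js-ipfs API; string keys stand in for multihashes.
const store = new Map([
  // Leaf blocks hold raw file bytes.
  ['leaf-1', { data: Uint8Array.from([1, 2, 3]), links: [] }],
  ['leaf-2', { data: Uint8Array.from([4, 5]), links: [] }],
  // The root holds no file bytes here, only ordered links to its chunks.
  ['root', { data: new Uint8Array(0), links: ['leaf-1', 'leaf-2'] }],
]);

// Analogue of block.get: returns exactly one block, whatever it contains.
function getBlock(blocks, hash) {
  const block = blocks.get(hash);
  if (!block) throw new Error(`block not found: ${hash}`);
  return block;
}

// Analogue of cat: walk the DAG depth-first and concatenate leaf data in link order.
// A non-leaf node's own data is ignored in this simplified model.
function catDag(blocks, hash) {
  const block = getBlock(blocks, hash);
  if (block.links.length === 0) return block.data;
  const parts = block.links.map((child) => catDag(blocks, child));
  const total = parts.reduce((n, part) => n + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

Here getBlock(store, 'root') returns only the two links, while catDag(store, 'root') returns all five bytes; for a single-chunk file the hash names a leaf directly, so both approaches return the same bytes.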

Let's move to the Files API: ipfs.files.cat(new CID("Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB")) returns a stream that emits events totalling 262438 bytes. GOOD. ipfs.files.cat(new CID("QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa")) returns a stream, but that stream emits NO events; it just sits there, with no 'data', 'end' or 'error' events.

The problem seems to be that block.get or files.cat works depending on the hash, but I have no way to know which kind of hash I've got. I think the files.cat behavior is particularly bad, as there is no error, just a hung thread.

I've also seen a third behavior where block.get just sits and hangs. This seems to correspond to cases where ipfs.io also hangs, and later attempts on the same URLs seem to work, so I think this is just a case of the content taking a really long time to fetch. I don't have a repeatable case yet.
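Both the hung files.cat stream and the slow block.get fail silently. One way to turn a hang into an explicit error is to race the fetch against a timer. withTimeout below is a hypothetical helper, not a js-ipfs API; it assumes the fetch is available as a promise, and it only stops waiting: it does not cancel the underlying request.

```javascript
// Hypothetical helper (not part of js-ipfs): reject if `promise` has not
// settled within `ms` milliseconds, so a hang surfaces as an error.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot
  // keep the process alive after the race is decided.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

For example, withTimeout(someFetchPromise, 30000, 'block.get') rejects after 30 seconds instead of waiting forever; the 30-second value is arbitrary and would need tuning for slow content.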

mitra42 commented 6 years ago

Any thoughts on this? Is there really no deterministic way in JavaScript to load a hash returned by the HTTP API when you don't know whether it points to a DAG/IPLD node or to the bytes?

daviddias commented 6 years ago

@mitra42 Readable Streams in the browser sometimes behave oddly and don't resume automatically. Have you tried calling .resume()?

We are doing it in the example to make sure it always works https://github.com/ipfs/js-ipfs/blob/master/examples/exchange-files-in-browser/public/js/app.js#L102

We are solving this by providing an alternative API with pull-streams and another one that buffers everything. Follow here: https://github.com/ipfs/interface-ipfs-core/pull/162
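A minimal sketch of the pattern suggested above, assuming a Node-style Readable stream of Buffer chunks such as files.cat returned at the time: attach 'data', 'end' and 'error' handlers, call resume() explicitly so the stream flows, and resolve with the concatenated bytes. collectStream is a hypothetical helper name, not a js-ipfs API.

```javascript
// Hypothetical helper: drain a Node-style Readable of Buffer chunks into one Buffer.
function collectStream(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks)));
    // Adding a 'data' listener already switches a Node stream to flowing mode;
    // calling resume() explicitly guards against streams that stay paused.
    stream.resume();
  });
}
```

With this pattern, a stream that ends immediately without any 'data' events resolves to an empty Buffer instead of hanging, which at least makes that case detectable by checking the length.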

mitra42 commented 6 years ago

I can try that, but it doesn't sound like the problem.

Once the streams open they work just fine. The issue is that IF I have a multihash returned by the HTTP API for a file smaller than some size (I believe 250k), I MUST call the block.get API, and if I have a multihash for a file larger than 250k, I MUST call the Files API.

I presume this is because the larger files are sharded and turned into IPLD DAGs. The problem, of course, is that I have no way that I can see (since all I have is the multihash) of knowing which kind of hash I have.

mitra42 commented 6 years ago

@diasdavid - I added the resume().
Long file: files.cat("Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB") works with or without the resume().
Short file: files.cat("QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa"):
- Without resume(): sits waiting, no events generated.
- With resume(): an immediate 'end' event with no 'data' events, so 0 bytes.
That sounds like a bug, i.e. it should always produce EITHER data or an 'error' event?

I guess, worst case, I could try files.cat and then, if it returns 0 bytes, try block.get. I've tried that and it works, but it sounds like an awful kludge!
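The workaround just described can be sketched as follows. The node argument is a hypothetical adapter, not a js-ipfs object: node.cat(hash) is assumed to resolve to all the bytes the Files API produced (for example after collecting the stream), and node.blockGet(hash) to the raw bytes of a single block. One limitation of the kludge is visible in the code: a genuinely empty file would also trigger the fallback.

```javascript
// Sketch of the "try cat, fall back to block.get" workaround.
// `node` is a hypothetical adapter: node.cat(hash) resolves to the bytes the
// Files API produced; node.blockGet(hash) resolves to one block's raw bytes.
async function fetchBytes(node, hash) {
  const catBytes = await node.cat(hash);
  if (catBytes.length > 0) {
    return { via: 'cat', bytes: catBytes };
  }
  // cat produced zero bytes: assume the hash names a single raw block instead.
  // Note: a genuinely empty file also ends up here.
  const blockBytes = await node.blockGet(hash);
  return { via: 'block.get', bytes: blockBytes };
}
```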

mitra42 commented 6 years ago

I think I'm onto something: it looks like there may be a difference between Node and Chrome on this. I built a simpler test case, stripping out my code and including just IPFS. It works in Node but fails in Chrome.

I've created three files, which it won't let me attach here, so I've put them in my repository:
- test_ipfs.js: simple JavaScript test
- test_ipfs_bundled.js: bundled version of this (after updating npm packages)
- test_ipfs.html: skeleton that calls test_ipfs_bundled.js
You can also run the bundled test directly at https://dweb.me/examples/test_ipfs.html

With these files, if you run "node test_ipfs.js", both tests run: the long file gets 3 chunks (0, 262144 and 294 bytes), and the short file gets a single chunk of 184324 bytes.
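The reported chunk sizes can be checked with a little arithmetic: 262144 + 294 = 262438, exactly the size of file A, which is consistent with a fixed-size chunker of 256 KiB (262144 bytes, the default size-262144 chunker in go-ipfs) rather than 250k. The sketch below assumes that fixed chunk size and computes the leaf sizes such a chunker would produce; chunkSizes is a hypothetical helper, and the extra 0-byte chunk seen under Node is not modelled.

```javascript
// Sizes of the leaves a fixed-size chunker would produce for a file of
// `fileSize` bytes. The 262144-byte default is an assumption matching the
// chunk sizes reported in this thread.
function chunkSizes(fileSize, chunkSize = 262144) {
  const sizes = [];
  for (let offset = 0; offset < fileSize; offset += chunkSize) {
    sizes.push(Math.min(chunkSize, fileSize - offset));
  }
  return sizes;
}

// File A: returns [262144, 294], i.e. two leaves under a root node.
const fileA = chunkSizes(262438);
// File B: returns [184324], i.e. the whole file fits in one block.
const fileB = chunkSizes(184324);
```

This matches the split observed in the thread: the file that needs more than one chunk is the one whose hash names a small root node, while the one-chunk file comes back whole from a single block.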

If you open test_ipfs.html in Chrome and look at the console, the long file gets 2 chunks (262144 and 294 bytes), while the short file just ends immediately (the "on" handler is never called).

I don't know the guts of IPFS well enough to fix it, but now that I've stripped all my code out of this test and it still shows the difference, I'm pretty sure the problem is internal to IPFS.

mitra42 commented 6 years ago

P.S. If you looked at them earlier, I just pushed a cleaner set of tests - the notes in the previous comment still apply.

jacobheun commented 4 years ago

Is this still a problem in the latest version of js-ipfs? There have been some significant updates in these systems.

mitra42 commented 4 years ago

Sorry, no idea - we obviously built workarounds for this and other bugs when we were still using IPFS.

whizzzkid commented 1 year ago

js-ipfs is being deprecated in favor of Helia. You can learn more about the deprecation at https://github.com/ipfs/js-ipfs/issues/4336 and read the migration guide.

Please feel free to reopen with any comments before 2023-06-05. We will do a final pass on reopened issues afterward (see https://github.com/ipfs/js-ipfs/issues/4336).