ipld / js-car

Content Addressable aRchive format reader and writer for JavaScript

reading from a file object #84

Closed am2222 closed 2 years ago

am2222 commented 2 years ago

Hi, I have a CAR file object in JavaScript and want to read it using js-car, but I keep getting an unexpected end of the file error. Here is the code I am trying:

import { CarReader, CarIndexer } from '@ipld/car'

const arrayBuffer = await files[0].arrayBuffer()
const bytes = new Uint8Array(arrayBuffer)
const reader = await CarReader.fromBytes(bytes) // throws error here
const indexer = await CarIndexer.fromBytes(bytes) // throws error here

I also tried this:

const stream = files[0].stream()
const reader = await CarReader.fromIterable(stream) // throws error here

and neither of them works. However, with the same file, this code works:

const inStream = fs.createReadStream('test.car')
const reader = await CarReader.fromIterable(inStream)

I checked and I know that CarReader.fromBytes needs a Uint8Array, and I am sure files[0] is not null. Does anyone know what I am missing here?

rvagg commented 2 years ago

I think this really comes down to what let arrayBuffer = await files[0].arrayBuffer(); is doing and whether it is giving you the complete bytes. How is files populated, and what is .arrayBuffer() doing? I'm not sure where this API is from. Are you using Node.js? Is this coming out of a core API or something else?

am2222 commented 2 years ago

Hi @rvagg, thanks for the response. files[0] comes from web3.storage. This is the simple code I am using to read it:

import { Web3Storage } from 'web3.storage'

// Construct with token and endpoint
const client = new Web3Storage({ token: API_TOKEN })

// Fetch and verify files from web3.storage
const res = await client.get('bafyreidkwmlavwkguze5urgxytucj2zay2me33ewm3y5hdx6hlxc3sfghu') 
const files = await res.files() // Promise<Web3File[]>
for (const file of files) {
  console.log(`${file.cid} ${file.name} ${file.size}`)
}

I have checked the Web3File class and it uses the regular JavaScript File interface under the hood. I also tried the FileReader methods and they give the same result. I am running the code in the browser. The interesting part is that when I download the file and read it from the local drive, it is read with no problems:

//This works with no issues.
const inStream = fs.createReadStream('test.car')
const reader = await CarReader.fromIterable(inStream)

rvagg commented 2 years ago

So I believe the web3.storage API is returning the full CAR contents in its singular response but the files() API is a special reader that decodes the CAR for you and returns UnixFS files within that CAR. So what I think your code is trying to do is pass back in a file that's within the CAR you provided as if those bytes were a CAR. So unless you're wrapping CARs in a CAR then this is probably wrong.

If found, the method returns a Web3Response object, which extends the Fetch API response object to add two iterator methods unique to the Web3.Storage client library: files() and unixFsIterator().

If you want to take care of decoding the CAR yourself, then I think you could pass res.body in a stream to CarReader.fromIterable(); which is probably part of what the web3.storage JS library is doing for you, although it'd also be adding on the additional layer of decoding UnixFS files from the blocks within the CAR (i.e. the blocks aren't necessarily your files, the files may be spread across blocks, UnixFS provides a file-like view over those blocks).
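As a rough sketch of what "decoding the CAR yourself" could look like once a reader exists: getRoots() and blocks() are CarReader methods in js-car, while summarizeCar is a hypothetical helper name used here only for illustration. It lists the root CIDs and the size of every raw block, with no UnixFS interpretation applied.

```javascript
// Hypothetical helper (not part of js-car): list the roots and raw blocks in a CAR.
// `reader` is assumed to be a CarReader from @ipld/car, or any object that
// exposes the same getRoots() and blocks() methods.
async function summarizeCar(reader) {
  // getRoots() resolves to an array of CIDs
  const roots = (await reader.getRoots()).map((cid) => cid.toString());
  const blocks = [];
  // blocks() is an async iterable of { cid, bytes } pairs
  for await (const { cid, bytes } of reader.blocks()) {
    blocks.push({ cid: cid.toString(), size: bytes.length });
  }
  return { roots, blocks };
}
```

The blocks listed this way are the raw IPLD blocks, which is a different view from the UnixFS files that files() hands back.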

Does that make sense?

am2222 commented 2 years ago

Hi @rvagg, thanks for your help, I appreciate it. I just tried passing res.body to CarReader.fromIterable(), but it throws this error: Uncaught (in promise) TypeError: fromIterable() requires an async iterable. I had to use the solution described here (https://jakearchibald.com/2017/async-iterators-and-generators/#making-streams-iterate) to read the stream, and with that it worked perfectly.
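For anyone landing here later, the workaround amounts to wrapping the stream's reader in an async generator. A minimal sketch along the lines of the linked article follows; the name streamAsyncIterator is just a label for this example, and it assumes res.body is a standard WHATWG ReadableStream in a browser where streams are not async iterable.

```javascript
// Wrap a WHATWG ReadableStream so it can be consumed with `for await`.
async function* streamAsyncIterator(stream) {
  const reader = stream.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) return;
      yield value; // for a fetch response body, each chunk is a Uint8Array
    }
  } finally {
    // Release the lock so the stream can be used or cancelled elsewhere
    reader.releaseLock();
  }
}

// Intended use with js-car (assumes `res` is the web3.storage response):
// const reader = await CarReader.fromIterable(streamAsyncIterator(res.body))
```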

although it'd also be adding on the additional layer of decoding UnixFS files from the blocks within the CAR (i.e. the blocks aren't necessarily your files, the files may be spread across blocks, UnixFS provides a file-like view over those blocks).

Actually, the CAR I am trying to read contains a CBOR object, which is an RTree graph. I am trying to read the RTree and use it as an index so I can search for the hashes of other objects.
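Roughly, getting at that root object from the reader could look like the sketch below. It assumes the root block is what holds the CBOR node and that a codec's decode function is passed in (for example decode from @ipld/dag-cbor, which is an assumption about the setup). decodeFirstRoot is a hypothetical helper name; getRoots() and get() are CarReader methods.

```javascript
// Hypothetical helper: fetch the first root block from a CAR reader and decode it.
// `reader` is assumed to be a CarReader (getRoots() and get(cid));
// `decode` is a codec decode function taking a Uint8Array.
async function decodeFirstRoot(reader, decode) {
  const [rootCid] = await reader.getRoots();
  if (!rootCid) throw new Error('CAR has no roots');
  // get() resolves to { cid, bytes }, or undefined if the block is absent
  const block = await reader.get(rootCid);
  if (!block) throw new Error(`root block ${rootCid} not found in CAR`);
  return decode(block.bytes);
}

// Possible use, assuming @ipld/dag-cbor is installed:
// import * as dagCbor from '@ipld/dag-cbor'
// const tree = await decodeFirstRoot(reader, dagCbor.decode)
```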

rvagg commented 2 years ago

Cool, so it sounds like you got it sorted. I'm going to pass on some feedback to web3.storage re docs for folks that are working with non-unixfs CAR data.

am2222 commented 2 years ago

@rvagg thanks, I appreciate your help!