ipni / index-provider

📢 Index Provider
Other
36 stars 17 forks source link

Import ux slow / needs improvment #118

Closed willscott closed 2 years ago

willscott commented 2 years ago
$ ./index-provider import car -l http://localhost:3102 -i bafybeihh.carv2
Post "http://localhost:3102/admin/import/car": EOF
masih commented 2 years ago

Do you know:

willscott commented 2 years ago

It was made using car index, with the default codec

do we have any tools that will cause the fully indexed characteristic to be set? should we?

willscott commented 2 years ago

the 'isfullyindexed' characteristic makes no sense to me. it seems to not be a property of a car, but the context in which the car is being used, right? the same car is both fully index and not fully indexed depending not in it's contents, but on whether someone thought identity CIDs might be there?

masih commented 2 years ago

Fully indexed characteristic simply means does the index represent a catalog of all CIDs in the CAR or not, so I'd say it does depend on the content of the CAR. For example, a CAR can be indexed with an option to ignore IDENTITY CIDs. The resulting index would not then be a catalog of all CIDs in the CAR and will not have "fully indexed" characteristic.

For the purposes of index-provider what we do want is a catalog of all CIDs in a CAR, i.e. need to also advertise IDENTITY CIDs. If a given CAR index is not a full catalog then we'd need to re-generate the index.

A small note on where all of this comes from: maintaining backward compatibility while supporting CAR indices that do include IDENTITY CIDs. For context see https://github.com/ipld/go-car/pull/239 and https://github.com/ipld/ipld/pull/138

willscott commented 2 years ago

if i have a car i'm generating an index for, i don't know if it was meant to have identity cids, but didn't or if it just didn't have identity cids. in what contexts are we comfortable setting the bit?

masih commented 2 years ago

I see what you are saying. It all depends on the use case. In FileCoin use case IDENTITY CIDs should be included because I think otherwise the commp calculation will not match among other reasons (e.g. IDENTITY CIDs with link in data block). Remember, these options were added after the effect to avoid breaking changes where existing usage didn't match the spec at the time.

What go-car is trying to do is to facilitate usage, may it be including IDENTIY or not including it.

So to answer your question: the user generating an indexed CAR file should know in what context that CAR file is going to be used. And if the user doesn't know, that's OK. Because, characteristics bits offer a way for readers of that CAR file to do the right thing depending on the context they are using the indexed CAR file in. For example, because index-provider needs to include IDENTITY CIDs in advertisements, given an indexed CAR it checks if the index includes IDENTITY, and if so uses the index as is. Otherwise, it regenerates the index it needs.

Does that help clarify and answer your question?

willscott commented 2 years ago

I'm still confused by this at an abstract level:

masih commented 2 years ago

this index provider is not going to index or deal with identity cids

Discussed in 1:1 chat. Outcome, IDENTITY CIDs are needed by graphsync server that serves retrieval requests but need not to be included in advertisements.

Way forward: wrap datastore with IdentityStore to allow retrieval graphsync server to function and relax the need for FullyIndexded characteristic in CarSupplier.

Many thanks for the discussion @willscott