ipfs / go-datastore

key-value datastore interfaces
MIT License
230 stars 65 forks source link

don't append "/" to prefix in queries #196

Closed noot closed 1 year ago

noot commented 2 years ago

Currently, when doing a datastore query using Prefix, a / is automatically appended to the end of the prefix (see https://github.com/ipfs/go-datastore/blob/master/query/query_impl.go#L134)

This caused issues for me when implementing prefix lookups in the Kad-DHT. The Kad-DHT ProviderStore uses go-datastore, but since a slash was automatically added to prefixes, it caused nothing to be returned when looking up a provider using prefix lookups.

This behaviour seems somewhat strange to me, I wouldn't expect the user-specified prefix to be modified in any way. If the prefix should have a slash on the end, the calling program should add it. Is there a reason to add the slash, and can it be removed? Alternatively, is there a workaround for this? Thanks!

welcome[bot] commented 2 years ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

guseggert commented 2 years ago

The comment above that line explains the rationale for this behavior:

https://github.com/ipfs/go-datastore/blob/04c7b6fca92f8524ab8e2d9ea7ec6d3853a9c075/query/query_impl.go#L121-L122

guseggert commented 2 years ago

There's no workaround for this. Given the constraint laid out in that comment, is there any change we could make here? Otherwise we can close this.

github-actions[bot] commented 2 years ago

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

noot commented 1 year ago

@guseggert Thanks for the response. Another solution I could see is to have a separate query for prefixes that doesn't append the /, or there could be an option to disable appending it. That way it wouldn't break any existing dependencies.

github-actions[bot] commented 1 year ago

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

Jorropo commented 1 year ago

@noot I don't see what is the usecase for this ? Do you want to query multiple prefixes that start the same ?

noot commented 1 year ago

@Jorropo yes, exactly. For the DHT prefix lookup implementation I'm working on, only a prefix of the CID is queried for added privacy. Appending the slash breaks the implementation since the database key is (essentially) the CID. So if my CID is bafkreidlasregfrefuxtigrw74w56ybfsuyxnrsp3hafnxiicg2mjntcuu and I try to find keys starting with bafkreidlasregfrefuxtigrw74w56y in the database, I won't be able to find anything.

Jorropo commented 1 year ago

@noot I don't think this would be an effective way to make the DHT more privacy focused, it would be quite hard to estimate the size of the network and how big your prefix should be. bafkreidlasregfrefuxtigrw74w56y is so big it would certainly only match one record and don't provide any privacy. You would need to have prefixes so small you match a huge number of records else it would be easy to do correlation attacks with the block links.

I think you should join the #ipfs-content-routing-wg channel on https://filecoin.io/slack, for DHT privacy we are more intrested in double hashing. nvm I see you know about it already


Anyway about this change:

About doing this actually, the Query package is generally not fast with O(N) behaviours in many places, but it could be made faster. Ensuring query works by namespaces allows us to write bucketing code (where keys can be bucketed by namespace).

I think it can work right now if you use a query that use a KeyPrefix filter. AFAIT this would looks like this:

q := query.Query{
  Prefix: "/florb/keys/thing/",
  Filters: []query.Filter{ query.FilterKeyPrefix{"bafkreidlasregfrefuxtigrw74w56y"} },
}
noot commented 1 year ago

@Jorropo using query.Filter with FilterKeyPrefix worked for me, thank you!