timshannon / badgerhold

BadgerHold is an embeddable NoSQL store for querying Go types built on Badger
MIT License

Query: Is there any way to get all keys (not the entire record) from the DB #103

Open PD-Pramila opened 10 months ago

PD-Pramila commented 10 months ago

I checked the code and saw that store.Find() can return all the records (with all fields). But if there are billions of records, holding every record with all of its fields will consume a lot of memory.

There is also store.ForEach(), which calls a func for every record, but it has the same problem: with billions of records, it may be slow to process every record just to build a list of all keys.

Is there any other way to get only the keys?

I opened an issue for stream reads; is there any plan to add that feature to badgerhold?

timshannon commented 10 months ago

You're going to run into the same issue as https://github.com/timshannon/badgerhold/issues/94

If you want to query any of the fields that aren't the Key, then they need to be retrieved from the DB. If everything you need to loop through is in the key, then you can iterate on the key yourself directly against the Badger DB.

PD-Pramila commented 10 months ago

@timshannon Thanks for the reply. When we insert a record (func (s *Store) Insert(key, data interface{})), we specify a key and the data, and badgerhold encodes them using gob. I want to retrieve only that key (which is Badgerhold.Key), which is not part of the data, from the DB.

Just to get the key, why do I need to fetch the entire data part? Also, how will the Badger DB iterator decode the gob value, which is specific to badgerhold?

timshannon commented 10 months ago

> Just to get the key, why do I need to fetch the entire data part?

You don't have to. Just use Badger directly.

> how will the Badger DB iterator decode the gob value, which is specific to badgerhold?

With the core Gob libraries: https://pkg.go.dev/encoding/gob@go1.21.1

The Key value is just the Gob encoded value of whatever key object you pass in.
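To make the point above concrete, here is a minimal sketch of decoding such a key using only the standard library. It assumes the raw key bytes have already been read from Badger (its iterator can skip values when `PrefetchValues` is set to false in the iterator options). It also assumes that the stored key carries a per-type prefix in front of the gob bytes; the prefix value `bh_Item:` and the helper names `encodeKey` and `decodeKey` are hypothetical and should be checked against the BadgerHold source for the version in use.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// encodeKey gob-encodes a key value, mirroring the statement in this thread
// that the stored key is the gob encoding of whatever key object is passed in.
func encodeKey(key interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(key); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeKey strips an assumed type prefix from a raw stored key and
// gob-decodes the remainder into out, which must be a pointer to the
// Go type that was originally used as the key.
func decodeKey(raw, prefix []byte, out interface{}) error {
	if !bytes.HasPrefix(raw, prefix) {
		return fmt.Errorf("key does not start with prefix %q", prefix)
	}
	return gob.NewDecoder(bytes.NewReader(raw[len(prefix):])).Decode(out)
}

func main() {
	// Hypothetical prefix; verify the real layout before relying on it.
	prefix := []byte("bh_Item:")

	// Simulate the bytes a key-only scan would hand back.
	enc, err := encodeKey("order-42")
	if err != nil {
		log.Fatal(err)
	}
	raw := append(append([]byte{}, prefix...), enc...)

	var key string
	if err := decodeKey(raw, prefix, &key); err != nil {
		log.Fatal(err)
	}
	fmt.Println(key) // prints "order-42"
}
```

The same `decodeKey` call works for any key type gob can handle (for example an integer or a struct), as long as `out` points to a value of the matching type. No record data is decoded at any point.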

Based on all of the other issues you submitted, I'm not sure BadgerHold is a good fit for your project at all. You're asking to re-implement features that are already built into Badger. I'm guessing you should just use Badger.

I'd recommend taking a look at the Badger documentation: https://dgraph.io/docs/badger/get-started/#iterating-over-keys

PD-Pramila commented 10 months ago

I understand your point. One of the reasons we use badgerhold rather than Badger directly is aggregate queries with group-by, which Badger doesn't have and which are our main use case.

Also, it provides indexing and gob encoding (though we could use gob on our own).

I understood your suggestion to use the Badger DB iterator from the start, but I don't see how it is any different from badgerhold's ForEach. Neither iterates over just the keys; both fetch the data along with them.

What I was looking for is a way for the DB to skip fetching the value/data part and return just the keys, if those are held in memory. That's why I asked whether there is any way to get only the keys from the DB, which would be faster than fetching key+value and then extracting the key from the results.

Thanks for your replies. I will look into which option is better.

timshannon commented 10 months ago

One last recommendation. If you're honestly talking about billions of records, you absolutely should start looking at a real database, even if it's just something like SQLite. You're going to run into issues constantly otherwise.