inaka / sumo_db

Erlang Persistency Framework
http://inaka.github.io/sumo_db/
Apache License 2.0
174 stars 37 forks source link

Add sumo:count_by function #314

Closed Svampen closed 7 years ago

Svampen commented 7 years ago

In case of sumo:find_by/4,5 limit=0 returning massive amount of Docs, it would be great to know how many Docs to expect. That way you could do successive sumo:find_by/4,5 limit > 0 and offset => 0 which could be very useful for rest endpoints (pagination)

Svampen commented 7 years ago

alternative (which would be better in my case) is if find_by could optionally return back the total of hits

elbrujohalcon commented 7 years ago

@Svampen I would prefer to keep them independent (one function <=> one responsibility).

Svampen commented 7 years ago

I can understand that but running sumo:count_by every time a sumo:find_by to get the total amount of results available could/would inefficient (same query to the backend twice). Anyways I forked sumo_db and made a patch for it which I will do a pull request in case you want it or want to discuss around that further

elbrujohalcon commented 7 years ago

@Svampen I understand how that would be inefficient. If you find yourself in that position and needing a performance improvement, that's when you implement your own store ;)

Svampen commented 7 years ago

I looked at that and I did implement my own store for riak, to support KV instead of Riak Data types, but that wouldn't help me with returning both Docs and Total in the same response as sumo:call only accepts {raw, Raw} and {docs, Docs} as return values.

elbrujohalcon commented 7 years ago

In theory you can use {raw, Raw} in this case. Also, are there DBs that will give you this information in an atomic way? If not, wouldn't you end up hitting the DB twice anyway, even if you use a single sumo call to find_by ?

Svampen commented 7 years ago

Yes you could use {raw, Raw} where you would need to call sumo_internal:wakeup(Doc) for every doc in the Raw variable.

Riak does this in a atomic way so there is no penalty in efficiency. I can't speak for the other DBs but sumo already allows for 'not_supported' functionality like sort for mnesia. The code I wrote up in my patch takes all of this in to consideration and is backwards compatible

cabol commented 7 years ago

@Svampen I think I agree with @elbrujohalcon, I'd try to keep those functions separated, at least for now. Besides, I don't think you need to call sumo:count_by every time a sumo:find_by to get the total amount of results available, actually you only need to call it once, at the beginning, in order to get the total amount of results for that query and with the limit you can calculate the number of pages, then you can just use sumo:find_by and paginate your results using offset and limit.

Now, about the "efficiency", well I think implementing the count of results within sumo:find_by might be worse in some cases, because there might be stores (DB backends) that doesn't allows you to do so in a single query/command, so for those cases you have to execute two commands, one for the count and another for the result list, and in this case you would be executing the count every time (which is not needed). So, IMHO I'd rather to keep things simple, use count_by and find_by on demand – at least for now.

Svampen commented 7 years ago

@cabol Only calling sumo:count_by once would require the code to know in which "context" sumo:find_by was called. In my case it's a REST endpoint and the code won't have any idea if the client wants it paginated for future REST calls. I could probably design it so it would know if the client wants the result paginated (return back the total only if client requests it) and maybe that is a better solution. (though in this instance using Riak as DB it is inefficient) I will think about this. Thank you all for the discussion :)

cabol commented 7 years ago

@Svampen in case of Riak, I agree, it would be inefficient, actually if you check the Riak's documentation they don't recommend to do this kind of operation at all –– fetch all results or count all results in database. And about pagination, I think there is a lot of ways to handle pagination, it is a whole topic, but you have to choose the better way to handle it according to your DB technology, for some cases it is irrelevant to know the number of pages in advance, so you don't need the the total number of results (it could be handled from client/UI side), even more if we are talking about thousands or millions of records. Thanks and let us know if we can help you more with this, we stay tuned!