vmx / couchdb

Mirror of Apache CouchDB

feature request: integrate with the _changes feed filter mechanism #8

Closed jmarca closed 13 years ago

jmarca commented 14 years ago

I'm sure this is a long way off, but it would be nice to have access to bounding box queries in _changes filters.

For example, if I have a database of traffic conditions, a continuous _changes feed with a bbox parameter would return only the conditions that are changing within the bounding box.

berb commented 14 years ago

I think you can implement that yourself in a straightforward way by defining a filter function. Your filter can even use parameters that you set when requesting the continuous feed, which lets you pass the bbox.
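A minimal sketch of such a filter, for illustration only. CouchDB filter functions receive `(doc, req)`, and custom query parameters of the `_changes` request are available in `req.query`. The function name `bboxFilter`, the `west,south,east,north` parameter order, and the assumption that each document stores a GeoJSON point in `doc.geometry` are not from this thread.

```javascript
// Hypothetical filter; in CouchDB it would be stored as a string under
// "filters" in a design document. Assumes doc.geometry is a GeoJSON Point.
function bboxFilter(doc, req) {
  // Custom query parameters of the _changes request arrive in req.query.
  var raw = String((req.query && req.query.bbox) || "");
  var parts = raw.split(",").map(Number);
  if (parts.length !== 4 || parts.some(isNaN)) {
    return false; // no usable bbox: let nothing through
  }
  var west = parts[0], south = parts[1], east = parts[2], north = parts[3];

  var geometry = doc.geometry;
  if (!geometry || geometry.type !== "Point" || !geometry.coordinates) {
    return false; // document has no point to test
  }
  var lon = geometry.coordinates[0];
  var lat = geometry.coordinates[1];

  // Four number comparisons per changed document.
  return lon >= west && lon <= east && lat >= south && lat <= north;
}
```

Assuming a database named `traffic` and a design document named `geo` (both made up here), the feed would be requested with something like `GET /traffic/_changes?feed=continuous&filter=geo/bbox&bbox=-118.5,33.7,-117.6,34.3`.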

jmarca commented 14 years ago

I don't think it is straightforward to access the geospatial index from within the filter function. Of course I can hand-check whether each element in the changes feed is inside or outside the bounding box, but this is horrendously inefficient, and avoiding that kind of check is the whole reason for this awesome GeoCouch work. I might be missing something though, so I will look at the code and see if there is a way to hook into the spatial index.

berb commented 14 years ago

A filter function is applied to each new or changed item, which is thus a candidate for the feed. So the filter function only has to run a few number comparisons per item. This is different from querying a large number of items using an index, where an R-tree-backed index improves efficiency considerably.

jmarca commented 14 years ago

Suppose I have 100 users all asking for the changes since yesterday, each with a different bounding box. Suppose I have 2 million points scattered over a huge geographic region, and they all change every 30 seconds.

If the filter function runs once per change, there is only a small benefit to using spatial indexing. If it runs once per request for "changes" (which is my understanding of how it works, or else how would it be able to read the bbox attribute of the request), then there is a huge efficiency gain to be had. Searching for a bounding box overlap with a spatial index is fast; comparing x and then y to a bounding box for each record is not fast.
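As a rough back-of-the-envelope sketch of this scenario, assuming the filter is evaluated once per change for every open feed (the per-item model described earlier in the thread). The function name is made up for illustration, and the figures are just the ones quoted above, not measurements.

```javascript
// Hypothetical helper: counts filter invocations, not actual CPU cost.
function filterLoad(feeds, changesPerCycle, cycleSeconds) {
  var perCycle = feeds * changesPerCycle; // every feed sees every change
  return {
    perCycle: perCycle,
    perSecond: perCycle / cycleSeconds
  };
}

// 100 users, 2 million points, all changing every 30 seconds:
// 100 * 2,000,000 = 200,000,000 filter calls per 30 s cycle.
var load = filterLoad(100, 2000000, 30);
console.log(load.perCycle, Math.round(load.perSecond));
```

Each call is cheap on its own, but the total grows with the number of feeds times the number of changes, which is the cost a shared spatial index would be meant to avoid.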

vmx commented 14 years ago

jmarca: the use case you explained last makes sense, though there are several problems. One is that indexes are normally updated on request, so for a useful _changes feed they would need to be updated immediately after each insert. To cut the answer short: it's a lot of work, don't expect it to be done :)

jmarca commented 14 years ago

vmx: yes, it is a long way off, as I said in the beginning. I think the first step is to integrate regular CouchDB indices with _changes, then this becomes a special case of that.

Maybe I'll use a spatial index to store references to CouchDB databases, one per detector, so the work done by each _changes feed stays manageable and I only ask for what I need.
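A loose sketch of that idea, under several assumptions: the registry layout, database names, and function names are hypothetical, and a plain linear scan stands in for the spatial index lookup (a GeoCouch bounding-box query would take its place). It only builds the `_changes` URLs for the detectors inside the box.

```javascript
// Hypothetical registry: one entry per detector, each with its own database.
var detectors = [
  { db: "detector_001", lon: -118.0, lat: 34.0 },
  { db: "detector_002", lon: -120.0, lat: 36.5 },
  { db: "detector_003", lon: -117.9, lat: 33.8 }
];

// Linear scan standing in for a spatial index query.
// bbox is [west, south, east, north].
function detectorsInBbox(registry, bbox) {
  return registry.filter(function (d) {
    return d.lon >= bbox[0] && d.lon <= bbox[2] &&
           d.lat >= bbox[1] && d.lat <= bbox[3];
  });
}

// One continuous _changes URL per matching detector database.
function changesUrls(baseUrl, registry, bbox, since) {
  return detectorsInBbox(registry, bbox).map(function (d) {
    return baseUrl + "/" + encodeURIComponent(d.db) +
           "/_changes?feed=continuous&since=" + encodeURIComponent(since);
  });
}

var urls = changesUrls("http://localhost:5984", detectors,
                       [-118.5, 33.7, -117.6, 34.3], 0);
console.log(urls.join("\n"));
```

The trade-off is that the client holds one feed per detector in the box, but no feed has to filter changes from detectors it does not care about.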

As to my original request, unfortunately, the most I can contribute is to recognize the problem. Perhaps this issue should be closed for now?

vmx commented 13 years ago

I'm closing this one as "won't fix".