redgeoff / spiegel

Scalable replication and change listening for CouchDB
MIT License

Ability to specify view in on_change #47

Open redgeoff opened 6 years ago

lostnet commented 5 years ago

I've experimented with forcing a view on dbs that match a specific regex and it seems to work OK, but it has the same conceptual problem as mango queries in https://github.com/redgeoff/spiegel/issues/122#issuecomment-391596516. It would be necessary to configure this per db and then live with one shared view, to avoid separate _changes feeds (one per view) where multiple on_changes may be observing one db.

Since one could always run with multiple spiegeldbs to handle the problem of multiple filter change feed criteria on the same db, I think it would make the most sense to declare db filter criteria based on db name matching in a separate doc, so a change listener inherits one set of criteria independent of the on_change documents using it. Those criteria could then use views, filters, mango filters etc. for the one change feed. Thoughts?

redgeoff commented 5 years ago

I need to think about this a little more, but so I understand this a bit better, would adding support for sift give you what you need? Structurally speaking, this would be a much easier change

lostnet commented 5 years ago

@redgeoff I don't think sift would help since, however complex the queries can be, I think it is still a static config per on_change doc? I dynamically add a design doc to each per-user db that filters based on the user's roles, and then I use this view in the _changes feed. Consequently, I only need one on_change doc with a wildcard that matches all userdbs, but it then needs to point at the filtered view (otherwise it would need to store a copy of the current filtering for each userdb in the spiegeldb).
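For context, CouchDB's `_changes` endpoint accepts `filter=_view` together with a `view` parameter of the form `designdoc/viewname`. Below is a minimal sketch of building such a feed URL. The helper name, host, database, and design doc names are made up for illustration and are not part of Spiegel.

```javascript
// Build a CouchDB _changes URL that only reports docs for which the
// given view's map function emits at least one row.
// All names passed in below are hypothetical.
function viewChangesUrl(couchUrl, dbName, designDoc, viewName, since) {
  const url = new URL(`${encodeURIComponent(dbName)}/_changes`, couchUrl);
  url.searchParams.set('filter', '_view');
  url.searchParams.set('view', `${designDoc}/${viewName}`);
  if (since !== undefined) {
    url.searchParams.set('since', String(since));
  }
  return url.toString();
}

const exampleUrl = viewChangesUrl(
  'http://localhost:5984', 'user_abc', 'userfilters', 'by_role', 0
);
```

The same URL shape would apply to every userdb, which is why a single wildcard on_change could share it as long as each db carries its own copy of the design doc.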

In thinking about the spiegel side a bit more, if supporting exactly one changes feed per db, I think it would be one global change options configuration document with an ordered list, i.e.:

   id: 'perdb_change_options'
   rules:
   [
     {db_name:'^user_', options:{filter:'_view',view:'userdbview'}},
     {db_name:regex2, options:{selector:...}} // I think mango queries could be passed to couch too
   ]

I used listener.db_name in _changesForListener to check if my additional options filter and view options apply. For backwards compatibility, a table here that is always null for setups without the new config doc should have little performance and no behavior impact so I don't think it is necessary to modify cl docs or anything to have basic couch side filtering configuration.
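A minimal sketch of how the ordered lookup might work, assuming the config doc shape proposed above. The function name `changeOptionsFor` and the second rule are hypothetical; this is not Spiegel's actual code.

```javascript
// Hypothetical config doc following the proposed shape.
const configDoc = {
  _id: 'perdb_change_options',
  rules: [
    { db_name: '^user_', options: { filter: '_view', view: 'userdbview' } },
    // Second rule is purely illustrative.
    { db_name: '^orders$', options: { filter: '_selector' } },
  ],
};

// Return the options of the first rule whose regex matches dbName.
// Returns null when the config doc is missing or no rule matches,
// so existing setups keep their current _changes options.
function changeOptionsFor(dbName, doc) {
  if (!doc || !Array.isArray(doc.rules)) {
    return null;
  }
  const rule = doc.rules.find((r) => new RegExp(r.db_name).test(dbName));
  return rule ? rule.options : null;
}
```

Because the list is ordered, the first matching regex wins, which keeps the "exactly one changes feed per db" constraint simple to reason about.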

redgeoff commented 5 years ago

Your idea of having a config per DB sounds cool, but I think it could lead to problems where an on_change doesn't fire because the user doesn't expect to have to have a global view config.

The most robust solution is to have a ChangeListener per db-view pair and this will make things A LOT more complex.

Another idea I have is to read in the views dynamically, which are just JS functions, and then run the function against the change. This would be less efficient than using a view, but it would allow us to avoid having to make significant structural changes to Spiegel.
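A rough sketch of what running a view's map function against a change could look like. The helper name and the sample map source are assumptions for illustration. Note that `new Function` is not a sandbox, so this only shows the mechanics and not a safe way to run untrusted design docs.

```javascript
// Compile a view's map function source once and return a predicate that
// reports whether the map function emits anything for a given doc.
// WARNING: new Function gives no isolation from the host process.
function compileMap(mapSource) {
  let emitted = false;
  const emit = () => {
    emitted = true;
  };
  // The map source refers to a free variable `emit`, supplied here.
  const map = new Function('emit', `return (${mapSource});`)(emit);
  return (doc) => {
    emitted = false;
    map(doc);
    return emitted;
  };
}

// Hypothetical map function as it might be stored in a design doc.
const mapSource =
  'function (doc) { if (doc.type === "order") { emit(doc._id, null); } }';
const matchesView = compileMap(mapSource);
```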

lostnet commented 5 years ago

> Your idea of having a config per DB sounds cool, but I think it could lead to problems where an on_change doesn't fire because the user doesn't expect to have to have a global view config.

This could only really happen if two users are configuring the same spiegeldb. If the config doc doesn't exist or doesn't have a matching regex, there are no additional options, so only spiegel's current _changes options apply and all existing usage of spiegel keeps its behavior. It would be nice to have a CLI option to insist that the view config file must exist, to avoid the problem of trying to configure it but not knowing whether you are still just getting all docs.

> The most robust solution is to have a ChangeListener per db-view pair and this will make things A LOT more complex.

I agree on both counts. But if this were done, I would still want separate config (file or files) with named view configurations, so that repeating the same view/filter in two on_changes matching the same db is deduped by explicit config to one feed (and two connected on_changes meant to process the same thing couldn't get out of sync). I think the rules in a single view config file could just start matching on (db_name?+)name->options instead of db_name->options. The rules could already be {db_name, name, options} instead of {db_name, options}, so the name could be used for logging (and assertions?) now, and in matching if multiple views per db are supported later.
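To illustrate the dedupe idea, here is a minimal sketch that groups on_change docs by the (db, view name) feed they would share. The `view_name` field on an on_change doc and the function name are hypothetical and do not exist in Spiegel today.

```javascript
// Group on_change docs by the feed they would share for one database.
// An on_change with no view_name falls into the unfiltered feed ('').
function feedsFor(dbName, onChanges) {
  const feeds = new Map();
  for (const onChange of onChanges) {
    if (!new RegExp(onChange.db_name).test(dbName)) {
      continue;
    }
    const key = JSON.stringify([dbName, onChange.view_name || '']);
    if (!feeds.has(key)) {
      feeds.set(key, []);
    }
    feeds.get(key).push(onChange._id);
  }
  return feeds;
}

// Hypothetical on_change docs: two share a named view, one uses none.
const sampleOnChanges = [
  { _id: 'oc1', db_name: '^user_', view_name: 'by_role' },
  { _id: 'oc2', db_name: '^user_', view_name: 'by_role' },
  { _id: 'oc3', db_name: '^user_' },
  { _id: 'oc4', db_name: '^orders$', view_name: 'by_role' },
];
const sampleFeeds = feedsFor('user_alice', sampleOnChanges);
```

Keying on the configured name rather than on the view definition itself is what would keep two on_changes from drifting apart.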

> Another idea I have is to read in the views dynamically, which are just JS functions, and then run the function against the change. This would be less efficient than using a view, but it would allow us to avoid having to make significant structural changes to Spiegel.

I don't really like the idea of running the design JS in a different sandbox after doing independent but hopefully identical security checks. For example, a regular user could trick us into running a doc that is technically not a design document by CouchDB's standards, or an attack that would have failed in the CouchDB sandbox could succeed in spiegel's. I suppose I only need a separate design doc in the db being observed, with spiegel-specific normal doc content to be pulled in by spiegel and made available as a variable. Though then I would be accepting all the content from spiegel and filtering it in the API using a recreation of the code in these design docs.

redgeoff commented 5 years ago

I guess I need to be a little clearer about my concern with #1. The issue I see is not from an implementation standpoint, as I think it will work fine. My problem is that once you specify a ChangeListener to use a view, it is locked to that view, and that means that all other on_changes for this DB must be with respect to this view. I don’t think this is very intuitive, and I think there will be cases where we would want one on_change to be with respect to a view and another on_change to be with respect to another view (or no view). I’m not sold that this is the best solution, as it feels like a hack to Spiegel and only supports a limited number of use cases.

With #2, I think the solution is more complex than just having a config file. We would need a ChangeListener per DB-view pair, as you could have any combination of views per DB. This is doable, but seems like a lot of work and I personally do not have the time to implement this. This solution is the one I would recommend though, as I think it is the proper solution and will support the most use cases.

As for #3: I agree with the security concerns and don’t think it is a great idea. I do think it would be easier to implement than #2 and would provide roughly the same features, but with reduced performance and security.

At this point, I’m not really sure on the best path forward as no one else has requested support for views. Maybe this idea should be tabled for now.

redgeoff commented 5 years ago

Although inefficient, you can currently use a custom API endpoint to implement the special view logic and just specify this endpoint in an on_change doc.
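A rough sketch of that workaround, assuming the endpoint receives the change together with its doc. The payload shape, the `roles` field, and both function names are assumptions for illustration and not Spiegel's documented API.

```javascript
// Decide inside the API whether a change passes the "view" logic.
// Here the logic is: the doc carries at least one allowed role.
function shouldProcess(change, allowedRoles) {
  const doc = change && change.doc;
  if (!doc || change.deleted) {
    return false;
  }
  const roles = Array.isArray(doc.roles) ? doc.roles : [];
  return roles.some((role) => allowedRoles.includes(role));
}

// Framework-agnostic handler body: the endpoint always answers 200 so
// the change is not retried, but only does work for matching changes.
function handleChange(body) {
  if (!shouldProcess(body.change, ['editor'])) {
    return { status: 200, processed: false };
  }
  // Real processing of the change would go here.
  return { status: 200, processed: true };
}
```

The cost is that every change still travels from CouchDB through Spiegel to the API before being discarded, which is the inefficiency mentioned above.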