regro / regolith

Research Group Content Management System
http://regro.github.io/regolith-docs/

prefilters #151

Open sbillinge opened 6 years ago

sbillinge commented 6 years ago

Now I would like to run some prefilters before running builders. Here is a concrete example.

UC1

  1. Chris wants to make his presentation list
  2. Chris runs a prefilter that returns a collection shaped like the chainDB('presentations') collection that will be passed to the filter, but containing only the presentations that have Chris in the author field.

UC2

  1. Simon needs a list of presentations that acknowledged a particular grant so he prefilters with grant = 'fwp17' and then runs the builder.

UC3

  1. As UC2 but Simon wants to filter on grant and date-range so he sets up all the prefilters and runs them before running builder.
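
The three use cases above could be sketched roughly as follows. This is only an illustration, not regolith code: a plain list of dicts stands in for the presentations collection, and the field names (`authors`, `grants`, `begin_date`) and helper names (`prefilter`, `has_author`, `has_grant`, `in_date_range`) are assumptions made up for the example.

```python
from datetime import date

# Hypothetical sample documents; the field names are assumptions for illustration.
presentations = [
    {"_id": "p1", "authors": ["Chris", "Simon"], "grants": ["fwp17"],
     "begin_date": date(2017, 11, 3)},
    {"_id": "p2", "authors": ["Simon"], "grants": ["fwp17"],
     "begin_date": date(2016, 5, 20)},
    {"_id": "p3", "authors": ["Chris"], "grants": [],
     "begin_date": date(2018, 2, 14)},
]


def prefilter(collection, *predicates):
    """Return the documents for which every predicate is true."""
    return [doc for doc in collection if all(p(doc) for p in predicates)]


def has_author(name):
    """Predicate: the document lists `name` among its authors."""
    return lambda doc: name in doc.get("authors", [])


def has_grant(grant):
    """Predicate: the document acknowledges `grant`."""
    return lambda doc: grant in doc.get("grants", [])


def in_date_range(start, end):
    """Predicate: the document's begin_date falls within [start, end]."""
    return lambda doc: start <= doc["begin_date"] <= end


# UC1: only presentations with Chris as an author.
chris_talks = prefilter(presentations, has_author("Chris"))

# UC2: only presentations that acknowledged grant 'fwp17'.
fwp17_talks = prefilter(presentations, has_grant("fwp17"))

# UC3: grant and date range combined before the builder runs.
fwp17_fy18 = prefilter(
    presentations,
    has_grant("fwp17"),
    in_date_range(date(2017, 10, 1), date(2018, 9, 30)),
)
```

Each result has the same shape as the input collection, so in principle a builder would not need to know that a prefilter ran.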

CJ-Wright commented 6 years ago

@scopatz thoughts?

sbillinge commented 6 years ago

For me, this is my main goal for the system. I could of course write bespoke builders for every report I ever want to generate, but the reports come fast and furious, so the ability to do filtering more or less on the fly would be great. It would also be awesome if I could save certain filters so that I can recreate that builder in the future.

I would imagine that if I were an advanced user who knew which builder I wanted, which schema keys I wanted to filter on, and which values I wanted them to have, the infrastructure could be rather general. I could do, for example, regolith build preslist --filters "author in [simon, chris]", "begin_year after 2017", "begin_month after Oct" to generate the presentations that either Chris or I made in the latest federal fiscal year.

Of course the syntax wouldn't look like that, but that is the idea. It would actually be easy to build a GUI on top of this too, for less advanced users.
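
To make the idea a little more concrete, here is a rough sketch of how filter strings given on the command line might be turned into predicates. Everything in it is hypothetical: the "key op value" mini-syntax, the operator names, the `parse_filter` helper, and the standalone parser are assumptions for illustration and are not part of regolith's actual CLI.

```python
import argparse
import operator

# Hypothetical operator names for a "key op value" mini-syntax.
OPERATORS = {
    "eq": operator.eq,
    "after": operator.gt,
    "before": operator.lt,
    "in": lambda value, allowed: value in allowed,
}


def parse_filter(expr):
    """Turn 'key op value[,value...]' into a predicate over a document."""
    key, op_name, raw = expr.split(maxsplit=2)
    if op_name not in OPERATORS:
        raise ValueError(f"unknown filter operator: {op_name!r}")
    op = OPERATORS[op_name]
    if op_name == "in":
        # Comma-separated list of allowed string values.
        target = [v.strip() for v in raw.split(",")]
    else:
        # Treat all-digit values as integers, everything else as strings.
        target = int(raw) if raw.isdigit() else raw
    # Documents that lack the key never match.
    return lambda doc: key in doc and op(doc[key], target)


def build_parser():
    """Stand-in parser; not the real regolith command line."""
    parser = argparse.ArgumentParser(prog="build-sketch")
    parser.add_argument("builder")
    parser.add_argument("--filters", nargs="*", default=[])
    return parser


args = build_parser().parse_args(
    ["preslist", "--filters", "author in simon,chris", "begin_year after 2016"]
)
predicates = [parse_filter(f) for f in args.filters]

# Hypothetical documents to filter.
docs = [
    {"author": "simon", "begin_year": 2017},
    {"author": "chris", "begin_year": 2015},
    {"author": "alice", "begin_year": 2018},
]
selected = [d for d in docs if all(p(d) for p in predicates)]
```

Because the filters are plain strings, saving them for later reuse could be as simple as writing the list to a file, although that is not shown here.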


CJ-Wright commented 6 years ago

I think this is a reasonable ask. My main concern right now is the syntax and implementation. Merging a CLI and arbitrary filtering code could be rather messy, although maybe doable with xonsh?

On the implementation side, I think it would be reasonable to have a dict where the keys are the collections to be filtered and the values are callables that return True or False for each document in the collection.
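
A minimal sketch of that dict, assuming the database can be viewed as a mapping from collection name to a list of documents (the `db` contents and the `apply_prefilters` helper are hypothetical, purely for illustration):

```python
# Hypothetical in-memory stand-in: collection name -> list of documents.
db = {
    "presentations": [
        {"_id": "p1", "authors": ["Chris"]},
        {"_id": "p2", "authors": ["Simon"]},
    ],
    "grants": [
        {"_id": "fwp17"},
        {"_id": "fwp18"},
    ],
    "people": [
        {"_id": "chris"},
    ],
}

# Keys are the collections to filter; values are callables returning True/False.
prefilters = {
    "presentations": lambda doc: "Chris" in doc["authors"],
    "grants": lambda doc: doc["_id"] == "fwp17",
}


def apply_prefilters(db, prefilters):
    """Return a filtered copy of db; unlisted collections pass through whole."""
    filtered = {}
    for name, docs in db.items():
        keep = prefilters.get(name)
        filtered[name] = [d for d in docs if keep(d)] if keep else list(docs)
    return filtered


filtered_db = apply_prefilters(db, prefilters)
```

The builder would then be handed `filtered_db` in place of the full database.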

scopatz commented 6 years ago

I think this is a good idea. This could probably be done with command line arguments that are added to build rather than trying to do anything super fancy with xonsh.

To your dict idea, there are standard JSON/dict syntaxes out there for querying and filtering. MongoDB has one, but there are probably others. I recommend that we adopt something relatively standard or, even better, use an existing implementation.
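
For a sense of what the MongoDB-style dict syntax looks like, here is a toy matcher that handles a small subset of the comparison operators (`$eq`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`) on flat documents. It is a simplified sketch and not a substitute for an existing implementation: nested fields, array matching, and logical operators are left out, and the `matches` helper and sample documents are hypothetical.

```python
import operator

# A small subset of MongoDB comparison operators.
COMPARISONS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, allowed: value in allowed,
}


def matches(doc, query):
    """Return True if a flat doc satisfies every condition in the query."""
    for key, condition in query.items():
        if key not in doc:
            # Simplification: a missing field never matches.
            return False
        value = doc[key]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gte": 2017}; all operators must hold.
            for op_name, target in condition.items():
                if not COMPARISONS[op_name](value, target):
                    return False
        elif value != condition:
            # Bare value means equality, e.g. {"author": "simon"}.
            return False
    return True


# Hypothetical documents and query.
docs = [
    {"author": "simon", "begin_year": 2017},
    {"author": "chris", "begin_year": 2015},
    {"author": "alice", "begin_year": 2018},
]
query = {"author": {"$in": ["simon", "chris"]}, "begin_year": {"$gte": 2017}}
selected = [d for d in docs if matches(d, query)]
```

A query like this is just data, so it can be stored as JSON or YAML and reused later, which fits the request to save filters.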

CJ-Wright commented 6 years ago

I think there is a package called mongoquery which uses mongo syntax for other back ends.