bbotella closed this 8 years ago
Hm, perhaps it would be nice to design the user interface first. :) I mean, how are we expecting people to use this?
It would be nice to arrive at something more or less usable (and preferably something that lets people copy and paste filters).
So, here is one idea, using lists to represent the operations:
"filter": {
    "name": "exporters.filters.MultipleFilter",
    "options": {
        "filters": ["or",
            {"name": "exporters.filters.PythonexpFilter", "options": {...}},
            {"name": "exporters.filters.KeyValueFilter", "options": {...}},
            ["and",
                {"name": "exporters.filters.PythonexpFilter", "options": {...}},
                {"name": "exporters.filters.KeyValueFilter", "options": {...}}
            ]
        ]
    }
}
Here is another, using dicts:
"filter": {
    "name": "exporters.filters.MultipleFilter",
    "options": {
        "filters": {
            "or": [
                {"name": "exporters.filters.PythonexpFilter", "options": {...}},
                {"name": "exporters.filters.KeyValueFilter", "options": {...}},
                {"and": [
                    {"name": "exporters.filters.PythonexpFilter", "options": {...}},
                    {"name": "exporters.filters.KeyValueFilter", "options": {...}}
                ]},
                {"name": "exporters.filters.KeyValueRegexFilter", "options": {...}}
            ]
        }
    }
}
I like the syntax of this last one more, but it will require more validation than the former (because a dict representing a filter can't be allowed to have both "and" and "or" in the same object).
The validation for lists is a bit simpler (if it's a list, the first element has to be a string). So, it's a bit more like "worse is better" (simpler implementation, but a less friendly syntax).
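To make the difference in validation effort concrete, here is a minimal sketch of both checks. The function names and the rule that a leaf filter only needs a "name" key are assumptions for illustration, not part of the exporters code base.

```python
OPERATORS = {"and", "or"}


def validate_list_syntax(node):
    """Validate the list form: ["or", <node>, <node>, ...]."""
    if isinstance(node, list):
        # The only structural rule: a list starts with an operator string.
        if not node or not isinstance(node[0], str) or node[0] not in OPERATORS:
            raise ValueError("a list must start with 'and' or 'or': %r" % (node,))
        if len(node) < 2:
            raise ValueError("operator %r needs at least one operand" % (node[0],))
        for child in node[1:]:
            validate_list_syntax(child)
    elif isinstance(node, dict):
        # Assumed leaf rule: a filter config must carry a "name".
        if "name" not in node:
            raise ValueError("a filter must have a 'name': %r" % (node,))
    else:
        raise ValueError("expected a list or a filter dict: %r" % (node,))


def validate_dict_syntax(node):
    """Validate the dict form: {"or": [<node>, <node>, ...]}."""
    if not isinstance(node, dict):
        raise ValueError("expected a dict: %r" % (node,))
    operators = OPERATORS & set(node)
    # The extra check the dict form needs: never both operators at once.
    if len(operators) > 1:
        raise ValueError("a dict cannot have both 'and' and 'or'")
    if operators:
        operator = operators.pop()
        if len(node) != 1:
            raise ValueError("an operator dict must contain only %r" % (operator,))
        children = node[operator]
        if not isinstance(children, list) or not children:
            raise ValueError("%r needs a non-empty list of operands" % (operator,))
        for child in children:
            validate_dict_syntax(child)
    elif "name" not in node:
        raise ValueError("a filter must have a 'name': %r" % (node,))
```

Both functions raise ValueError on the first problem they find and return None otherwise.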
I see I forgot to ask... what do you think? :)
My idea was to support something like:
{
    'filters': {
        'filter1': {
            'name': 'exporters.filters.KeyValueFilter',
            'options': {
                'keys': [{'name': 'country_code', 'value': 'es', 'operator': 'contains'}]
            }
        },
        'filter2': {
            'name': 'exporters.filters.KeyValueFilter',
            'options': {
                'keys': [{'name': 'name', 'value': 'item1', 'operator': 'contains'}]
            }
        },
        'filter3': {
            'name': 'exporters.filters.KeyValueFilter',
            'options': {
                'keys': [{'name': 'name', 'value': 'item3', 'operator': 'contains'}]
            }
        }
    },
    'composition': '(filter1 and filter2) or filter3'
}
This would allow us to write clear, user-friendly compositions: for each record, we replace the filter names in the composition string with the result of the corresponding filter and evaluate the resulting string with the Python interpreter (it would be something like '(True and False) or True').
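A rough sketch of that substitution step, assuming the per-record filter results are already available as a dict of booleans. The function name `evaluate_composition` is hypothetical, and the whitelist check before `eval` is an added safeguard, not something from the proposal itself.

```python
import re

KEYWORDS = ("and", "or", "not")


def evaluate_composition(composition, results):
    """Evaluate a composition string for one record.

    `results` maps each filter name to the boolean that filter
    returned for the current record.
    """
    def substitute(match):
        token = match.group(0)
        if token in KEYWORDS:
            return token
        if token not in results:
            raise ValueError("unknown filter name: %r" % (token,))
        return "True" if results[token] else "False"

    # Replace every identifier with the result of the matching filter.
    expression = re.sub(r"[A-Za-z_]\w*", substitute, composition)

    # Refuse anything except booleans, operators, parentheses and spaces.
    if not re.fullmatch(r"(?:True|False|and|or|not|[()\s])*", expression):
        raise ValueError("unsupported composition: %r" % (composition,))

    return eval(expression, {"__builtins__": {}}, {})
```

For example, with filter1 passing, filter2 failing and filter3 passing, the string '(filter1 and filter2) or filter3' becomes '(True and False) or True' before it is evaluated. Note that this work is repeated for every record.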
Hmm, as a user, I would not like all that indirection (having to come up with names for filters and having to type them twice), and it doesn't look like this would make the implementation simpler.
I also don't think that having to do string replacement and evaluation for every record passing through the filter is an advantage of that approach.
With either of the two options I mentioned, the filter would be "compiled" into a function in the constructor (Python code for the "and" and "or"), and then calling the filter for each record would simply be that function call (no need for eval).
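A minimal sketch of that "compile once, call per record" idea for the list syntax. `compile_filter` and `build_key_value_filter` are hypothetical names, and the simplified "key"/"value" options stand in for whatever a real filter class would accept.

```python
def compile_filter(node, build_filter):
    """Turn a nested ["and"/"or", ...] list into a single predicate.

    `build_filter` converts one leaf config dict into a callable
    that takes a record and returns True or False.
    """
    if isinstance(node, list):
        operator, operands = node[0], node[1:]
        predicates = [compile_filter(child, build_filter) for child in operands]
        if operator == "and":
            return lambda record: all(p(record) for p in predicates)
        if operator == "or":
            return lambda record: any(p(record) for p in predicates)
        raise ValueError("unknown operator: %r" % (operator,))
    return build_filter(node)


def build_key_value_filter(config):
    # Hypothetical stand-in for instantiating a real filter class.
    key = config["options"]["key"]
    value = config["options"]["value"]
    return lambda record: record.get(key) == value


spec = ["or",
    {"name": "kv", "options": {"key": "country_code", "value": "es"}},
    ["and",
        {"name": "kv", "options": {"key": "name", "value": "item1"}},
        {"name": "kv", "options": {"key": "lang", "value": "en"}},
    ],
]

# Built once, e.g. in the filter's constructor...
predicate = compile_filter(spec, build_key_value_filter)

# ...and then called for each record, with no parsing or eval involved.
matches = predicate({"country_code": "es"})
```

The tree is walked only at construction time; per record, the cost is a handful of plain function calls that short-circuit through `all` and `any`.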
Btw, a good source of inspiration is MongoDB query filters. :)
@bbotella @eliasdorneles we have this PR https://github.com/scrapinghub/exporters/pull/325 that is almost done, do you think it makes sense to keep this one or close it?
We can close it, yes, this was another PR done as an experiment and discussion starter. =) Thanks @bbotella ! <3
We still lack composition. With this approach, we define some filters, and we can add a composition string to the options that would go like:
(filter_name1 AND filter_name2) OR filter_name3
We may need a custom interpreter to parse this composition string. Thoughts, @eliasdorneles?
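One possible shape for such an interpreter, sketched under assumptions: instead of a hand-written parser, it reuses Python's own `ast` module to parse the string once and only accepts names combined with and/or. The function names and the `predicates` mapping (filter name to a callable taking a record) are hypothetical.

```python
import ast
import re


def compile_composition(composition, predicates):
    """Parse a composition string once and return a single predicate."""
    # Accept "AND"/"OR" as well as the lowercase Python operators.
    normalized = re.sub(
        r"\b(and|or)\b",
        lambda match: match.group(1).lower(),
        composition,
        flags=re.IGNORECASE,
    )
    # Raises SyntaxError if the string is not a valid expression.
    tree = ast.parse(normalized, mode="eval")
    return _build(tree.body, predicates)


def _build(node, predicates):
    if isinstance(node, ast.BoolOp):
        parts = [_build(value, predicates) for value in node.values]
        if isinstance(node.op, ast.And):
            return lambda record: all(p(record) for p in parts)
        return lambda record: any(p(record) for p in parts)
    if isinstance(node, ast.Name):
        if node.id not in predicates:
            raise ValueError("unknown filter name: %r" % (node.id,))
        return predicates[node.id]
    # Anything else (calls, attributes, literals, "not", ...) is rejected.
    raise ValueError("unsupported expression in composition")
```

Nothing from the composition string is ever executed: the syntax tree is only inspected, and the returned function calls the already constructed filters.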