polyfractal / sherlock

AndFilter doesn't work #71

Closed by svscorp 11 years ago

svscorp commented 11 years ago

Today I started working on a task that involves filtering.

I need to combine one term filter AND one range filter.

If I do:

$filter = Sherlock::filterBuilder();
$filterWrapper = $filter->OrFilter()->queries(
                        $filter->Term()->field("f1")->term("val"),
                        $filter->Range()->field("date")->from(strtotime('2013-07-09 00:00:00'))->to(strtotime('2013-07-11 00:00:00'))
                    );

it works.

If I use AndFilter instead:

$filter = Sherlock::filterBuilder();
$filterWrapper = $filter->AndFilter()->queries(
                        $filter->Term()->field("f1")->term("val"),
                        $filter->Range()->field("date")->from(strtotime('2013-07-09 00:00:00'))->to(strtotime('2013-07-11 00:00:00'))
                    );

it doesn't work.

Errors:

Notice: Undefined index: and in path/Sherlock/components/filters/AndFilter.php
SearchPhaseExecutionException[Failed to execute phase [query] ...

I saw that the And and Or filter classes are implemented differently. Is that intended, or should AndFilter work the same way as OrFilter?

svscorp commented 11 years ago

If AndFilter should work the same way as OrFilter, just let me know and I will submit a pull request. I hope it can be accepted and rolled out in a new tag ASAP :)

polyfractal commented 11 years ago

Went ahead and fixed this myself since I was dealing with the OrFilter problem at the same time. Let me know if you encounter any problems.

Tangentially related, you'll get better performance out of Elasticsearch if you use the Bool filter instead of And/Or/Not. More details can be found in this article: http://euphonious-intuition.com/2013/05/all-about-elasticsearch-filter-bitsets/
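As a rough illustration of the Bool filter suggestion, the term-plus-range case from this issue could map to a request fragment like the one below. This is only a sketch built with plain PHP arrays and json_encode, not with Sherlock's builder, since the builder's Bool filter methods are not shown in this thread. The helper name buildBoolFilter is hypothetical, and the UTC timezone is an assumption made so that strtotime() is deterministic.

```php
<?php
// Hypothetical helper (not part of Sherlock): wraps filter clauses in an
// Elasticsearch "bool" filter. Clauses under "must" are ANDed together.
function buildBoolFilter(array $mustClauses): array
{
    return ['bool' => ['must' => $mustClauses]];
}

// Assumption: interpret the date strings as UTC.
date_default_timezone_set('UTC');

$boolFilter = buildBoolFilter([
    // Same term filter as in the original report.
    ['term' => ['f1' => 'val']],
    // Same range filter as in the original report.
    ['range' => ['date' => [
        'from' => strtotime('2013-07-09 00:00:00'),
        'to'   => strtotime('2013-07-11 00:00:00'),
    ]]],
]);

// Print the JSON fragment that would go wherever a filter is expected.
echo json_encode($boolFilter, JSON_PRETTY_PRINT), PHP_EOL;
```

For OR semantics, the same clauses would go under "should" instead of "must".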

svscorp commented 11 years ago

@polyfractal Btw, I ran into one more problem: the sort order breaks when I use a filter. I described it here: http://stackoverflow.com/questions/17574835/elasticsearch-filter-impact-on-the-sorting

Someone answered that the "_cache" option is actually in the wrong place and should be one level deeper. Could you look into this and fix it as well?

svscorp commented 11 years ago

Update: I see "_cache" is now in the right position. Great work!

I will have a look at the Bool filter.

But I still have a question: why isn't the result wrapped in a 'filtered' query when I use the filter?

svscorp commented 11 years ago

I can also still see "_cache" in the toJSON() output, both for each individual filter (set to TRUE) and for the whole 'filters' block (set to FALSE).

Is it correct for the '_cache' option to appear in each filter, instead of just once in the 'filters' block?

polyfractal commented 11 years ago

Could you paste your entire JSON output? Not sure I understand your question about "filtered". How are you adding the filters and queries to the total query request?

To use a filtered query, you will need to place $filterWrapper inside the filter() of a FilteredQuery object (accessed through the query builder):

$request = $sherlock->search();
$filter = Sherlock::filterBuilder();
$filterWrapper = $filter->AndFilter()->queries(
                        $filter->Term()->field("f1")->term("val"),
                        $filter->Range()->field("date")->from(strtotime('2013-07-09 00:00:00'))->to(strtotime('2013-07-11 00:00:00'))
                    );

$query = Sherlock::queryBuilder()
                ->query($someQuery)
                ->filter($filterWrapper);

$request->index("test")
   ->type("test")
   ->query($query)
   ->execute();

The filter() that is accessible on the request object is a top-level filter in Elasticsearch and has a different scope (usually only used with facets).
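To make the difference in scope concrete, here is a sketch of the two request bodies as plain PHP arrays, independent of Sherlock's builder. The match_all query is a placeholder standing in for $someQuery, the timestamps are illustrative values, and the variable names are hypothetical. The shapes follow the Elasticsearch query DSL of that era, where a "filtered" query nests the filter inside the query and the top-level "filter" sits next to it.

```php
<?php
// Placeholder query and filter, used only for illustration.
$someQuery = ['match_all' => new stdClass()];
$andFilter = ['and' => [
    ['term' => ['f1' => 'val']],
    ['range' => ['date' => ['from' => 1373328000, 'to' => 1373500800]]],
]];

// Filtered query: the filter is part of the query itself.
$filteredBody = [
    'query' => [
        'filtered' => [
            'query'  => $someQuery,
            'filter' => $andFilter,
        ],
    ],
];

// Top-level filter: sits beside the query and narrows the returned hits,
// but facets are still computed over everything the query matched.
$topLevelBody = [
    'query'  => $someQuery,
    'filter' => $andFilter,
];

echo json_encode($filteredBody, JSON_PRETTY_PRINT), PHP_EOL;
echo json_encode($topLevelBody, JSON_PRETTY_PRINT), PHP_EOL;
```

If toJSON() output looks like the second shape, the filter was most likely attached to the request rather than to the query.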

If you are referring to _cache for each Term filter, that's the default for most filters. Check out the Term Filter page: you can see that _cache can be disabled, but it is normally set to true (just transparently).
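For reference, this is roughly where a per-filter _cache flag sits in the Elasticsearch filter DSL, again sketched with plain PHP arrays rather than Sherlock's output. The variable names are hypothetical. The wrapping And filter carries its own, separate _cache flag in its "filters" form, which is why both levels can show a value.

```php
<?php
// Term filter with caching explicitly disabled; _cache sits inside the
// "term" object next to the field. Omitting it leaves the default (true).
$termFilter = ['term' => ['f1' => 'val', '_cache' => false]];

// And filter in its long form: the wrapped filters go under "filters",
// and the wrapper has its own _cache flag, independent of the inner ones.
$andFilter = ['and' => [
    'filters' => [$termFilter],
    '_cache'  => false,
]];

echo json_encode($andFilter, JSON_PRETTY_PRINT), PHP_EOL;
```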

When I first started working on Sherlock, I made the mistake of manually setting defaults in the query output to simplify query construction. That behavior is going away in 0.2.