silverstripe / silverstripe-fulltextsearch

Adds external full text search engine support to Silverstripe
BSD 3-Clause "New" or "Revised" License
44 stars 83 forks source link

Solr searching fails when not searching Versioned content #33

Closed madmatt closed 6 years ago

madmatt commented 11 years ago

If you create a sub-class of SolrIndex to define indexes on DataObjects that don't use the Versioned extension, the index will be built and loaded into Solr, but no searches performed against that index will work.

For example:

// Story.php
class Story extends DataObject {
    private static $db = array(
        'Title' => 'Varchar(255)'
    );
}

// StorySolrIndex.php
class StorySolrIndex extends SolrIndex {
    public function init() {
        $this->addClass('Story');
        $this->addFulltextField('Title');
    }
}

Running Solr_Configure and Solr_Reindex will work fine (Solr_Reindex will say class Story, total: 5 in state [])

However, creating an instance of that SolrIndex and running a query against it will fail:

$index = new StorySolrIndex();
$query = new SearchQuery();
$query->search('Test', null, array(
    'Story_Title' => 2
));

$results = $index->search($query);

Nothing will be returned, but Solr will log an Exception with the message: SEVERE: org.apache.solr.common.SolrException: undefined field _versionedstage

If you add something to the StorySolrIndex that is versioned (e.g. SiteTree), then the search will work as usual, but with extra stuff in the index.

// StorySolrIndex.php
class StorySolrIndex extends SolrIndex {
    public function init() {
        $this->addClass('Story');
        $this->addClass('SiteTree'); // This works around the bug
        $this->addFulltextField('Title');
    }
}
phptek commented 11 years ago

I can confirm a simialr error as @madmatt.

Using the HEAD of master (15daed3e) with a slightly different error while using the subsites module. If force a GET method, dump the complete query string and paste it straight into the browser thus:

http://localhost:8983/solr/MyCustomSearchIndex/select?fq=%2B%28_subsite%3A%220%22+%28*%3A*+-_subsite%3A[*+TO+*]%29%29+%2B%28_versionedstage%3A%22Live%22+%28*%3A*+-_versionedstage%3A[*+TO+*]%29%29&wt=json&json.nl=map&q=%2BWayne&start=0&rows=10

...then I get a 400 error from Solr and the following error:

undefined field _subsite

Here are my class-definitions:

class Profile extends DataObject{
private static $db = array(
    'Name'=>'Varchar(255)',
    'Designation'=>'Varchar(255)',
    'Bio'=>'HTMLText'
);
}

class MyCustomSearchIndex extends SolrIndex {
/*
 * @return void
 */
public function init() {
    $this->addClass('Profile');
    $this->addFulltextField('Name', 'Text');
    $this->addFulltextField('Designation', 'Text');
    $this->addFulltextField('Bio', 'HTMLText');
    }
}
unclecheese commented 10 years ago

I'm having the same issue. Did either of you figure out a workaround?

madmatt commented 10 years ago

I didn't have a chance to fix it properly. My workaround was to index something on SiteTree as well as my un-versioned DataObject (see the last code block on the original issue comment).

In my case, I didn't need to do anything further as I didn't have any SiteTree objects that could have been accidentally returned in the index anyway, but I presume there's a way to limit the results you get to just your DataObject, instead of returning a mix of DataObject and SiteTree objects.

Hopefully that's useful? :)

unclecheese commented 10 years ago

Yeah, I was able to discover that through the IRC logs. Not an ideal solution, and I didn't understand the code well enough to implement a patch, so I bailed on Solr and went with Sphinx. Too bad. Hopefully something comes out soon.

FWIW, another fix would be to overload the search() method in your SolrIndex subclass and remove the call to 'alterQuery', but that method is about 100 lines long, and that's a lot of DRY violations for a single line of change :)

hafriedlander commented 10 years ago

All the variant subclasses are supposed to know enough to not modify indexes or queries when those indexes don't contain classes they apply to.

Unfortunately looks like there's a bug in SolrIndex.php at line 318 where it only filters the variants when there's exactly one class in an index, otherwise it just uses every variant regardless.

Haven't found time to fix it yet though.

phptek commented 10 years ago

I can verify @madmatt's solution.

In this case all I did was create my custom SolrIndex subclass, and added an addClass call for SiteTree. A little clunky for sure:

class MySearchIndex extends SolrIndex {

/*
 * @return void
 */
public function init() {
    $this->addClass('MyDataObject');
    $this->addClass('SiteTree');
    $this->addAllFulltextFields();
    $this->addFilterField('ShowInSearch');
}

}

kinglozzer commented 10 years ago

Still an issue :(

madmatt commented 9 years ago

@phptek, @kinglozzer, @unclecheese: A better workaround is to call ->inClass() on the SearchQuery object you pass through. In my original example, that would be:

$query = new SearchQuery();
$query->inClass('Story');

So long as you only pass one class in, and that class doesn't have the Versioned extension on it, then it works fine.

The offending line is here:

SearchVariant::with(count($query->classes) == 1 ? $query->classes[0]['class'] : null)->call('alterQuery', $query, $this);

This will only pass a class name through if $query->classes contains exactly one array element. Otherwise, it passes null through, and the upshot of that is that instead of SearchVariant->appliesTo($class) being called, SearchVariant->appliesToEnvironment() is called instead.

I'm trying to figure out a decent way to patch this without breaking BC now.

michalkleiner commented 8 years ago

Any luck so far @madmatt?

intotheweb101 commented 7 years ago

Anyone have any luck with this?

madmatt commented 7 years ago

@michalkleiner, @intotheweb101: I haven't had to look into this for a while now, as the majority of content I index is versioned (even if we only index the 'Live' version, it still has a version applied).

However, as far as I know the workaround I posted above still works. You just have to specify when you are performing a search query that you only want to search one class - the dataobject that isn't versioned in your index. Let me know if that doesn't work!

robbieaverill commented 6 years ago

I've done some quick investigation into this, and this also applies to the subsites variant that ships by default in this module, as well as the FluentSearchVariant that ships with Fluent in CWP 2.0.

Aside from the issues mentioned already in this thread around only looking for one search class, if you don't provide a search class at all then SearchVariant::with will end up loading all search variants that apply to the environment without checking against the classes in the index.

At this point the SearchQuery doesn't contain any class references, but the SolrIndex you're running knows which classes it indexes. We could perhaps add some logic in this situation to loop the classes in the index and check whether each variant is applicable to each class, only then applying them if so (and they apply to the environment as well).

Summary:

I'm running on a heavy assumption that as long as a variant applies to at least one record in the index then it won't break results any others in the index that it doesn't apply to.

How does this sound @madmatt?

madmatt commented 6 years ago

Thanks for your review @robbieaverill!

I'm running on a heavy assumption that as long as a variant applies to at least one record in the index then it won't break results any others in the index that it doesn't apply to.

This is a bit of an issue I think. For example, the Versioned variant adds a filter that only includes content where the version is Draft/Live - this means that any content that isn't versioned in the index isn't included in results. I imagine this would be similar for any other variants as well (e.g. if only some content in the index has a subsite ID, the subsite variant will never include any of it.

I don't think there's really a good way to solve this though - in these cases I think you have three options (in roughly my order of preference):

robbieaverill commented 6 years ago

Keep it as is, but adjust the search filters so that they allow results where the variant isn't applied (e.g. the versioned filter should be _version = 'Draft' OR _version = NULL) - I'm not sure if this happens at the moment or not?

Yeah, this makes sense in theory, but what I'm seeing (on SS4) is that the variants that aren't relevant to the class you add to the index aren't included when running Solr_Reindex, but are being used to alter the query when you're querying that index - this is because it's not looking at the classes in the index, but only the classes you may or may not specify on the SearchQuery when you run it.

Also when the variant adds the _version filter column to indexes that don't have it then Solr throws an exception, rather than ignoring it.

At least there's a workaround for now!

robbieaverill commented 6 years ago

@madmatt I've made a pull request at https://github.com/silverstripe/silverstripe-fulltextsearch/pull/202 as a WIP to sort of show what I was thinking with this - let me know what you think

robbieaverill commented 6 years ago

Fixed in #202 :-)