Open robbieaverill opened 6 years ago
Discussed with @tractorcow - suggested approach:
$this->owner->FlattenedBlocks()->setByIDs($collatedListOfIds);
Then when searching, we collate a list of all searchable fields from all content blocks, add them all to a BaseElement query like this:
$conditions = [];
foreach ($classes as $class) {
$classFields = $class::create()->searchableFields();
$tableName = DataObject::getSchema()->tableName($class);
foreach ($classFields as $classField => $specs) {
$conditions['"' . $tableName . '"."' . $classField . '" LIKE ?'] = '%' . $searchTerm . '%';
}
}
$matchingElementIds = BaseElement::get()
->whereAny($conditions)
->column('ID');
Note that multiple conditions will exist where search fields are used across multiple element types (notable examples are ID, Title and LastEdited which are defined on BaseElement
so will be used for all elements that extend it).
From here we can use the many_many
through denormalised relationship mentioned earlier to filter by these IDs *:
return SiteTree::get()->filter(['FlattenedBlocks.ID' => $matchingElementIds]);
If we can apply these filters on top of the default search filters from SiteTreeFilter_Search
then regular CMS search should still work as well (note: it works in the current implementation too).
*: in theory
The solution is the shortest path to a baseline implementation, but I think it's a missed opportunity. We should at least evaluate all possible options.
The CMS search was always quite clunky - here's one example from silverstripe.org where a search for "security" brings up the "security releases" page on tenth place
LIKE
) search with faceted search (e.g. date ranges on particular columns)PartialMatchFilter
which produces LIKE
queries)Note: Either option could switch to using FULLTEXT
search for fuzzy matches (e.g. "car port" vs. "carport") and intelligence around multi-term searches
Store all content in an object hierarchy on the "root" object (e.g. blocks, nested blocks, other related records, all would be stored on page). Create a separate SearchIndex
table. Note that MySQL doesn't support FULLTEXT indexes across tables, so this would need to happen on an app level.
There's two ways of populating this index table:
Add one row per object in object graph, and search across records based on flattened relation data
Similar to "Option 2e: Denormalise through database index per table", but storing index as column data (not database metadata).
CMSSiteTreeFilter
PHP API, wouldn't cleanly apply to a Solr queryCreate a combined index across columns in one table. MySQL supports this for FULLTEXT indexes (e.g. MATCH(col1, col2, col3) AGAINST "my term"
). Similar to "Option 2b: Denormalise by "all content" column within a single table".
Note: I think we've stopped using MySQL FULLTEXT
as part of migrating from MyISAM
to InnoDB
in order to get transaction support, and at some earlier point InnoDB
didn't have fulltext capabilities. That's been solved with MySQL 5.6, which came out in 2013: https://www.percona.com/blog/2013/02/26/myisam-vs-innodb-full-text-search-in-mysql-5-6-part-1/.
It's worth noting that the way Elemental worked in the SS3 version was pretty similar to option 2c - we moved away from this to stop duplicating data in the DB (for performance reasons), but it did have the benefit of working natively with both CMS and frontend searches since the denormalised content is still stored in the Content field.
A bit more detail on "Option 2a: Denormalise by "all content" column across tables": Specifically, this concern here is a big one:
Con: No combination with faceted search, unless we specifically copy column data over. This will make it harder to customise ModelAdmin and GridField searches while also retaining fulltext search abilities.
I've also added "Option 2e: Denormalise through database index per table" to distinguish it from "Option 2a".
Here's an example on how Option 2a could work out, which illustrates limitations of that approach:
Page has_many Block GalleryBlock has_many GalleryBlockItem
Page: Created, LastEdited, Title, Content, Description, IsFeatured GalleryBlock: Created, LastEdited, Title, Content GalleryBlockItem: Created, LastEdited, SlideTitle, SlideDescription, Copyright
SearchIndex: ObjectID, ObjectClassName, Created, LastEdited, Title, Content, Page_IsFeatured
Page indexes Created into SearchIndex.Created Page indexes LastEdited into SearchIndex.LastEdited Page indexes IsFeatured into SearchIndex.Page_IsFeatured Page indexes Title into SearchIndex.Title Page indexes Content and Description into SearchIndex.Content GalleryBlock indexes Title into SearchIndex.Title GalleryBlock indexes Content into SearchIndex.Content GalleryBlock doesn't index Created and LastEdited GalleryBlockItem indexes SlideTitle into SearchIndex.Title GalleryBlockItem indexes SlideDescription and Copyright into SearchIndex.Content GalleryBlockItem doesn't index Created and LastEdited
=> Title can be boosted separately from Content, but not per-column or per-record => Built-in faceted search only applies to root object (Created and LastEdited) => Custom faceted search like IsFeatured requires explicit data copy to index
Discussed with @chillu and we decided to leave this as it is for now and consider it again in more detail in future.
cc @be2n
@robbieaverill Can you please get this on your radar again? Might need to split out some PoC for one or two options, to figure out how feasible they are.
cc @brynwhyman @rupachup
This reared its ugly head again in a different shape: link tracking (https://github.com/silverstripe/silverstripe-cms/issues/2276)
We have a couple of content-heavy sites (1000+ pages containing large numbers of elements), that are experiencing this issue. Happy to provide more detail if needed.
Original issue: #163
The method proposed in #188 enables support for this, but using
DataList::filterByCallback
after returning a full set of pages is expensive and will become a problem in larger data sets.We should improve this for a future minor release after this module is 2.0 stable.
In a private message, @tractorcow suggested replicating what
DataQuery::getFinalisedQuery()
does in terms of searching all base element classes for all of their text/searchable fields, then left joining your way back up to the page that owns them.Limitation would be that we set a default/fixed depth so it won't be arbitrary and won't let you search in scalable depth. We could however make this configurable or extensible for devs to customise it to suit their own projects.
Some example pseudo code:
Example code would need to join to the ElementArea that owns the block to be able to get the page ID. We could add an extension point here so as mentioned earlier devs could add more specific joins for deeper nesting into their own project code.