plone / plone.app.blocks

A 'blocks' rendering model for Plone
https://pypi.python.org/pypi/plone.app.blocks
Other
11 stars 4 forks source link

Indexing the tile content using collective.dexteritytextindexer #25

Open neilferreira opened 8 years ago

neilferreira commented 8 years ago

Hi all

I have numerous mosaic pages that I need to have indexed for my site search. I've been following the trail and noticed that the layout content can be indexed via this package from following this trail by @datakurre : https://github.com/plone/plone.app.mosaic/issues/54 https://github.com/plone/plone.app.blocks/commit/d6cbc0a8c1689336332b61d376b3e7f205a68de8

I have followed the steps outlined in the comment on the mosaic issue, but am now getting these values indexed in my "SearchableText" index

SearchableText  ['home', 'summary', 'doctype', 'html', 'public', 'w3c', 'dtd', 'xhtml', u'1', u'0', 'transitional', 'en', 'http', 'www', 'w3', 'org', 'tr', 'xhtml1', 'dtd', 'xhtml1', 'transitional', 

As you can see, the full HTML is being indexed, rather than a plaintext version of the tiles being indexed. It also has the issue where the <div data-tile objects are being indexed, instead of a rendered version of them. Meaning you won't actually have the Persistent tiles's content being indexed.

A snippet of the 'value' being indexed in https://github.com/collective/collective.dexteritytextindexer/blob/master/collective/dexteritytextindexer/indexer.py#L83

 <div class="mosaic-tile-content">\r\n          <div data-tile="./@@site.tiles.content_pull/6b2a450c262b4d3f85e8dfafb3a3fc78?_layouteditor=true&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;"></div>\r\n          </div>\r\n          </div>\r\n          <div class="movable removable mosaic-tile mosaic-site.tiles.text-tile">\r\n          <div class="mosaic-tile-content">\r\n          <div data-tile="./@@site.tiles.text/7331d4b63c7b47f5b8da87db46265a86?_layouteditor=true&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;"></div>\r\n          </div>\r\n          </div>\r\n          <div class="movable removable mosaic-tile mosaic-site.tiles.simple_listing-tile">\r\n          <div class="mosaic-tile-content">\r\n          <div data-tile="./@@site.tiles.simple_listing/0d979164a36949e2a2b8d1742155d985

Is this the planned behaviour? If I added a tile onto my page such as a "Heading" tile with the content "Hello world" I would presume my SearchableText to become ['hello', 'world']

Any pointers to resolving this? Is my expectation the same as you intended for this to work? OR should I be looking implementing this myself?

Thankyou.

datakurre commented 8 years ago

@neilferreira Good points and relevant issues in the current implementation.

Now when you said it, it could be possible to also to render the final HTML, run that through portal transforms and index it in c.dxtextindexer adapter:

  1. Look into collective.dexteritysearchindexer README on how to register custom indexer adapter for layout behavior
  2. Look into "existing content tile" for how to force panel and tile interpolation manually to get the final HTML: https://github.com/plone/plone.app.standardtiles/blob/master/plone/app/standardtiles/existingcontent.py#L74
  3. Look into collective.dexteritytextindexer for an example, how to use portal_transforms to convert html to text https://github.com/collective/collective.dexteritytextindexer/blob/master/collective/dexteritytextindexer/converters.py#L44
neilferreira commented 8 years ago

For record keeping

This will be fixed by finalising https://github.com/plone/plone.app.blocks/pull/21

https://github.com/plone/plone.app.mosaic/issues/54 is "closed" but not fixed

gforcada commented 8 years ago

Aha, that's the one that I thought on https://github.com/plone/plone.app.blocks/pull/21 :-)

So @neilferreira could you give us an update on your CLA status?

neilferreira commented 8 years ago

CLA has been signed. I'm also part of the Plone foundation on Github.

jensens commented 8 years ago

21 was merged

neilferreira commented 8 years ago

@datakurre Did you want me to commit this code anywhere to have it merged in? https://github.com/neilferreira/plone.app.blocks/commit/57b40acbb29a84f1ca051ea16a5767160abbd237

It seems to have gotten lost in between #21 and this issue if I'm not mistaken

datakurre commented 8 years ago

Please, could you do a new pull. I probably thought that Nathan's indexing pull covered this, but it was still missing portal transforms call to clean html tags.

Btw, does it make sense to index external content tiles? (Code which resolves relation value to get title). I'd also like to keep relationvalue optional dependency, similar to dxtextindexer.

On 19. huhtikuuta 2016 klo 8.55 +0300, Neil Ferreiranotifications@github.com, wrote:

@datakurre(https://github.com/datakurre)Did you want me to commit this code anywhere to have it merged in?neilferreira@57b40ac(https://github.com/neilferreira/plone.app.blocks/commit/57b40acbb29a84f1ca051ea16a5767160abbd237)

It seems to have gotten lost in between#21(https://github.com/plone/plone.app.blocks/pull/21)and this issue if I'm not mistaken

— You are receiving this because you were mentioned. Reply to this email directly orview it on GitHub(https://github.com/plone/plone.app.blocks/issues/25#issuecomment-211740845)

datakurre commented 8 years ago

I believe, this is implemented.

neilferreira commented 8 years ago

Apologies for re-opening this for a second time but I believe I'll be re-visiting this on a project quite shortly, and will likely give it some attention then.

The current implementation still indexes the object's id in the SearchableText (unsure why) and makes the assumption that only title, label and content will be indexed on the tiles. If you use RichText fields in your tiles (which we do all the time) these are not indexed at all, making the indexing quite useless.

Todo list, which most of which is already done on https://github.com/neilferreira/plone.app.blocks/commit/57b40acbb29a84f1ca051ea16a5767160abbd237

neilferreira commented 8 years ago

I actually really like what @hvelarde has created on https://github.com/collective/collective.cover/pull/556/files so now I'm almost hesitant to be automatically doing any sort of schema indexing, and rather leaving it to the developer to specify an indexer for their tile themselves.

@datakurre now that you've seen these two approaches how did you want to approach this?

datakurre commented 8 years ago

Enhancing the current indexers with adapter support would be nice, but instead of inventing new interfaces, I'd reuse collective.dexteritytextindexers interfaces.

On 4. lokakuuta 2016 klo 4.35 +0300, Neil Ferreira notifications@github.com, wrote:

I actually really like what @hvelarde (https://github.com/hvelarde) has created on https://github.com/collective/collective.cover/pull/556/files so now I'm almost hesitant to be automatically doing any sort of schema indexing, and rather leaving it to the developer to specify an indexer for their tile themselves.

@datakurre (https://github.com/datakurre) now that you've seen these two approaches how did you want to approach this?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub (https://github.com/plone/plone.app.blocks/issues/25#issuecomment-251273851), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAJyv8IW_UKLYk0XRu-ur64r2J7Hk2oIks5qwa1YgaJpZM4Hjcn-).

jensens commented 8 years ago

Btw., if someone takes the time to do the implementation and a PLIP to merge collective.dexteritytextindexers in plone.app.dexterity this would be awesome.