meilisearch / scrapix

MIT License

Retrieve page titles from meta tags #53

Open Strift opened 1 year ago

Strift commented 1 year ago

In the results, the page hosted at website.com/docs/examples/foo will be formatted (written in pink) as docs / examples / foo (or docs > examples > foo, depending on which formatting we decide to keep).

Could we pull the page title from the meta tag instead of using the URL? Some websites might not have proper page titles, but that's on them to fix. Page titles may be a more reasonable default than the URL itself.

We probably want to avoid having to think about all the weird ways to transform URLs into readable text.
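To make the suggestion concrete, here is a minimal sketch of preferring a title found in the page markup and only falling back to the URL path. It is not scrapix code: `extractTitle` is a hypothetical helper, the regex matching is a simplification (a real crawler would use a proper HTML parser), and the choice of `og:title` / `title` meta names is an assumption.

```typescript
// Hypothetical helper, not part of scrapix: prefer a title from the markup,
// fall back to a readable version of the URL path.
function extractTitle(html: string, url: string): string {
  // Assumption: the title lives in <meta property="og:title"> or <meta name="title">.
  // This simplified regex expects the name/property attribute before content.
  const metaMatch =
    /<meta[^>]+(?:property|name)=["'](?:og:title|title)["'][^>]*content=(["'])(.*?)\1/i.exec(html);
  if (metaMatch && metaMatch[2].trim() !== "") {
    return metaMatch[2].trim();
  }

  // Otherwise try the <title> tag.
  const titleMatch = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  if (titleMatch && titleMatch[1].trim() !== "") {
    return titleMatch[1].trim();
  }

  // Last resort: the current behaviour described above (docs / examples / foo).
  // new URL() needs an absolute URL, so a scheme is required.
  return new URL(url).pathname
    .split("/")
    .filter((segment) => segment !== "")
    .join(" / ");
}
```

With this sketch, a page with no usable title at `https://website.com/docs/examples/foo` still yields `docs / examples / foo`, so sites without proper titles keep the URL-based label.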

bidoubiwa commented 1 year ago

Hey! Thanks for the suggestion :) I was thinking it could be a configurable option: a boolean that is false by default, which users could enable if they want to use the meta tags (or the title tag).
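A rough sketch of what such an opt-in flag could look like. The option name `useMetaTitle` and the `chooseTitle` function are hypothetical and only illustrate the default-false behaviour described above; they are not existing scrapix settings.

```typescript
// Hypothetical option shape; scrapix does not necessarily expose this setting.
interface TitleOptions {
  useMetaTitle?: boolean; // assumed name, defaults to false
}

// Pick the displayed title: the meta/title tag value only when the user opted in,
// otherwise the URL-derived label.
function chooseTitle(
  url: string,
  metaTitle: string | null,
  options: TitleOptions = {}
): string {
  const useMetaTitle = options.useMetaTitle ?? false;
  if (useMetaTitle && metaTitle !== null && metaTitle.trim() !== "") {
    return metaTitle.trim();
  }
  // Default behaviour: readable path segments from an absolute URL.
  return new URL(url).pathname
    .split("/")
    .filter((segment) => segment !== "")
    .join(" / ");
}
```

Keeping the default at false means existing users see no change unless they enable the flag, and pages without a usable title still fall back to the URL.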

qdequele commented 1 year ago

Hello @Strift, are you using the docsearch version or the default version?

On the default version, the title is already taken from the metadata. You also have a field, urls_tags, which is an array of strings containing all the elements of the pathname.
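For illustration, an array like `urls_tags` could be derived from a URL roughly as follows. This is a sketch under the assumption that the field simply holds the non-empty pathname segments; `buildUrlsTags` is a hypothetical name and the actual scrapix implementation may differ.

```typescript
// Hypothetical helper: split an absolute URL's pathname into its non-empty segments.
function buildUrlsTags(url: string): string[] {
  return new URL(url).pathname
    .split("/")
    .filter((segment) => segment !== "");
}

// Example: the page from the original comment, with a scheme added so it parses.
const tags = buildUrlsTags("https://website.com/docs/examples/foo");
console.log(tags); // ["docs", "examples", "foo"]
```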

Strift commented 1 year ago

My comment was based on a preview @bidoubiwa shared with me 3 weeks ago. She asked me to create an issue about it. I don't know which version it is, and can't say if the issue is still relevant as of today.