meilisearch / scrapix


Improve splitting of documents #51

Closed bidoubiwa closed 1 year ago

bidoubiwa commented 1 year ago

Currently, scrapix splits each page into documents in a way that is not very intuitive when searching from the search bar.

For example, let's imagine this page:

Screenshot 2023-06-29 at 16 50 03

With its current behavior, scrapix creates two documents from it:

Screenshot 2023-06-29 at 16 50 06

This results in the following search experience:

Screenshot 2023-06-26 at 17 48 48

All the content of the h1, h2, and h3 headings is concatenated together.
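To make the problem concrete, here is a minimal sketch of one possible reading of that behavior. It is not scrapix's actual code: the page content, the `concatenate_headings` helper, and the field names are all hypothetical and were not taken from the screenshots above.

```python
# Hypothetical page, flattened into (tag, text) pairs.
# This content is invented for illustration only.
page = [
    ("h1", "Getting started"),
    ("p", "Intro text."),
    ("h2", "Installation"),
    ("p", "Run the installer."),
    ("h3", "Docker"),
    ("p", "Pull the image."),
    ("h3", "Binary"),
    ("p", "Download the binary."),
]


def concatenate_headings(blocks):
    """Assumed behavior: merge all text of the same tag into one field."""
    fields = {"h1": [], "h2": [], "h3": [], "p": []}
    for tag, text in blocks:
        fields[tag].append(text)
    # Each field becomes a single space-joined string.
    return {tag: " ".join(texts) for tag, texts in fields.items()}


doc = concatenate_headings(page)
print(doc["h3"])
```

Under this assumption, the two separate h3 sections end up in a single `h3` field, so a search hit on one of them also displays the other and no longer points to a specific section of the page.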

That splitting should improve in order to have