vespa-engine / pyvespa

Python API for https://vespa.ai, the open big data serving engine
https://pyvespa.readthedocs.io/
Apache License 2.0
103 stars, 34 forks

Improve pyvespa chunks to docsearch #873

Closed by thomasht86 3 months ago

thomasht86 commented 3 months ago

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

I suspect that the paragraph_index hasn't been fed in a while: running the build script added an extra "-e" to the front matter of the HTML files, which resulted in an empty feed for me.
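For anyone who wants to check their own build output for the same symptom, here is a minimal sketch. It assumes the front matter opens with a line of three dashes and that the stray "-e" appears directly before it. The helper names are hypothetical and are not part of pyvespa or its build script.

```python
import re

# Hypothetical helpers for illustration; not part of pyvespa or its build script.
# Assumption: front matter opens with a "---" line, and the stray "-e"
# sits at the very start of the file, right before that delimiter.
STRAY_FLAG = re.compile(r"\A-e\s*(?=---\s*$)", re.MULTILINE)


def has_stray_flag(text: str) -> bool:
    """Return True if the file content starts with a stray "-e" before "---"."""
    return STRAY_FLAG.search(text) is not None


def strip_stray_flag(text: str) -> str:
    """Remove a stray leading "-e" so the front matter delimiter comes first."""
    return STRAY_FLAG.sub("", text, count=1)
```

Running `has_stray_flag` over the generated HTML files before feeding should show whether the front matter is affected; files that already start with "---" are left unchanged by `strip_stray_flag`.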

Improves the chunking to docsearch from pyvespa:

Link to previous test run: https://github.com/vespa-engine/pyvespa/actions/runs/10370895936/job/28709859043

This seems to give better results for reference docs, e.g.: