jankounchained closed this 6 months ago
@x-tabdeveloping reworked the encoder, check it out
there's a new problem with documentation: newline characters break the code blocks ugh
what it should be:
Examples
--------
Instructional models can also be used.
In this case, the documents should be prefixed with a one-sentence instruction that describes the task.
See Notes for available models and instruction suggestions.
```python
from turftopic.encoders import E5Encoder
from turftopic import GMM

def add_instruct_prefix(document: str) -> str:
    # Prepend the one-sentence task instruction to every document
    task_description = "YOUR_INSTRUCTION"
    return f'Instruct: {task_description}\nQuery: {document}'

encoder = E5Encoder(model_name="intfloat/multilingual-e5-large-instruct", preprocessor=add_instruct_prefix)
model = GMM(10, encoder=encoder)
```
Or the same can be done using a `prefix` argument:
```python
from turftopic.encoders import E5Encoder
from turftopic import GMM
prefix = "Instruct: YOUR_INSTRUCTION\nQuery: "
encoder = E5Encoder(model_name="intfloat/multilingual-e5-large-instruct", prefix=prefix)
model = GMM(10, encoder=encoder)
```
what it is:
![Screenshot 2024-03-12 at 18 40 25](https://github.com/x-tabdeveloping/turftopic/assets/42962106/da0bbe57-aab8-4029-a721-cbcf947d27a9)
to be fixed
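A likely cause, assuming the example sits in a regular (non-raw) Python docstring: the `\n` escape is turned into a real line break when the module is parsed, so the doc generator receives a string literal that is already split in two. A minimal sketch of the effect, with hypothetical function names that are not part of turftopic:

```python
def regular_docstring():
    """
    prefix = "Instruct: YOUR_INSTRUCTION\nQuery: "
    """


def raw_docstring():
    r"""
    prefix = "Instruct: YOUR_INSTRUCTION\nQuery: "
    """


# In the regular docstring the escape has already become a line break,
# so the one-line example is split over two lines.
regular_lines = regular_docstring.__doc__.strip().splitlines()

# In the raw docstring the backslash and the "n" survive as two characters,
# so the example stays on a single line.
raw_lines = raw_docstring.__doc__.strip().splitlines()

print(len(regular_lines), len(raw_lines))
```

Writing `\\n` in a regular docstring should have the same effect as the `r` prefix.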
Site won't be rebuilt, code committed in another pull request
Fixes:
- #15: Two simple E5 encoders were implemented: `E5Encoder` and `E5InstructionalEncoder`. The only difference is what prefix they give to documents. Documentation is updated.
- #16: Doc dependencies listed in `pyproject.toml`.
- #17: Temporary fix: commented L45 out in `mkdocs.yml` (the `custom_templates` line). Site looks ok.
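For readers who want to see what "the only difference is the prefix" means in practice, here is a rough sketch. The class names, the `fake_embed` stand-in, and the `"query: "` prefix are assumptions made for illustration; this is not the turftopic implementation and it does not load any model.

```python
from typing import Callable, List


def fake_embed(texts: List[str]) -> List[str]:
    # Hypothetical stand-in for an embedding model: it returns the texts
    # unchanged so the effect of the prefix can be inspected directly.
    return list(texts)


class PrefixEncoderSketch:
    """Illustrative only: prepends a fixed prefix before embedding."""

    def __init__(self, prefix: str = "", embed: Callable[[List[str]], List[str]] = fake_embed):
        self.prefix = prefix
        self.embed = embed

    def encode(self, documents: List[str]) -> List[str]:
        return self.embed([self.prefix + doc for doc in documents])


class InstructionalEncoderSketch(PrefixEncoderSketch):
    """Illustrative only: builds the prefix from a task instruction."""

    def __init__(self, instruction: str, embed: Callable[[List[str]], List[str]] = fake_embed):
        super().__init__(prefix=f"Instruct: {instruction}\nQuery: ", embed=embed)


# The "query: " prefix here is an assumed example value.
plain = PrefixEncoderSketch(prefix="query: ").encode(["hello"])
instructed = InstructionalEncoderSketch("YOUR_INSTRUCTION").encode(["hello"])
print(plain)
print(instructed)
```

Both sketch classes share the same `encode` path; only the string placed in front of each document changes.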