snexus / llm-search

Querying local documents, powered by LLM
MIT License
518 stars 60 forks source link

Best Practices & ###### results problem #77

Closed MyraBaba closed 10 months ago

MyraBaba commented 10 months ago

Hi @snexus

Thanks for your efforts and beatifull project.

1 . Would you mind to give more example for hugginaface llamacpp models ? ie what would be the best accurate result for multilingual or language different from the english ?

2 - What would be the choice for summarization document ?

3 - sometimes result adding many ##### and stopping

4 - Example config for mistral , mixtral and dolphin2

5 Do you know haystack looks commercialized version.

6 what will be the future for enterprise AI search for internal documents ? Is it worth to invest ? Can we talk ?

snexus commented 10 months ago

Hi @MyraBaba

Would you mind to give more example for hugginaface llamacpp models ? ie what would be the best accurate result for multilingual or language different from the english ?

Unfortunately, it is hard to answer this question since it depends on the model and the task. The goal of the package is to enable users to choose the model they like without forcing any specific model, since "best" or "accurate" are subjective. A good place to look for new models or ask for the best model for the specific task would be https://www.reddit.com/r/LocalLLaMA/

What would be the choice for summarization document ?

Summarization is not in the scope of this project - it was built for question-answering.

3 - sometimes result adding many ##### and stopping

This is not coming from the package, but from the model itself - some models might require different prompt templates for asking the question. You can consult the model card e.g. Huggingface and update the prompt template in the config like specified here - https://github.com/snexus/llm-search/blob/d0f756df9fae8ec8786550b0fdcd94c8306f5589/sample_templates/generic/config_template.yaml#L86

4 - Example config for mistral , mixtral and dolphin2

As long as it is supported by llamacpp, it should be similar - download the model (e.g. in gguf format), and specify the path in the config. Some models work better with different default parameters - you can configure it in the config.yaml, similar to here - https://github.com/snexus/llm-search/blob/d0f756df9fae8ec8786550b0fdcd94c8306f5589/sample_templates/generic/config_template.yaml#L97

Do you know haystack looks commercialized version.

Can you clarify the question, please?

what will be the future for enterprise AI search for internal documents ? Is it worth to invest ? Can we talk ?

Quite hard to answer - some big players are entering the space, e.g. Microsoft already offers services for enterprise-grade document question answering, where models can be deployed to an organization's private network and data doesn't leave the perimeter. I think there will be a space for open-source projects like this for small organisations and privacy-conscious users, but hard to tell if it can be monetized.

MyraBaba commented 10 months ago

@snexus https://haystack.deepset.ai/ is the haystack