microsoft / rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate experiments and evaluations using Azure Cognitive Search and the RAG pattern.
https://github.com/microsoft/rag-experiment-accelerator

RAG Pattern for Multi-Lingual Scenarios #7

Open raymond-nassar opened 11 months ago

raymond-nassar commented 11 months ago

Azure Cognitive Search has skillsets for language detection and processing. Exploration is required to determine the best implementation. We should test with German, Italian, and English, since these languages are currently in use in an active customer engagement.
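
For reference, a minimal sketch of what a language detection skill definition could look like, written as a Python dict in the JSON shape the Azure Cognitive Search REST API uses for skillsets. The skillset name, the `/document/content` source path, and the `targetName` values are assumptions for illustration, not settings taken from this repo.

```python
import json

# Skill definition in the JSON shape used for Azure Cognitive Search skillsets.
# The source path and target names are assumptions for illustration.
language_detection_skill = {
    "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
    "context": "/document",
    "inputs": [
        {"name": "text", "source": "/document/content"},
    ],
    "outputs": [
        {"name": "languageCode", "targetName": "languageCode"},
        {"name": "languageName", "targetName": "languageName"},
        {"name": "score", "targetName": "languageScore"},
    ],
}

# Hypothetical skillset wrapper; the name is a placeholder.
skillset = {
    "name": "language-detection-skillset",
    "description": "Detects the language of each document before indexing.",
    "skills": [language_detection_skill],
}

if __name__ == "__main__":
    print(json.dumps(skillset, indent=2))
```

The detected `languageCode` could then be mapped to an index field and used to pick a language-specific analyzer or index at query time.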

### Tasks
- [x] Determine if the default Standard Lucene language analyzer is sufficient
- [x] Determine support guidance for index creation and querying (e.g. blended or language-specific indexes)
- [x] Add LanguageDetectionSkill support
- [x] Add language analyzer settings (`analyzers`, `tokenizers`, `token_filters`, etc.) to `search_config.json` and update the SearchIndexClient settings
- [x] Set data ingestion limits/constraints on chunk size to adhere to maximum record size (i.e. 50k characters) - see `search_config.json`.
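
A rough sketch of two of the tasks above: choosing an analyzer per language and keeping chunks under the 50k character limit. The analyzer names (`de.microsoft`, `it.microsoft`, `en.microsoft`, `standard.lucene`) are built-in Azure Cognitive Search analyzers; the helper names `analyzer_for` and `split_into_chunks` are hypothetical and not part of this repo, and the actual keys in `search_config.json` may differ.

```python
# Built-in Azure Cognitive Search analyzer names for the three test languages.
LANGUAGE_ANALYZERS = {
    "de": "de.microsoft",
    "it": "it.microsoft",
    "en": "en.microsoft",
}
DEFAULT_ANALYZER = "standard.lucene"  # fallback for any other language

# Limit taken from the task list above (50k characters per record).
MAX_CHUNK_CHARS = 50_000


def analyzer_for(language_code: str) -> str:
    """Return an analyzer name for a detected language code (hypothetical helper)."""
    return LANGUAGE_ANALYZERS.get(language_code.lower(), DEFAULT_ANALYZER)


def split_into_chunks(text: str, max_chars: int = MAX_CHUNK_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring whitespace breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Break after the last space inside the window, if there is one.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    sample = "Beispieltext " * 10_000
    parts = split_into_chunks(sample)
    print(len(parts), max(len(p) for p in parts), analyzer_for("de"))
```

Splitting on characters rather than tokens keeps the check aligned with the record size limit; a token-based splitter would need a separate character check on top.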

raymond-nassar commented 9 months ago

Updating the unit test for the language skillset.