As discussed, we are planning to have four Solr-Core-Configuration types:
Smarti application (dynamic / read-write)
Smarti uses Solr to index conversational chat data and make it searchable. Since this data changes frequently at runtime (source: Rocket.Chat), the index must be updated in near real time.
Chatpal application (dynamic / read-write)
Chatpal uses Solr to index messages, persons, rooms and attachments. Since this data changes frequently at runtime (source: Rocket.Chat), the index must be updated in near real time.
Wikipedia interesting terms (static / read-only)
The Wikipedia index is used to extract interesting terms while indexing Smarti conversations. The Wikipedia crawl changes rarely (about once a year), so the index can be bundled inside the provided Docker image. It does not need to be updated at runtime; updates can happen at deploy time.
Client-specific interesting terms (generic / read-import)
The client-specific interesting terms are identified from the messages written in Rocket.Chat. This index must be built in an extensible way that makes it easy to add text data for identifying additional interesting terms. For example, we aim to derive interesting terms from the content of an internal wiki.
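To make the four types and their update behaviour concrete, here is a minimal Python sketch. The collection names (`smarti`, `chatpal`, `wikipedia-terms`, `client-terms`), the base URL and the one-second commit window are assumptions for illustration only and are not part of the plan above. The sketch uses Solr's `commitWithin` update parameter as one possible way to get near-real-time visibility for the two read-write types.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class CoreConfigType:
    name: str       # hypothetical collection name, not fixed by the plan
    lifecycle: str  # "dynamic", "static" or "generic"
    access: str     # "read-write", "read-only" or "read-import"


# The four configuration types described above (names are assumptions).
CONFIG_TYPES = [
    CoreConfigType("smarti", "dynamic", "read-write"),
    CoreConfigType("chatpal", "dynamic", "read-write"),
    CoreConfigType("wikipedia-terms", "static", "read-only"),
    CoreConfigType("client-terms", "generic", "read-import"),
]


def nrt_update_url(base_url: str, config: CoreConfigType,
                   commit_within_ms: int = 1000) -> str:
    """Build an update URL that asks Solr to commit within the given window.

    Only read-write types are updated at runtime, so every other type
    is rejected here.
    """
    if config.access != "read-write":
        raise ValueError(f"{config.name} is not updated at runtime")
    query = urlencode({"commitWithin": commit_within_ms})
    return f"{base_url}/{config.name}/update?{query}"
```

For example, `nrt_update_url("http://solr:8983/solr", CONFIG_TYPES[0])` yields an update endpoint for the Smarti collection, while passing the Wikipedia type raises a `ValueError`, because that index is only replaced at deploy time.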
Build & Deploy
The target environment is OpenShift
Infrastructure as code: all OpenShift configs must be provided as YAML templates
A separate Dockerfile must be provided for each of these configurations
All configuration types should use the same ZooKeeper ensemble
Each configuration must have at least 2 Solr nodes
Each collection must have a replication factor >= 2
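The replication requirement can be enforced where collections are created. The following Python sketch builds a request URL for the Solr Collections API `CREATE` action and refuses a replication factor below 2. The base URL, collection name, config set name and the default of one shard are assumptions for illustration; the actual values would come from the OpenShift templates.

```python
from urllib.parse import urlencode

# Requirement from the list above: replication factor >= 2.
MIN_REPLICATION_FACTOR = 2


def create_collection_url(base_url: str, name: str, config_name: str,
                          num_shards: int = 1,
                          replication_factor: int = 2) -> str:
    """Build a Collections API CREATE URL, enforcing the minimum replication."""
    if replication_factor < MIN_REPLICATION_FACTOR:
        raise ValueError(
            f"replication factor must be >= {MIN_REPLICATION_FACTOR}"
        )
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of a config set already uploaded to ZooKeeper (assumed).
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"
```

With at least two Solr nodes per configuration, a replication factor of 2 lets each shard keep one replica on each node, which is presumably the motivation for pairing the two requirements.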