querqy / smui

Search Management UI
Apache License 2.0

Refactor & test deployment configuration (e.g. RulesTxtDeploymentService) for Elasticsearch support #56

Open pbartusch opened 3 years ago

pbartusch commented 3 years ago

Deployment possibilities for SMUI have grown rapidly. The configuration is hard to understand & corresponding code is hard to maintain - this includes:

especially.

Approach:

Step#1: Document all deployment possibilities that SMUI should support (already taking future Elasticsearch support, #43, into account).
Step#2: Derive a config schema (for application.conf).
Step#3: Refactor the code (breaking change).

pbartusch commented 3 years ago

The major goal of this story is to:

Constraints:

SMUI's deployment options (plan):

The following refactoring steps are suggested in order to sustain maintainability for SMUI with respect to the deployment options:

Explicit deployment configuration:

smui.deployment.PRELIVE = {
  'procedure': 'conf/deployment/git-repository.sh',
  'params': {
    'repo': 'https+ssh://my-repo-on.domain.tld'
    ...
  }
}

{SMUI_DEPLOYMENT_PROCEDURE}.sh {DEPLOYMENT_INSTANCE} {RULES_COLLECTION_NAME} {EXPORT_PATH} {RULES_TXT_FILE(S) as ordered comma separated list} {PROCEDURE_SPECIFIC_PARAMS as --key=value}

e.g.:
git-repository.sh PRELIVE ecommerce /export common-rules.txt,decompound-rules.txt,spelling-rules.txt --repo=https+ssh://my-repo-on.domain.tld ...

smui.deployment.PRELIVE = {
  'procedure': 'services.deployment.ElasticsearchDeployment',
  'params': {
    'url': 'https://my-elasticsearch-instance-on.domain.tld'
    ...
  }
}
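The command-line convention above (instance, rules collection, export path, ordered comma-separated rules files, then procedure-specific `--key=value` parameters) could be assembled from such a configuration roughly as follows. This is only a sketch of the convention as described in this comment: the function name `build_procedure_argv` and its signature are hypothetical and not part of SMUI.

```python
from typing import List, Mapping, Sequence


def build_procedure_argv(
    procedure: str,
    instance: str,
    collection: str,
    export_path: str,
    rules_files: Sequence[str],
    params: Mapping[str, str],
) -> List[str]:
    """Assemble the argument vector for a script-based deployment procedure.

    Hypothetical helper: it only mirrors the argument order proposed above.
    """
    if not rules_files:
        raise ValueError("at least one rules.txt file is required")
    # Rules files are passed as one ordered, comma-separated argument.
    argv = [procedure, instance, collection, export_path, ",".join(rules_files)]
    # Procedure-specific parameters follow as --key=value, in config order.
    argv.extend(f"--{key}={value}" for key, value in params.items())
    return argv


argv = build_procedure_argv(
    "conf/deployment/git-repository.sh",
    "PRELIVE",
    "ecommerce",
    "/export",
    ["common-rules.txt", "decompound-rules.txt", "spelling-rules.txt"],
    {"repo": "https+ssh://my-repo-on.domain.tld"},
)
print(" ".join(argv))
```

Passing the arguments as a list (for example to `subprocess.run`) rather than as one shell string would avoid quoting problems with parameter values.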

Note: At the time of planning this major change, SMUI refactorings (splitting the frontend & backend implementation) are taking place. The following branches are relevant:

epugh commented 3 years ago

I'm planning on removing the jackhanna script in favour of the single upload capability for ConfigSets, which should probably be how the zk-solr-cloud.sh interacts with Solr! Maybe rename it to solr-cloud.sh? See https://github.com/querqy/chorus/issues/22.

renekrie commented 3 years ago

@epugh @pbartusch Please keep in mind that https://github.com/querqy/querqy/issues/76 will be a breaking change: the rules.txt will no longer be deployed as such; instead, the rules will be embedded into a JSON HTTP request (very similar to Querqy for ES). Also, the direct interaction with ZK or any direct interaction with the configset will be removed (and the collection reload as well).

It is very likely that we can test a release candidate in production as early as January. I think we need this kind of 'beta version' this time given the scope of the change.

Long story short: please do not invest any time into making the current deployment of rules.txt to Solr better - it will be replaced very soon.
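To illustrate what "rules embedded into a JSON HTTP request" could mean for a deployment procedure, here is a minimal sketch that wraps the rules.txt content in a JSON body instead of copying a file. The payload field names (`class`, `config`, `rules`) and the factory class name are assumptions made for illustration; the final request format is not specified in this thread.

```python
import json


def build_rewriter_payload(rules_text: str, factory_class: str) -> str:
    """Wrap rules.txt content in a JSON request body.

    The field names used here are assumed, not confirmed by the thread.
    """
    payload = {
        "class": factory_class,            # assumed: identifies the rewriter type
        "config": {"rules": rules_text},   # assumed: rules travel as one string
    }
    return json.dumps(payload)


rules = "notebook =>\n  UP(10): asus\n"
# Placeholder class name, purely hypothetical.
body = build_rewriter_payload(rules, "example.HypotheticalRewriterFactory")
print(body)
```

The practical consequence for SMUI is that the deployment step would become an HTTP call carrying this body, with no file copy, ZooKeeper access, or collection reload involved.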

pbartusch commented 3 years ago

@renekrie , thanks for the hint.

Long story short: please do not invest any time into making the current deployment of rules.txt to Solr better - it will be replaced very soon.

that is not the plan. the focus of the concept described above lies on different deployment options in general.

renekrie commented 3 years ago

that is not the plan. the focus of the concept described above lies on different deployment options in general.

@pbartusch I was a bit worried because earlier you said:

Chorus should be adjusted to the newly adopted zk-solr-cloud deployment procedure as a first proof of concept.

I assume that zk-solr-cloud deployment will become outdated very soon.

pbartusch commented 3 years ago

Ah, got it. OK, it wasn't meant to be the focus, but I understand the concern. Thanks, @renekrie.

Then it seems better to make the smui2solrcloud.sh a proof of concept for a custom deployment procedure. I will adjust https://github.com/querqy/smui/issues/56#issuecomment-745107324 accordingly.

pbartusch commented 3 years ago

@epugh , now I got your point as well. Regarding:

[...] in favour of the single upload capability for ConfigSets, which should probably be how the zk-solr-cloud.sh interacts with Solr

I suggest adding this deployment procedure (once it's available in Solr/Querqy) to SMUI instead of Chorus, as the solr-cloud.sh you suggested.

I will not make this part of this issue/story (obviously ;-)), but we should develop it within the scope of SMUI and adjust Chorus accordingly.

pbartusch commented 3 years ago

@renekrie, will the solr-local deployment procedure remain possible in Solr? (meaning: cp the rules.txt and then perform a core reload)

Or will that be deprecated as well?

renekrie commented 3 years ago

This will be the same HTTP call as for SolrCloud.

renekrie commented 3 years ago

Just a heads-up: I've just merged a PR for https://github.com/querqy/querqy/issues/116 to querqy-core.

This would give you the option to manage ES/Solr specifics via templates in the rules file. For example, a down boost on a field could look like this:

notebook =>
  UP(10): asus
  << field_down: factor=20 || fieldname=category || value=accessories >>

At the beginning of the file, you would have to prepend the search-engine-specific template:

# either Solr:
def field_down(factor, fieldname, value):
  DOWN($factor): * $fieldname:($value)

# or Elasticsearch:
def field_down(factor, fieldname, value):
  DOWN($factor): *  "match": { "$fieldname": { "query": "$value" }}
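The substitution idea behind these templates can be sketched in a few lines. The code below is not Querqy's implementation; it only mimics the behaviour described above using Python's `string.Template`, and it assumes the Solr template is meant to reference `$value`. The names `CALL_RE`, `TEMPLATES` and `expand_call` are hypothetical.

```python
import re
from string import Template
from typing import Dict

# Matches a template call such as: << field_down: factor=20 || value=x >>
CALL_RE = re.compile(r"<<\s*(\w+)\s*:\s*(.*?)\s*>>")

# Template bodies per search engine, taken from the example above.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "solr": {"field_down": "DOWN($factor): * $fieldname:($value)"},
    "es": {
        "field_down": 'DOWN($factor): * "match": { "$fieldname": { "query": "$value" }}'
    },
}


def expand_call(line: str, templates: Dict[str, str]) -> str:
    """Replace a template call with the template body, arguments filled in."""
    match = CALL_RE.search(line)
    if match is None:
        return line  # not a template call, leave the line untouched
    name, raw_args = match.groups()
    # Arguments are key=value pairs separated by "||".
    args = dict(part.strip().split("=", 1) for part in raw_args.split("||"))
    return Template(templates[name]).substitute(args)


call = "<< field_down: factor=20 || fieldname=category || value=accessories >>"
print(expand_call(call, TEMPLATES["solr"]))
print(expand_call(call, TEMPLATES["es"]))
```

The same rule line yields engine-specific output depending only on which template set is prepended, which is the point of keeping the Solr/ES specifics out of the rules themselves.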

If it helps, we could probably add docstring documentation to the templates à la:

def field_down(factor, fieldname, value):
  """Use this to penalise documents that contain a certain value in the specified field.

  :param factor: the penalisation factor
  :param fieldname: the field name
  :param value: the field value
  :type factor: float
  :type fieldname: string
  :type value: string
  """
  DOWN($factor): * $fieldname:($value)

This would probably enable SMUI to generate a form input in the UI from the template. At the most advanced end, we could let users create and manage their own templates in SMUI, including for more complex function queries.
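As a rough illustration of how a UI could derive form inputs from such a docstring, the sketch below extracts the `:param` and `:type` lines into a list of field descriptions. It assumes the docstring format shown above; `form_fields` and the resulting dictionary keys (`name`, `label`, `type`) are hypothetical and not an SMUI API.

```python
import re
from typing import Dict, List

PARAM_RE = re.compile(r":param\s+(\w+):\s*(.+)")
TYPE_RE = re.compile(r":type\s+(\w+):\s*(\w+)")


def form_fields(docstring: str) -> List[Dict[str, str]]:
    """Turn :param/:type lines of a template docstring into form field specs."""
    fields: Dict[str, Dict[str, str]] = {}
    for raw_line in docstring.splitlines():
        line = raw_line.strip()
        param = PARAM_RE.match(line)
        if param:
            fields.setdefault(param.group(1), {})["label"] = param.group(2)
            continue
        type_ = TYPE_RE.match(line)
        if type_:
            fields.setdefault(type_.group(1), {})["type"] = type_.group(2)
    # Fields keep the order in which their :param lines first appeared.
    return [{"name": name, **spec} for name, spec in fields.items()]


DOC = """Use this to penalise documents that contain a certain value in the specified field.

  :param factor: the penalisation factor
  :param fieldname: the field name
  :param value: the field value
  :type factor: float
  :type fieldname: string
  :type value: string
"""

for field in form_fields(DOC):
    print(field)
```

Each resulting entry carries enough information (name, human-readable label, type) to render one input element per template parameter.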

dobestler commented 3 years ago

Do you think it might be useful to have the ability to define a raw query to a rule as well (i.e. everything after the '*')? E.g. as a specific option in the UI instead of choosing from suggested fields and putting a field for a value. The advantage would be to enable basically all use cases for rules through SMUI. It could enable Elastic Rules completely as a first step and circumvent the templates discussion and similar approaches. Tradeoff being the higher risk of human error when writing raw query syntax unless there is validation added to these inputs.

Update: It seems to be already possible through toggle.ui-concept.all-rules.with-solr-fields=false which renders the Term as is and does not throw any validation errors. So import, UI edit, export seems to be all working with Elastic Rules.

Paul-Blanchaert commented 3 years ago

@pbartusch Is there some activity planned on this issue? While refactoring, could the concept of a SOLR_BASE_URL (e.g. http://localhost:8983/solr) be considered instead of SOLR_HOST (which is then combined with hardcoded parts to build the base URL)? The advantage of SOLR_BASE_URL is that it would let the customer choose between http and https (and possibly use a different application root in place of "/solr").
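A backwards-compatible way to support this suggestion might be to prefer SOLR_BASE_URL when it is set and fall back to SOLR_HOST otherwise. The sketch below assumes that SOLR_HOST holds `host:port` and that the legacy behaviour is to build `http://<SOLR_HOST>/solr`; both points, as well as the function name `resolve_solr_base_url`, are assumptions for illustration.

```python
from typing import Mapping
from urllib.parse import urlsplit


def resolve_solr_base_url(env: Mapping[str, str]) -> str:
    """Prefer SOLR_BASE_URL; fall back to the (assumed) legacy SOLR_HOST form."""
    base = env.get("SOLR_BASE_URL")
    if base:
        parts = urlsplit(base)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"SOLR_BASE_URL is not an http(s) URL: {base!r}")
        return base.rstrip("/")
    host = env.get("SOLR_HOST")
    if host:
        # Assumed legacy behaviour: fixed scheme and fixed "/solr" root.
        return f"http://{host}/solr"
    raise KeyError("neither SOLR_BASE_URL nor SOLR_HOST is set")


print(resolve_solr_base_url({"SOLR_BASE_URL": "https://search.example.com/search/"}))
print(resolve_solr_base_url({"SOLR_HOST": "localhost:8983"}))
```

With this fallback, existing setups that only define SOLR_HOST would keep working, while new setups could switch scheme and application root freely.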

epugh commented 3 years ago

See #82, which is specific to @pbartusch's comment back in December 2020!