scipopt / rubberband

A flexible archiving platform for optimization benchmarks
MIT License

upgrade to current ElasticSearch #63

Open svigerske opened 1 month ago

svigerske commented 1 month ago

I spent some days getting an ElasticSearch 8.15 instance up and running and starting to migrate data from an ElasticSearch 2.x server. But now I noticed that the README here says that Rubberband only works with ElasticSearch 2.x (yes, I should have seen that earlier).

I see that there is a branch upgrade-elasticsearch that hasn't received updates for 6 years. @fschloesser Do you remember what the state of this is? What is the difficulty in using a more recent ElasticSearch?

fschloesser commented 3 weeks ago

If I remember correctly, it wasn't a trivial change to migrate the database objects from the old version to the new version. Some fundamental change was introduced after version 2.x of elasticsearch and I didn't find the time to look into this in more depth. The comment that you're quoting applies to the object structure that is currently used in rubberband. What I imagine needs to happen is to write a migration script that translates the objects from the current database to a new format and pours them into the new database. Also, the way that rubberband interacts with the elasticsearch database probably needs some adjustments.

svigerske commented 3 weeks ago

Just starting ES 8.15 on the data from ES 2.x does indeed not work. Upgrading one major release at a time may work, but the reindex-from-remote feature of ES also seemed promising. I got some migration started with this; I'll just put the script here so I can find it again later:

# create the index and set dynamic mapping to runtime to get double as default for numerics
# https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html
# https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic.html
curl -X PUT "localhost:9200/solver-results" -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "dynamic": "runtime"
  }
}'

# say that we have many fields (default is 1000)
curl -X PUT "localhost:9200/_all/_settings?preserve_existing=false" -H 'Content-Type: application/json' -d '{ "index.mapping.total_fields.limit" : "20000" }'

# get data from previous ES server, but tunnel through system outside ZIB because old server is not reachable from new one
# https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#reindex-from-remote
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": {
      "host": "https://example.gams.com:443",
      "username": "of the auth for the es server",
      "password": "of the auth for the es server"
    },
    "index": "solver-results",
    "size": 1
  },
  "dest": {
    "index": "solver-results"
  }
}
'
# "query": { "range" : { "upload_timestamp": { "gte" : "2020-01-01" } } },

However, reindex-from-remote has a size limit of 100MB, which cannot be adjusted (it is not the http.max_content_length), and even with size:1 (that is, transferring only one document at a time), the process eventually failed. Probably a single out file is that large. Also, it was very slow, partly due to setting size:1 and partly due to having to tunnel through a machine outside ZIB so that the new server can reach the old one.
And that was before even attempting to get Rubberband to talk to ES 8.15.
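
One way to limit the damage when the reindex fails midway might be to run it in separate time windows, using the upload_timestamp range query that is commented out in the script above. The following is only a sketch under assumptions: the helper names reindex_body and yearly_windows are made up for illustration, the yearly window size and the year range are arbitrary, and upload_timestamp is assumed to be a date field that accepts this date format. Host and credentials are the same placeholders as above. It does not work around the 100MB limit; it only means that a failure affects a single window.

```shell
# Hypothetical helper: print a reindex-from-remote request body that is
# restricted to documents with from <= upload_timestamp < to.
reindex_body() {
  from="$1"
  to="$2"
  cat <<EOF
{
  "source": {
    "remote": {
      "host": "https://example.gams.com:443",
      "username": "of the auth for the es server",
      "password": "of the auth for the es server"
    },
    "index": "solver-results",
    "size": 1,
    "query": { "range": { "upload_timestamp": { "gte": "$from", "lt": "$to" } } }
  },
  "dest": {
    "index": "solver-results"
  }
}
EOF
}

# Hypothetical helper: print one "from to" pair per line, one per year.
yearly_windows() {
  y="$1"
  last="$2"
  while [ "$y" -le "$last" ]; do
    echo "$y-01-01 $((y + 1))-01-01"
    y=$((y + 1))
  done
}

# Usage (commented out, since it needs both servers; the years are assumptions):
# yearly_windows 2015 2024 | while read -r from to; do
#   reindex_body "$from" "$to" |
#     curl --fail -sS -X POST "localhost:9200/_reindex?pretty" \
#          -H 'Content-Type: application/json' -d @- ||
#     echo "window $from .. $to failed" >&2
# done
```

Windows that fail could then be retried with a smaller range, which may also help to narrow down which documents run into the limit.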