jirikuncar closed this issue 6 years ago.
@lnielsen I did some background reading (and googling), and now I have a couple of questions:
Which Elasticsearch 5 version should be used (5.5.x or 5.6.x)?
Travis builds currently run in the Ubuntu Trusty container-based environment, right? Installing ES5 there could look like this:
env:
- ES_VERSION=5.1.1 ES_DOWNLOAD_URL=https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${ES_VERSION}.tar.gz
install:
- wget ${ES_DOWNLOAD_URL}
- tar -xzf elasticsearch-${ES_VERSION}.tar.gz
- ./elasticsearch-${ES_VERSION}/bin/elasticsearch &
script:
- wget -q --waitretry=1 --retry-connrefused -T 10 -O - http://127.0.0.1:9200
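The script step above only checks that the node answers on port 9200. The same retry loop, written in Python for clarity, might look like the sketch below. The function name wait_for_elasticsearch and its timeout/interval defaults are assumptions for illustration, not part of the Travis setup.

```python
import json
import time
import urllib.request


def wait_for_elasticsearch(url="http://127.0.0.1:9200", timeout=60.0, interval=1.0):
    """Hypothetical helper: poll the Elasticsearch root endpoint until it answers.

    Mirrors `wget --waitretry=1 --retry-connrefused`: connection errors and HTTP
    errors (e.g. while the node is still starting) are retried until `timeout`
    seconds have passed. Returns the decoded JSON body of GET /.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        try:
            # Never wait longer on a single request than the time left overall.
            with urllib.request.urlopen(url, timeout=max(0.1, min(10.0, remaining))) as resp:
                return json.load(resp)
        except OSError as exc:  # URLError, HTTPError and socket errors are all OSError
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no answer from {url} within {timeout}s") from exc
            time.sleep(interval)
```

Calling wait_for_elasticsearch() right after starting the node in the background would block until ES is reachable, and the returned dict also carries the server version under info["version"]["number"].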
Should we use the same code for downloading ES2 and ES5?
For example, by including a new environment variable, ELASTIC_URL:
- ELASTIC_2_URL=https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/${ES_VERSION}/elasticsearch-${ES_VERSION}.tar.gz
- ELASTIC_5_URL=http://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${ES_VERSION}.tar.gz
...
env: ELASTIC_URL=$ELASTIC_5_URL
mkdir -p /tmp/elasticsearch && wget -O - ${ELASTIC_URL} | tar -xzf - --directory=/tmp/elasticsearch --strip-components=1
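The ELASTIC_URL idea boils down to choosing a URL template from the major version in ES_VERSION, so one install step serves both ES2 and ES5. A minimal sketch of that selection logic, assuming only majors 2 and 5 are needed; the helper name elastic_url is hypothetical, and the ES5 template uses https as in the first snippet.

```python
# URL templates mirroring ELASTIC_2_URL and ELASTIC_5_URL, keyed by major version.
ELASTIC_URL_TEMPLATES = {
    2: ("https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/"
        "distribution/tar/elasticsearch/{version}/elasticsearch-{version}.tar.gz"),
    5: "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}.tar.gz",
}


def elastic_url(es_version: str) -> str:
    """Hypothetical helper: return the tarball URL for a version such as '5.5.3'."""
    major = int(es_version.split(".")[0])
    try:
        template = ELASTIC_URL_TEMPLATES[major]
    except KeyError:
        raise ValueError(f"unsupported Elasticsearch major version: {es_version}") from None
    return template.format(version=es_version)


print(elastic_url("2.2.0"))
print(elastic_url("5.5.3"))
```

Keying on the major version means a new ES5 patch release only requires changing ES_VERSION in the matrix.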
With ELASTIC_URL, the old ES install and startup method should continue to work.
How should .travis.yml test matrix changes propagate to setup.py as the Elasticsearch version changes?
ES2 needs elasticsearch>=2.0.0,<3.0.0 and elasticsearch-dsl>=2.0.0,<3.0.0.
ES5 needs elasticsearch>=5.0.0,<6.0.0 and elasticsearch-dsl>=5.0.0,<6.0.0.
One option is to read the version specifiers from environment variables in setup.py:
import os

PYTHON_ELASTIC_VERSION = os.environ.get('PYTHON_ELASTIC_VERSION', '>=2.0.0,<3.0.0')
PYTHON_ELASTIC_DSL_VERSION = os.environ.get('PYTHON_ELASTIC_DSL_VERSION', '>=2.0.0,<3.0.0')

install_requires = [
    'Flask-BabelEx>=0.9.2',
    'dojson>=1.2.0',
    'elasticsearch' + PYTHON_ELASTIC_VERSION,
    'elasticsearch-dsl' + PYTHON_ELASTIC_DSL_VERSION,
    # ...
]
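A variant of the PYTHON_ELASTIC_VERSION approach would derive both specifiers from the ES_VERSION variable that each Travis matrix row already sets, so the matrix and setup.py cannot drift apart. This is a sketch under that assumption; elastic_requirements is a hypothetical helper, and the major-version rule follows the ES2/ES5 pairs listed above.

```python
import os


def elastic_requirements(es_version: str) -> list:
    """Hypothetical helper: pin elasticsearch and elasticsearch-dsl to the
    same major version as the Elasticsearch server (2.x -> 2.x, 5.x -> 5.x)."""
    major = int(es_version.split(".")[0])
    spec = f">={major}.0.0,<{major + 1}.0.0"
    return ["elasticsearch" + spec, "elasticsearch-dsl" + spec]


# ES_VERSION comes from the Travis env matrix row; fall back to ES2 when unset.
install_requires = [
    "Flask-BabelEx>=0.9.2",
    "dojson>=1.2.0",
    *elastic_requirements(os.environ.get("ES_VERSION", "2.2.0")),
    # ...
]
```

One caveat of any environment-driven setup.py: a wheel built from it records whichever specifiers were in effect at build time.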
Noting the Elasticsearch version in the Travis logs:
- "wget -O ..."
- "/tmp/elasticsearch/bin/elasticsearch --version"
- "/tmp/elasticsearch/bin/elasticsearch &"
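Besides printing what the binary reports with --version, the job could compare the version the running node reports at GET / (its version.number field) with the ES_VERSION requested by the matrix row, failing early on a mismatch. A sketch with a hypothetical helper name; the sample response is abbreviated and illustrative.

```python
def check_es_version(root_info: dict, expected: str) -> str:
    """Hypothetical helper: return the version reported by GET / and fail
    if it differs from the ES_VERSION the Travis row asked for."""
    actual = root_info["version"]["number"]
    if actual != expected:
        raise RuntimeError(f"expected Elasticsearch {expected}, but the node reports {actual}")
    return actual


# Abbreviated, illustrative GET / response body.
sample = {"cluster_name": "elasticsearch", "version": {"number": "5.5.3"}}
print("Elasticsearch version:", check_es_version(sample, "5.5.3"))
```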
Indexing in ES5 vs ES2
How should I set up the test matrix so that one set of tests uses ES2 and another uses ES5?
env:
# ES2 rows first, then the same rows using ES5 instead of ES2:
- REQUIREMENTS=lowest EXTRAS=all,sqlite SQLALCHEMY_DATABASE_URI="sqlite:///test.db" ES_HOST=127.0.0.1 ES_VERSION=2.2.0 ELASTIC_URL=$ELASTIC_2_URL
- REQUIREMENTS=release EXTRAS=all,sqlite SQLALCHEMY_DATABASE_URI="sqlite:///test.db" ES_HOST=127.0.0.1 ES_VERSION=2.2.0 ELASTIC_URL=$ELASTIC_2_URL
- REQUIREMENTS=devel EXTRAS=all,sqlite SQLALCHEMY_DATABASE_URI="sqlite:///test.db" ES_HOST=127.0.0.1 ES_VERSION=2.2.0 ELASTIC_URL=$ELASTIC_2_URL
- REQUIREMENTS=lowest EXTRAS=all,sqlite SQLALCHEMY_DATABASE_URI="sqlite:///test.db" ES_HOST=127.0.0.1 ES_VERSION=5.5.3 ELASTIC_URL=$ELASTIC_5_URL
- REQUIREMENTS=release EXTRAS=all,sqlite SQLALCHEMY_DATABASE_URI="sqlite:///test.db" ES_HOST=127.0.0.1 ES_VERSION=5.5.3 ELASTIC_URL=$ELASTIC_5_URL
- REQUIREMENTS=devel EXTRAS=all,sqlite SQLALCHEMY_DATABASE_URI="sqlite:///test.db" ES_HOST=127.0.0.1 ES_VERSION=5.5.3 ELASTIC_URL=$ELASTIC_5_URL
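The six rows above are the cross product of REQUIREMENTS (lowest, release, devel) and ES_VERSION (2.2.0, 5.5.3), and Travis runs each env row as a separate job. A small generator makes that structure explicit; the function name matrix_rows and the COMMON constant are illustrative assumptions, not part of the project.

```python
from itertools import product

# Settings shared by every row of the matrix.
COMMON = 'EXTRAS=all,sqlite SQLALCHEMY_DATABASE_URI="sqlite:///test.db" ES_HOST=127.0.0.1'


def matrix_rows(requirements=("lowest", "release", "devel"), es_versions=("2.2.0", "5.5.3")):
    """Build one env row per (ES version, REQUIREMENTS) pair, ES2 rows first."""
    rows = []
    for es_version, req in product(es_versions, requirements):
        major = es_version.split(".")[0]
        rows.append(
            f"REQUIREMENTS={req} {COMMON} ES_VERSION={es_version} ELASTIC_URL=$ELASTIC_{major}_URL"
        )
    return rows


for row in matrix_rows():
    print("  - " + row)
```

Adding another Elasticsearch version then means extending es_versions rather than copying three more rows by hand.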
I opened https://github.com/inveniosoftware/invenio-oaiserver/pull/132, so maybe we should move the discussion there.
Screenshot from running the official migration tool on a Zenodo ES 2.3 instance:
Some changes to suggesters between ES2 and ES5
Payloads are removed: _source will contain the full document, and can be truncated using source filtering.
One issue here is that the user-defined payload object could also contain some extra logic, as done in the invenio-openaire grants indexing (the legacy_id field is used by JavaScript in the UI). Of course, this extra logic could be moved either to the body of the document (adding a legacy_id field to the mapping) or to the consumer of the response (JavaScript in the UI).
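On the consumer side, the difference is where legacy_id lives in each suggestion option: in ES5 it is inside _source, while in ES2 it had to be put in the payload. A sketch of a consumer that accepts both shapes, plus an ES5 request body that uses source filtering to return only legacy_id. The suggester name "grants", the prefix, and the sample responses are hypothetical; only the "suggest" field and legacy_id come from the discussion.

```python
def suggestion_legacy_ids(suggest: dict, name: str) -> list:
    """Collect legacy_id from every option of the suggester called `name`.

    `suggest` is the dict holding the suggester results. With ES5 each option
    carries the document in "_source"; with ES2 the same data had to be stored
    in the user-defined "payload".
    """
    ids = []
    for entry in suggest[name]:
        for option in entry["options"]:
            doc = option.get("_source") or option.get("payload") or {}
            ids.append(doc.get("legacy_id"))
    return ids


# ES5 request body: source filtering keeps only the fields the UI needs.
request_body = {
    "_source": ["legacy_id"],
    "suggest": {"grants": {"prefix": "eur", "completion": {"field": "suggest"}}},
}

# Abbreviated, illustrative responses (not captured from a real cluster).
es5 = {"grants": [{"text": "eur", "options": [{"text": "European grant", "_source": {"legacy_id": "123"}}]}]}
es2 = {"grants": [{"text": "eur", "options": [{"text": "European grant", "payload": {"legacy_id": "123"}}]}]}
print(suggestion_legacy_ids(es5, "grants"), suggestion_legacy_ids(es2, "grants"))
```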
> Of course, this extra logic could be either moved to the body of the document (adding a legacy_id field in the mapping)
I guess that's the way to do it: you would run this logic during indexing anyway, so now you simply write its result to a first-class field instead of a subfield of "suggest".
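On the indexing side, that amounts to writing legacy_id as a top-level field of the document, next to the completion field, instead of nesting it in the suggest payload. A sketch under the assumption that the source grant record has title, code and legacy_id keys; the function name and the mapping field types are illustrative, not taken from invenio-openaire.

```python
def grant_to_es5_document(grant: dict) -> dict:
    """Hypothetical indexer hook: build the ES5 document for a grant.

    The legacy_id that used to live in suggest.payload (ES2) is written as a
    regular top-level field, so it comes back in each suggestion's _source.
    """
    return {
        "title": grant["title"],
        "legacy_id": grant["legacy_id"],  # first-class field instead of payload
        "suggest": {"input": [grant["title"], grant["code"]]},
    }


# Matching ES5 mapping fragment (field types are an assumption for illustration).
MAPPING_PROPERTIES = {
    "title": {"type": "text"},
    "legacy_id": {"type": "keyword"},
    "suggest": {"type": "completion"},
}

doc = grant_to_es5_document({"title": "Example grant", "code": "123456", "legacy_id": "42"})
print(doc)
```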