Open tasks: SOLR7 in production

guenterh commented 6 years ago

[x] Does it make sense to use the new possibility of different node types in SOLR 7? Plus: we could use the old and well-established replication mode. Minus: see the discussion started by Ere on the Solr list and the response from Eric

because we will run two parallel clusters for green as well as for bb, we can run a software update sequentially

currently there is an additional activity for SOLR 7.4 (??) where Ere tries to implement a mechanism that makes it possible to activate dedicated replica types for each server node

I think we can start with the classic model and switch to a more sophisticated solution later
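
As a rough illustration of the node-type question: since version 7, the Solr Collections API accepts the replica type counts `nrtReplicas`, `tlogReplicas` and `pullReplicas` when a collection is created. The sketch below only builds such a CREATE request URL and sends nothing. The base URL, the collection name `green_test` and the replica counts are assumptions for illustration, not values decided in this issue.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, nrt=0, tlog=0, pull=0):
    """Build a Collections API CREATE URL with explicit replica type counts.

    nrt/tlog/pull are the number of replicas of each type per shard.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": nrt,    # classic SolrCloud replicas (the default type)
        "tlogReplicas": tlog,  # write a transaction log, copy the index from the leader
        "pullReplicas": pull,  # only copy the index from the leader
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical values: local base URL, made-up collection name, 4 shards.
url = create_collection_url(
    "http://localhost:8983/solr", "green_test", num_shards=4, tlog=1, pull=2
)
print(url)
```

Starting with the classic model would correspond to using only `nrtReplicas`. A TLOG/PULL mix is the variant closest to the old replication mode.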

[ ] more tests based on VuFind 4
sort - ok!
missing functions in VuFind 4.1 - compare the new branch in VuFind: https://github.com/vufind-org/vufind/tree/solr7

I have to analyze Silvia's test results

[x] strategies for software updates and frequent re-creation of indices after CBS export
we need a kind of cookbook for running software updates on clusters. In the master/slave model this was relatively easy to do
do we need some kind of staging cluster for switching (relatively expensive)?

we can do this by running a two-cluster solution for every collection (see above)

[x] number of nodes / servers and replicas in production mode
for green (at the moment for nodes with 4 shards)
for BB (smaller than green; gain experience with green first)
do we need a dedicated collection on a dedicated cluster for Basel/Bern? (I think so)
[ ] strategy / cookbook: workflows after fresh export from CBS
growth of indices in regular production mode (we have a lot of deletes and updates every day)
handling of several collections on the same cluster and switching using aliases (we do this with Elasticsearch for linked; not tested with SOLR so far)
[ ] Picibird has to test version 7
[ ] block free access to search via http://search.swissbib.ch in the future. Reason: problems with deep paging in distributed mode (special clients like Picibird should be allowed via IP)
[x] at least two independent ZooKeeper ensembles (one for Kafka and one for SOLR). It probably makes sense to install independent ensembles for our two collections (green and Basel/Bern). Reason: as I see it now, they will be hosted on their own clusters with dedicated hosts. I guess independent ensembles make daily operations easier
[ ] some deprecation warnings currently appear in the logs for older components in use
[x] establish test and development index for SOLR 7 (sb-us11, sb-us12, sb-us13)
[x] are there any configuration changes between 7.1 and 7.2 we have to consider?
[x] prepare SHI mail - and discuss these topics
sort with docvalues (implemented and works well)
analysis of metrics in the SOLR backend (I'm not really experienced with the available possibilities)
implementation of different node types with SOLR 7 (reference to Ere's summary)
[x] ELK for SOLR? Now that our new ELK stack runs in production, we could collect all SOLR logs in this cluster (better performance analysis, analysis of the search queries used, for example for LTR mechanisms, although part of these terms should already come from the VuFind logs; additionally field: bookshelf)
not necessary for going live with SOLR7
[ ] clean up ZIM
[ ] SRU interface has to work with the new cluster
[ ] search.swissbib.ch has to be blocked for worldwide access
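
The deep paging problem behind the blocking tasks can be made concrete with a little arithmetic. In distributed mode each shard has to return its top `start + rows` candidates to the coordinating node, which then merges them. The sketch below computes this upper bound; a shard with fewer matches returns fewer. The shard count of 4 is taken from the green setup above, while the `start` and `rows` values are made-up examples.

```python
def merged_candidates(start, rows, num_shards):
    """Upper bound on sorted entries the coordinating node must merge
    for one distributed request with plain start/rows paging."""
    per_shard = start + rows  # each shard returns its own top (start + rows)
    return per_shard * num_shards


# First result page on 4 shards: cheap.
print(merged_candidates(start=0, rows=20, num_shards=4))          # 80
# A client paging one million hits deep on 4 shards: expensive.
print(merged_candidates(start=1_000_000, rows=20, num_shards=4))  # 4000080
```

The cost grows with the page offset for every single request, which is why unrestricted crawling of deep result pages is a concern on a sharded index.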

guenterh commented 6 years ago

current performance tests (7.2 is running continuously - current server infrastructure)

SOLR 7.1 (finished for now) http://www.swissbib.org/doc/solr/filesummary.0.txt http://www.swissbib.org/doc/solr/summary.0.txt
SOLR 7.2 http://www.swissbib.org/doc/solr/filesummary.txt http://www.swissbib.org/doc/solr/summary.txt
queries for filesummary http://www.swissbib.org/doc/solr/filebasedqueries.txt

swissbib-unibas commented 6 years ago

33

guenterh commented 6 years ago

split into individual tasks

swissbib / searchconf

Open tasks: SOLR7 in production #33