Closed fsteeg closed 8 years ago
Will update Elasticsearch on quaoar1, which is currently only used for the geodata index.
Started creation of a new geodata index on the quaoar cluster (quaoar1 and 2).
(lod@gaia:~/git/geodata-staging on port 7401, used by sol@quaoar1:~/git/lobid-organisations-staging)
quaoar cluster (quaoar1 and 2).
You meant quaoar2 and quaoar3 and did the things accordingly :)
Oh right, sorry for the confusion.
New index on the quaoar cluster is done and was used to create the current staging data for lobid-organisations (see http://test.lobid.org/organisations/search). Switched lod@gaia:~/git/geodata to use the new index.
Elasticsearch instance on quaoar1 is now no longer used. Will update Elasticsearch on quaoar1 next.
Replaced the old Elasticsearch on quaoar1 with Elasticsearch 2.3.3, installed via sudo dpkg -i elasticsearch-2.3.3.deb (from https://www.elastic.co/downloads/elasticsearch) in /usr/share/elasticsearch/, conf in /etc/elasticsearch/, logs in /var/log/elasticsearch/, and installed the Elastic HQ plugin (an alternative to the head plugin, see http://elastichq.org):
http://quaoar1.hbz-nrw.de:9200/ & http://quaoar1.hbz-nrw.de:9200/_plugin/hq/
Started indexing with https://github.com/hbz/mabxml-elasticsearch/commit/8723a3fc6a6bb5293f8eae771203a814d04c5ed7 in sol@quaoar1:~/git/mabxml-elasticsearch-staging with:
mvn clean install ; mvn exec:java -Dexec.mainClass="flow.Transform" -Dexec.args="/files/open_data/open/DE-605/mabxml/ gz quaoar1 193.30.112.170 hbz01-20160630-1315" > log/processMabxml.sh.20160630-1315.log 2>&1 &
New index was created. Created alias hbz01-test for new index.
Copied updates since index creation from /files/open_data/open/DE-605/mabxml/ to sol@quaoar1:~/git/mabxml-elasticsearch-staging/updates:
sol@quaoar1:~/git/mabxml-elasticsearch-staging/updates$ ls
DE-605-aleph-update-marcxchange-20160630-20160701.tar.gz
DE-605-aleph-update-marcxchange-20160703-20160704.tar.gz
DE-605-aleph-update-marcxchange-20160702-20160703.tar.gz
Indexed these updates with:
mvn exec:java -Dexec.mainClass="flow.Transform" -Dexec.args="/home/sol/git/mabxml-elasticsearch-staging/updates gz quaoar1 193.30.112.170 hbz01-test" > log/processMabxml.sh.20160704-0930.log 2>&1 &
Set up separate updates test transformation at hduser@weywot2 and added crontab entry:
20 09 * * * ssh hduser@weywot2 "cd ~/git/mabxml-elasticsearch-test/src/main/resources ; bash transform.sh"
Configured in src/main/resources/transform.sh to update the new index using the hbz01-test alias, used by http://test.lobid.org/hbz01:
#!/bin/bash
set -euo pipefail # See http://redsymbol.net/articles/unofficial-bash-strict-mode/
IFS=$'\n\t'
# Execute via crontab by hduser@weywot1:
# 20 09 * * * ssh hduser@weywot2 "cd ~/git/mabxml-elasticsearch-test/src/main/resources ; bash transform.sh"
export MAVEN_OPTS="-Dfile.encoding=UTF-8 -Xmx1024M -Xss128M -XX:+CMSClassUnloadingEnabled"
# Determine the latest update file and store it locally:
updates=http://dataproxy.lobid.org/alephxml/export/update/
date=$(date "+%Y%m%d")
updateFile=$(curl $updates | grep 'tar.gz' | cut -d '"' -f2 | grep $date)
cd updates ; wget $updates$updateFile ; cd ../../../..
# Run the transformation with the latest file (and possibly unprocessed previous files):
mvn clean install >> log/processMabxml.sh.$date.log 2>&1
mvn exec:java -Dexec.mainClass="flow.Transform" -Dexec.args="src/main/resources/updates/ gz quaoar1 193.30.112.170 hbz01-test" >> log/processMabxml.sh.$date$
# Clean up and move updates to the full data directory (skipped if transformation fails, due to -e option):
cd src/main/resources/
# cp updates/* /files/open_data/open/DE-605/mabxml/
rm updates/*
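A note on the file selection in the script: `grep $date` matches the date anywhere in the file name, so it would also pick up a file whose range starts on that day. A minimal sketch of a stricter variant follows, assuming the `DE-605-aleph-update-marcxchange-<start>-<end>.tar.gz` naming seen in the listings here. The function name `select_update_file` and the sample listing are hypothetical, the real listing would come from `curl` on the updates URL.

```shell
# Read an HTML directory listing from stdin and print the update file
# whose range ENDS on the given date (YYYYMMDD), once per file name.
select_update_file() {
  grep -o 'DE-605-aleph-update-marcxchange-[0-9]*-'"$1"'\.tar\.gz' | sort -u
}

# Hypothetical listing, standing in for the output of: curl -s "$updates"
listing='<a href="DE-605-aleph-update-marcxchange-20160702-20160703.tar.gz">DE-605-aleph-update-marcxchange-20160702-20160703.tar.gz</a>
<a href="DE-605-aleph-update-marcxchange-20160703-20160704.tar.gz">DE-605-aleph-update-marcxchange-20160703-20160704.tar.gz</a>'

# Prints only the file for the range 20160703-20160704:
printf '%s\n' "$listing" | select_update_file 20160704
```

When nothing matches, `grep` returns a non-zero status, so under `set -euo pipefail` the script would still stop before the transformation, as it does now.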
Triggered creation of second index (for separate prod and test indexes) for data:
sol@quaoar1:~/git/mabxml-elasticsearch-staging$ ls /files/open_data/open/DE-605/mabxml/
DE-605-aleph-baseline-marcxchange-2016062414.tar.gz DE-605-aleph-update-marcxchange-20160629-20160630.tar.gz
DE-605-aleph-update-marcxchange-20160625-20160626.tar.gz DE-605-aleph-update-marcxchange-20160630-20160701.tar.gz
DE-605-aleph-update-marcxchange-20160626-20160627.tar.gz DE-605-aleph-update-marcxchange-20160702-20160703.tar.gz
DE-605-aleph-update-marcxchange-20160627-20160628.tar.gz DE-605-aleph-update-marcxchange-20160703-20160704.tar.gz
DE-605-aleph-update-marcxchange-20160628-20160629.tar.gz README.htm
With https://github.com/hbz/mabxml-elasticsearch/commit/8723a3fc6a6bb5293f8eae771203a814d04c5ed7 in sol@quaoar1:~/git/mabxml-elasticsearch-staging:
mvn clean install ; mvn exec:java -Dexec.mainClass="flow.Transform" -Dexec.args="/files/open_data/open/DE-605/mabxml/ gz quaoar1 193.30.112.170 hbz01-20160704-1030" >> log/processMabxml.sh.20160704-1030.log 2>&1 &
Deployed to test system: http://test.lobid.org/hbz01
Does not work: "Alias [hbz01-test] has more than one indices associated with it"
Thanks for testing! I had just added an hbz01-test alias to the new, second index today. My idea was to keep both up to date via the new crontab entry, but obviously that doesn't work. I've set up an hbz01 alias for the new index instead, and manually triggered today's updates for both. I disabled the deletion of the updates in sol@quaoar1:~/git/mabxml-elasticsearch-staging and will index them manually into hbz01 when deploying to production (after functional and code review).
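For reference, a rough sketch of how such an alias change can be sent as a single request to the Elasticsearch `_aliases` API, so that `hbz01-test` points at one index only. It assumes the second index is `hbz01-20160704-1030` (the name passed to the transformation above). The helper functions `alias_action` and `aliases_body` are hypothetical and only build the request body. The `curl` call is left as a comment, and its host and port are an assumption based on the quaoar1 URLs above.

```shell
# Build one action object for the _aliases API.
# $1: "add" or "remove", $2: index name, $3: alias name
alias_action() {
  printf '{"%s":{"index":"%s","alias":"%s"}}' "$1" "$2" "$3"
}

# Join all given action objects into one request body, so that
# Elasticsearch applies them together in a single request.
aliases_body() {
  local IFS=,
  printf '{"actions":[%s]}' "$*"
}

# Remove hbz01-test from the second index and give it the hbz01 alias instead:
body=$(aliases_body \
  "$(alias_action remove hbz01-20160704-1030 hbz01-test)" \
  "$(alias_action add hbz01-20160704-1030 hbz01)")
echo "$body"

# To apply it (assumed host and port):
# curl -XPOST 'http://quaoar1.hbz-nrw.de:9200/_aliases' -d "$body"
```

Sending both actions in one body avoids an intermediate state in which the index has both aliases or neither.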
+1
See https://github.com/hbz/lobid/issues/300