Neikius closed this issue 6 years ago.
The Search Guard initialization is part of run.sh, which calls another script, es_seed_acl, which in turn runs sgadmin.sh in a loop and checks whether Search Guard really was initialized.
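As a rough sketch (the helper name retry_until is hypothetical; the real es_seed_acl script in the image differs in detail), the seeding amounts to retrying sgadmin.sh until it succeeds or a deadline passes:

```shell
# Hypothetical sketch of the seeding retry loop; the actual es_seed_acl
# script differs in detail.
retry_until() {
    # Retry "$@" until it succeeds or $timeout seconds have elapsed.
    local timeout=$1; shift
    local deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
    done
}

# In the container it would be invoked with the long timeout seen in the
# startup log ("Will wait up to 604800 seconds"), roughly:
#   retry_until 604800 sgadmin.sh [sgadmin options]
```

This is why the "Not yet initialized (you may need to run sgadmin)" errors repeat: the loop keeps retrying while the ACL index stays unseeded.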
For a start, could you provide the output of _cat/indices?v and _cluster/health?level=indices, so we can see whether the RED state is due to some index being RED? An easy way to get this might be:
oc rsh -c elasticsearch [es_pod_name]
cd /etc/elasticsearch/secret
curl -4 -s -XGET --cacert ./path_to_cert/admin-ca --cert ./admin-cert --key ./admin-key --insecure 'https://localhost:9200/_cat/indices?v'
curl -4 -s -XGET --cacert ./path_to_cert/admin-ca --cert ./admin-cert --key ./admin-key --insecure 'https://localhost:9200/_cluster/health?level=indices'
Maybe @lukas-vlcek can help here or provide some additional info on how to debug this.
I get this:
sh-4.2$ curl -4 -s -XGET --cacert ./path_to_cert/admin-ca --cert ./admin-cert --key ./admin-key https://localhost:9200/_cat/indices?v --insecure
health status index pri rep docs.count docs.deleted store.size pri.store.size
red open .searchguard.logging-es-data-master-rx1j1syi 1 0
sh-4.2$ curl -4 -s -XGET --cacert ./path_to_cert/admin-ca --cert ./admin-cert --key ./admin-key https://localhost:9200/_cluster/health?level=indices?v --insecure
{"cluster_name":"logging-es","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":1,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":0.0}
Do I understand this correctly, that the index is broken? This is a fresh install; it never worked in the first place. Can I just delete that index somehow?
Looks like the index never got its primary shard allocated. Deleting it may help, but before you do so, could you please provide a few more things so we can try to figure out why?
oc rsh -c elasticsearch [es_pod_name]
# see if the PV still has some space, there should be a mount point /elasticsearch/persistent/
df -h
# copy elasticsearch internal logs to see if they could help
/elasticsearch/logging-es/logs/logging-es*
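For reference, if deleting the broken index does turn out to be the way forward, it would be a DELETE against the index. A sketch, reusing the same placeholder cert paths as the earlier curl examples; it only prints the command rather than running it, so nothing is removed by accident:

```shell
# Sketch only: build and print the delete command for the broken
# Search Guard index (name taken from the _cat/indices output above).
# Cert paths are the same placeholders as in the earlier examples.
INDEX=".searchguard.logging-es-data-master-rx1j1syi"
CMD="curl -4 -s -XDELETE --cacert ./path_to_cert/admin-ca \
  --cert ./admin-cert --key ./admin-key --insecure \
  https://localhost:9200/${INDEX}"
echo "$CMD"   # review, then run it yourself if you are sure
```

After deleting, restarting the pod should let the es_seed_acl loop recreate and seed the index, assuming the underlying storage problem is fixed.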
Sure, no problem, just tell me what you need and hopefully I can help. I tried an uninstall/install again today, since I can see the latest tag is updated regularly and is picked up by my deploy. Still the same result, so here is the info:
oc rsh -c elasticsearch logging-es-data-master-lxllv5mv-1-jfjb7
sh-4.2$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-253:0-7878-2aaf6745a6113eac166d537dcd5eaaa98b395536951cc3682dd03227705698b3 10G 728M 9.3G 8% /
tmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/centos-root 44G 3.4G 41G 8% /etc/hosts
x.x.x.x:vol_6f27670fa3dcb9bb84d8cd98902faece 10G 34M 10G 1% /elasticsearch/persistent
shm 64M 0 64M 0% /dev/shm
tmpfs 7.8G 32K 7.8G 1% /etc/elasticsearch/secret
tmpfs 7.8G 16K 7.8G 1% /run/secrets/kubernetes.io/serviceaccount
logging-es.log:
[2018-01-11 07:43:30,159][INFO ][node ] [logging-es-data-master-lxllv5mv] version[2.4.4], pid[1], build[fcbb46d/2017-01-03T11:33:16Z]
[2018-01-11 07:43:30,161][INFO ][node ] [logging-es-data-master-lxllv5mv] initializing ...
[2018-01-11 07:43:30,791][INFO ][plugin.prometheus ] starting Prometheus exporter plugin...
[2018-01-11 07:43:31,019][INFO ][plugins ] [logging-es-data-master-lxllv5mv] modules [reindex, lang-expression, lang-groovy], plugins [prometheus-exporter, openshift-elasticsearch, cloud-kubernetes], sites []
[2018-01-11 07:43:31,055][INFO ][env ] [logging-es-data-master-lxllv5mv] using [1] data paths, mounts [[/elasticsearch/persistent (x.x.x.x:vol_6f27670fa3dcb9bb84d8cd98902faece)]], net usable_space [9.9gb], net total_space [9.9gb], spins? [possibly], types [fuse.glusterfs]
[2018-01-11 07:43:31,055][INFO ][env ] [logging-es-data-master-lxllv5mv] heap size [3.9gb], compressed ordinary object pointers [true]
[2018-01-11 07:43:31,548][INFO ][http ] [logging-es-data-master-lxllv5mv] Using [org.elasticsearch.http.netty.NettyHttpServerTransport] as http transport, overridden by [search-guard2]
[2018-01-11 07:43:31,714][INFO ][transport ] [logging-es-data-master-lxllv5mv] Using [com.floragunn.searchguard.transport.SearchGuardTransportService] as transport service, overridden by [search-guard2]
[2018-01-11 07:43:31,714][INFO ][transport ] [logging-es-data-master-lxllv5mv] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
[2018-01-11 07:43:33,660][INFO ][io.fabric8.elasticsearch.plugin.PluginSettings] Using kibanaIndexMode: 'unique'
[2018-01-11 07:43:33,660][INFO ][io.fabric8.elasticsearch.plugin.PluginSettings] Using kibanaIndexMode: 'unique'
[2018-01-11 07:43:33,661][INFO ][io.fabric8.elasticsearch.plugin.kibana.IndexMappingLoader] Trying to load Kibana mapping for io.fabric8.elasticsearch.kibana.mapping.app from plugin: /usr/share/elasticsearch/index_patterns/com.redhat.viaq-openshift.index-pattern.json
[2018-01-11 07:43:33,665][INFO ][io.fabric8.elasticsearch.plugin.kibana.IndexMappingLoader] Trying to load Kibana mapping for io.fabric8.elasticsearch.kibana.mapping.ops from plugin: /usr/share/elasticsearch/index_patterns/com.redhat.viaq-openshift.index-pattern.json
[2018-01-11 07:43:33,665][INFO ][io.fabric8.elasticsearch.plugin.kibana.IndexMappingLoader] Trying to load Kibana mapping for io.fabric8.elasticsearch.kibana.mapping.empty from plugin: /usr/share/elasticsearch/index_patterns/com.redhat.viaq-openshift.index-pattern.json
[2018-01-11 07:43:33,666][INFO ][io.fabric8.elasticsearch.plugin.OpenshiftRequestContextFactory] Using kibanaIndexMode: 'unique'
[2018-01-11 07:43:33,666][INFO ][io.fabric8.elasticsearch.plugin.PluginSettings] Using kibanaIndexMode: 'unique'
[2018-01-11 07:43:33,792][INFO ][node ] [logging-es-data-master-lxllv5mv] initialized
[2018-01-11 07:43:33,792][INFO ][node ] [logging-es-data-master-lxllv5mv] starting ...
[2018-01-11 07:43:33,903][INFO ][discovery ] [logging-es-data-master-lxllv5mv] logging-es/WxQZaK2FSWOHam5yTx4sig
[2018-01-11 07:43:37,938][INFO ][cluster.service ] [logging-es-data-master-lxllv5mv] new_master {logging-es-data-master-lxllv5mv}{WxQZaK2FSWOHam5yTx4sig}{10.129.0.67}{10.129.0.67:9300}{max_local_storage_nodes=1, master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2018-01-11 07:43:37,962][INFO ][http ] [logging-es-data-master-lxllv5mv] publish_address {10.129.0.67:9200}, bound_addresses {[::]:9200}
[2018-01-11 07:43:37,963][INFO ][node ] [logging-es-data-master-lxllv5mv] started
[2018-01-11 07:43:38,053][INFO ][gateway ] [logging-es-data-master-lxllv5mv] recovered [1] indices into cluster_state
[2018-01-11 07:43:42,420][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,477][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,479][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,657][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,685][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,686][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,706][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,727][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,747][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,842][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,861][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2018-01-11 07:43:42,962][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
... this just goes on forever ...
All the other log files are empty. Maybe oc logs too? Here:
# oc logs logging-es-data-master-lxllv5mv-1-jfjb7 -c elasticsearch
[2018-01-11 07:43:29,084][INFO ][container.run ] Begin Elasticsearch startup script
[2018-01-11 07:43:29,094][INFO ][container.run ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2018-01-11 07:43:29,095][INFO ][container.run ] Inspecting the maximum RAM available...
[2018-01-11 07:43:29,098][INFO ][container.run ] ES_HEAP_SIZE: '4096m'
[2018-01-11 07:43:29,101][INFO ][container.run ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2018-01-11 07:43:29,104][INFO ][container.run ] Checking if Elasticsearch is ready on https://localhost:9200
[2018-01-11 07:43:38,600][INFO ][container.run ] Elasticsearch is ready and listening at https://localhost:9200
[2018-01-11 07:43:38,610][INFO ][container.run ] Seeding the searchguard ACL index. Will wait up to 604800 seconds.
/usr/share/java/elasticsearch/config
Will connect to localhost:9300 ... done
2018-01-11 07:43:39 INFO SearchGuardSSLPlugin:84 - Search Guard 2 plugin not available
2018-01-11 07:43:39 INFO SearchGuardPlugin:58 - Clustername: elasticsearch
2018-01-11 07:43:39 INFO SearchGuardPlugin:70 - Node [null] is a transportClient: true/tribeNode: false/tribeNodeClient: false
2018-01-11 07:43:39 INFO plugins:180 - [Conquer Lord] modules [], plugins [search-guard-ssl, search-guard2], sites []
2018-01-11 07:43:39 INFO DefaultSearchGuardKeyStore:423 - Open SSL not available (this is not an error, we simply fallback to built-in JDK SSL) because of java.lang.ClassNotFoundException: org.apache.tomcat.jni.SSL
2018-01-11 07:43:39 INFO DefaultSearchGuardKeyStore:173 - Config directory is /usr/share/java/elasticsearch/config/, from there the key- and truststore files are resolved relatively
2018-01-11 07:43:39 INFO DefaultSearchGuardKeyStore:142 - sslTransportClientProvider:JDK with ciphers [TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256]
2018-01-11 07:43:39 INFO DefaultSearchGuardKeyStore:144 - sslTransportServerProvider:JDK with ciphers [TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256]
2018-01-11 07:43:39 INFO DefaultSearchGuardKeyStore:146 - sslHTTPProvider:null with ciphers []
2018-01-11 07:43:39 INFO DefaultSearchGuardKeyStore:148 - sslTransport protocols [TLSv1.2, TLSv1.1]
2018-01-11 07:43:39 INFO DefaultSearchGuardKeyStore:149 - sslHTTP protocols [TLSv1.2, TLSv1.1]
2018-01-11 07:43:39 INFO transport:99 - [Conquer Lord] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
ERR: Timed out while waiting for a green or yellow cluster state.
* Try running sgadmin.sh with -icl and -nhnv (If thats works you need to check your clustername as well as hostnames in your SSL certificates)
... also goes on forever with this ...
By the way, what type of PV have you provided for the PVC?
@Neikius Could you try to run the Indices Shard Stores API and share the output, please?
In your case it could be something like:
curl -s -XGET \
https://localhost:9200/_shard_stores \
--cacert ./path_to_cert/admin-ca \
--cert ./admin-cert \
--key ./admin-key \
--insecure
@Neikius Also, can you verify that shard allocation is not disabled?
Try getting the settings for all indices (given you probably have only a single index now):
curl -s -XGET 'http://localhost:9200/_all/_settings' \
[other curl options]
Or you can get just allocation related settings:
curl -s -XGET 'http://localhost:9200/_all/_settings/name=index.routing.allocation*' \
[other curl options]
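If allocation did turn out to be disabled, re-enabling it is a one-line cluster settings update. A sketch that only prints the command (with the same "[other curl options]" placeholder as above) rather than executing anything:

```shell
# Sketch only: print the cluster settings update that re-enables
# shard allocation; [other curl options] stands in for the cert flags.
BODY='{"transient":{"cluster.routing.allocation.enable":"all"}}'
echo "curl -s -XPUT 'https://localhost:9200/_cluster/settings' -d '${BODY}' [other curl options]"
```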
Alas, I haven't yet delved into the Search Guard stuff; I kind of expected it to just work... Thank you both for helping.
@wozniakjan: glusterfs, 3 nodes, 3x 200 GB bricks. Some info below (need more?):
# oc get storageclass glusterfs-storage
NAME TYPE
glusterfs-storage (default) kubernetes.io/glusterfs
# oc describe storageclass glusterfs-storage
Name: glusterfs-storage
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/glusterfs
Parameters: x
Events: <none>
@lukas-vlcek: Yeah, this is the default config for logging from ansible script, I think.
/_shard_stores:
{"indices":{".searchguard.logging-es-data-master-rx1j1syi":{"shards":{"0":{"stores":[]}}}}}
/_all/_settings
{".searchguard.logging-es-data-master-rx1j1syi":{"settings":{"index":{"creation_date":"1515438064123","number_of_shards":"1","number_of_replicas":"0","uuid":"vj4SbQKMSv24hCOT5WQ75g","version":{"created":"2040499"}}}}}
/_all/_settings/name=index.routing.allocation*
{}
So what exactly does this mean?
I think we found the culprit: ES does not support Gluster. Please read https://docs.openshift.org/latest/install_config/aggregate_logging.html#aggregated-elasticsearch
AFAIK we recommend only hostPath for the ES PV.
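For anyone landing here later, a minimal hostPath PV for ES might look like the manifest below. This is only a sketch: the name, capacity, and path are made-up examples, and the node must actually have that directory with permissions ES can use.

```shell
# Sketch: emit a minimal hostPath PersistentVolume manifest for the ES PVC.
# Name, capacity and path are made-up examples; adjust before use.
MANIFEST=$(cat <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-es-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/elasticsearch
EOF
)
printf '%s\n' "$MANIFEST"
# create it with: printf '%s\n' "$MANIFEST" | oc create -f -
```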
Ugh, that was a downer. I actually think I read this part at some point and it just didn't register. Sorry about that, and thank you.
@wozniakjan May I ask why these docs https://docs.openshift.org/latest/install_config/install/advanced_install.html#advanced-install-cluster-logging contradict the ones you linked? One claims NFS is not supported, while the other gives a direct example of using NFS and dynamic storage. This confused me completely, and I am still unable to install logging at all; I guess I am just dense enough not to get what is required of me. It also seems that only a very narrow set of configurations actually works. For example, the docs claim that not specifying storage will create a deployment without storage, but it does not create anything...
@Neikius It's a documentation bug: https://github.com/openshift/openshift-docs/issues/6080
Is it possible to use gluster-block for Elasticsearch?
The Elasticsearch pod health state is red and stays red, even though the deployment status of the pods is OK.
Version
# oc version
oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://cloud.example.com:8443
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
# openshift version
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8
Installed using the ansible scripts, release-3.7 branch.
Steps To Reproduce
Current Result
All pods are in the ready state, but Kibana shows no connection with ES. Upon inspecting the elasticsearch container logs:
The last ERROR message repeats ad-infinitum.
Not sure if this tells anything:
I have no clue how to pass parameters to sgadmin.sh to drop the index (as suggested on a possibly unrelated issue, but it might be worth a try: https://github.com/floragunncom/search-guard/issues/282).