noony / prometheus-solr-exporter

Solr exporter for prometheus.
Apache License 2.0

Solr Exporter error: Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:55248->127.0.0.1:8983: i/o timeout source="exporter.go:292" #3

Closed: m4rcc closed this issue 6 years ago

m4rcc commented 7 years ago

Solr version: 6.4.1

Solr exporter:

root@# ./prometheus-solr-exporter -version
solr_exporter, version (branch: , revision: )
  build user:
  build date:
  go version: go1.8

Hello again. I have installed this exporter and it seems to work. I also installed the Grafana dashboard, which works fine, but I have two issues:

  1. The log file of prometheus-solr-exporter is full of the following errors:

ERRO[761358] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42567->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761393] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42580->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761473] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42608->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761488] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42613->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761533] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42628->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761563] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42638->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761618] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42657->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761663] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42673->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761758] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42707->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761783] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42715->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761798] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42719->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761823] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42727->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761958] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42774->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[761988] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42783->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762033] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42800->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762118] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42830->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762248] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42875->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762278] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42885->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762313] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42898->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762343] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42908->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762443] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42943->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762468] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42951->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762538] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:42977->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762648] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:43014->127.0.0.1:8983: i/o timeout source="exporter.go:292"
ERRO[762708] Failed to unmarshal mbeansdata JSON into struct: read tcp 127.0.0.1:43036->127.0.0.1:8983: i/o timeout source="exporter.go:292"

  2. I created the following alert in Prometheus:

ALERT SolrServiceDown
  IF solr_up != 1
  FOR 15s
  LABELS { severity = "CRITICAL", env = "{{ cloud_env }}", alert_name = "SolrServiceDown" }
  ANNOTATIONS {
    summary = "SolrServiceDown",
    description = "SolrServiceDown for {{ '{{' }} $labels.instance {{ '}}' }}",
  }

This alert (I assume at the same time the errors appear in the log) is "flip-flopping" between solr_up = 1 and solr_up = 0. The alert does not fire because of the 15s FOR clause, but it is activated and deactivated continuously.

I tried setting the log level to INFO and got the same error messages. I also tried with a timeout of 10s and got the same results.

root@#:/opt/solr_exporter# cat /etc/init/solr_exporter.conf

# Prometheus Solr Exporter Upstart script
start on startup
script
    /usr/bin/solr_exporter -log.level info -solr.timeout 10s
end script
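
If the i/o timeouts persist, one thing I can also try is a larger timeout value (hypothetical, I have not tested it yet), e.g.:

script
    /usr/bin/solr_exporter -log.level info -solr.timeout 30s
end script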

noony commented 7 years ago

I'll look into this issue this weekend and check the exporter. Thanks for the report.

noony commented 7 years ago

But to avoid flip-flopping: FOR 15s is too short if your Prometheus scrape_interval is shorter than or equal to it.

I recommend 1m or 2m. What is your scrape_interval in the Prometheus configuration?
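
For example, a sketch of the same alert with a longer FOR (adapt the labels and annotations to your setup):

ALERT SolrServiceDown
  IF solr_up != 1
  FOR 1m
  LABELS { severity = "CRITICAL" }
  ANNOTATIONS { summary = "SolrServiceDown", description = "solr_up has been 0 for more than 1m on {{ $labels.instance }}" }

With a 15s scrape_interval, one or two timed-out scrapes are then no longer enough to fire the alert.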

m4rcc commented 7 years ago

Hi,

The "FOR 15s" is just my alert definition. How is that related to Prometheus scrape interval? That just triggers an alert if solr_up != 1 for more than 15 seconds. The flip flopping is because solr_up = 0 for a short period, while Solr service is up in this time.

I will check what the exact Prometheus scrape interval is when I get to a computer (Consul, more exactly; afaik it is 15s) and test with an increased scrape interval.

m4rcc commented 7 years ago

Hi, the Prometheus scrape interval was 15s with a 10s timeout (through Consul).

"Funny" thing, i wanted to test what you suggested but on our "test" env, one of my colleagues reverted from Solr version: 6.4.1 to Solr version: 6.1.0 and now i have the old problem (which is closed now):

"Error while running this exporter

2 by m4rcc was closed 18 days ago "



I am wondering (pretty sure, actually) whether the fix really worked, or whether I just did not notice that somebody had updated Solr to 6.4.1 and it simply worked with the newer version... :(

noony commented 7 years ago

@m4rcc, can you test with v0.0.5, please?

m4rcc commented 7 years ago

I will test now. At first glance the redirect does not work:

Hostname:9231 shows the "metrics" link, but it does not redirect you to hostname:9231/metrics.

m4rcc commented 7 years ago

Unfortunately the Solr-specific metrics are not working. I tested on:

sudo service solr status

Found 1 Solr nodes:

Solr process 11278 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:49:06",
  "startTime":"2017-09-28T18:47:03.324Z",
  "uptime":"4 days, 19 hours, 10 minutes, 17 seconds",
  "memory":"3.9 GB (%48.8) of 8 GB",
  "cloud":{
    "ZooKeeper":"10.96.214.119:2181,10.97.210.168:2181,10.96.214.104:2181",
    "liveNodes":"2",
    "collections":"3"}}

The log is full of this error:

ERRO[0520] Failed to unmarshal mbeans cache metrics JSON into struct: json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"
ERRO[0520] Failed to unmarshal mbeans cache metrics JSON into struct: json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"
ERRO[0525] Failed to unmarshal mbeans cache metrics JSON into struct: json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"
ERRO[0525] Failed to unmarshal mbeans cache metrics JSON into struct: json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"
ERRO[0527] Failed to unmarshal mbeans cache metrics JSON into struct: json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"
ERRO[0530] Failed to unmarshal mbeans cache metrics JSON into struct: json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"
ERRO[0530] Failed to unmarshal mbeans cache metrics JSON into struct: json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"

And these are all the exported metrics:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.9002e-05
go_gc_duration_seconds{quantile="0.25"} 0.000104332
go_gc_duration_seconds{quantile="0.5"} 0.00015534
go_gc_duration_seconds{quantile="0.75"} 0.000278982
go_gc_duration_seconds{quantile="1"} 0.002875248
go_gc_duration_seconds_sum 0.047243272
go_gc_duration_seconds_count 130
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 14
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.8"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 2.142352e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.10457464e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.481205e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 2.726926e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.0002673829487045407
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 485376
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 2.142352e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.342912e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 3.260416e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 14456
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 5.603328e+06
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5070390899863791e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 2269
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2.741382e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 47880
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 65536
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.169155e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 688128
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 688128
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 9.509112e+06
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 12
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} 31948.747
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} 42397.379
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} 57592.393
http_request_duration_microseconds_sum{handler="prometheus"} 6.583149141999999e+06
http_request_duration_microseconds_count{handler="prometheus"} 203
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="prometheus",quantile="0.5"} 283
http_request_size_bytes{handler="prometheus",quantile="0.9"} 283
http_request_size_bytes{handler="prometheus",quantile="0.99"} 313
http_request_size_bytes_sum{handler="prometheus"} 51537
http_request_size_bytes_count{handler="prometheus"} 203
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",handler="prometheus",method="get"} 203
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} 1684
http_response_size_bytes{handler="prometheus",quantile="0.9"} 7369
http_response_size_bytes{handler="prometheus",quantile="0.99"} 7392
http_response_size_bytes_sum{handler="prometheus"} 500764
http_response_size_bytes_count{handler="prometheus"} 203
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 3.68
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.3443072e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.50703866114e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.09100544e+08
# HELP solr_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which solr_exporter was built.
# TYPE solr_exporter_build_info gauge
solr_exporter_build_info{branch="",goversion="go1.8",revision="",version=""} 1
# HELP solr_up Was the Solr instance query successful?
# TYPE solr_up gauge
solr_up 0

As you can see, it reports the Solr service as down, but it is up.

noony commented 7 years ago

Hi @m4rcc, thanks for reporting. Can you send the value of hitratio under CACHE, or the full result of this URL, please:

http://YOUR_SOLR_URL:8983/solr/YOUR_CORE/admin/mbeans?stats=true&wt=json&cat=CORE&cat=QUERYHANDLER&cat=UPDATEHANDLER&cat=CACHE

Thanks.

noony commented 7 years ago

Thanks. I'm looking into it.

m4rcc commented 7 years ago

Do you still need the curl output or not?

noony commented 7 years ago

I'm making a release that adds the core name, in order to identify where the problem is.

noony commented 7 years ago

@m4rcc, I pushed a new version (v0.0.6) that identifies the core name. Can you find the failing core and send me the JSON (not the HTML) response of the URL mentioned before, please?

noony commented 7 years ago

Oops, sorry, misclick.

m4rcc commented 7 years ago

INFO[0000] Starting solr_exporter (version=, branch=, revision=) source="main.go:52"
INFO[0000] Build context (go=go1.8, user=, date=) source="main.go:53"
INFO[0000] Listening on :9231 source="main.go:75"
ERRO[0055] Failed to unmarshal mbeans cache metrics JSON into struct (core : master_XXXX_Product_flip_shard1_replica1): json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:388"

The curl output is coming in a moment.

m4rcc commented 7 years ago

solr_dump.txt

noony commented 7 years ago

Why is it XML and not JSON?

m4rcc commented 7 years ago

If I put the link in the browser it gets formatted as JSON, but when using curl, what I attached is what I get... The same for:

curl http://10.96.214.95:8983/solr/admin/collections?action=CLUSTERSTATUS
curl http://10.96.214.95:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json

(curl returns XML, the browser returns JSON)
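
(It might simply be the unquoted &: the shell treats it as a command separator, so wt=json never reaches Solr and it falls back to the XML default. Quoting the URL should return JSON:)

curl "http://10.96.214.95:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"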

noony commented 7 years ago

Can you try with v0.0.7, please?

m4rcc commented 7 years ago

INFO[0000] Starting solr_exporter (version=, branch=, revision=) source="main.go:53"
INFO[0000] Build context (go=go1.8, user=, date=) source="main.go:54"
INFO[0000] Listening on :9231 source="main.go:76"
ERRO[0038] Failed to unmarshal mbeans cache metrics JSON into struct (core : master_XXXX_Product_flip_shard1_replica1): json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:389"
ERRO[0043] Failed to unmarshal mbeans cache metrics JSON into struct (core : master_XXXX_Product_flip_shard1_replica1): json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:389"
ERRO[0049] Failed to unmarshal mbeans cache metrics JSON into struct (core : master_XXXX_Product_flip_shard1_replica1): json: cannot unmarshal number into Go struct field .hitratio of type string source="exporter.go:389"

noony commented 7 years ago

I've made a v0.0.8 release that logs the guilty JSON directly in the error message.

m4rcc commented 7 years ago

error_json_solr_exporter.txt

noony commented 7 years ago

I pushed a fix in v0.0.9 @m4rcc

m4rcc commented 7 years ago

I have compiled the new version and it has the same size as the previous one (11691819); coincidence? I checked the solr_exporter version, but it does not report one:

solr_exporter --version

solr_exporter, version (branch: , revision: )
  build user:
  build date:
  go version: go1.8

Regarding the metrics, it still does not work: 20171020_solr_exporter_error.txt

noony commented 7 years ago

Hi @m4rcc, I pushed a new version, v0.0.10. With your error log I now see what the problem is, thanks.
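
In short, the error shows that Solr 6.4 returns hitratio as a JSON number while the exporter's struct declared it as a string. The field type has to tolerate both forms, roughly along these lines (an illustrative sketch, not necessarily the exact code in the release):

package main

import (
    "encoding/json"
    "fmt"
    "strconv"
)

// flexFloat accepts both a JSON number (0.75) and a quoted string ("0.75"),
// since different Solr versions serialize hitratio differently.
type flexFloat float64

func (f *flexFloat) UnmarshalJSON(data []byte) error {
    var n float64
    if err := json.Unmarshal(data, &n); err == nil { // plain JSON number
        *f = flexFloat(n)
        return nil
    }
    var s string
    if err := json.Unmarshal(data, &s); err != nil { // otherwise expect a quoted value
        return err
    }
    v, err := strconv.ParseFloat(s, 64)
    if err != nil {
        return err
    }
    *f = flexFloat(v)
    return nil
}

type cacheStats struct {
    Hitratio flexFloat `json:"hitratio"`
}

func main() {
    // Both shapes of the Solr response decode to the same float.
    for _, in := range []string{`{"hitratio":0.75}`, `{"hitratio":"0.75"}`} {
        var c cacheStats
        if err := json.Unmarshal([]byte(in), &c); err != nil {
            fmt.Println("error:", err)
            continue
        }
        fmt.Println(float64(c.Hitratio))
    }
}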

m4rcc commented 7 years ago

No problem. I tested now and I can confirm that it works. When I am at work I will test further, including the Grafana dashboard.

noony commented 7 years ago

Thanks, and sorry that it took so long to fix. It wasn't a simple problem.

noony commented 6 years ago

Hey @m4rcc, did you test the new version at work? Can I close this issue?

m4rcc commented 6 years ago

Hey @noony, you can close it. I have tested on Solr 6.4 and 6.1 and it works fine now.

noony commented 6 years ago

Thanks @m4rcc for your help!

kentnek commented 10 months ago

Hi @noony, we're getting the error Failed to unmarshal mbeansdata JSON into struct: read tcp 10.0.1.22:36756->10.0.1.125:8983: i/o timeout with Solr 8. We can curl the path /admin/mbeans?stats=true&wt=json&cat=CORE&cat=QUERY&cat=UPDATE&cat=CACHE without any problem.

We're also seeing:

Fail to convert Hitratio in float: strconv.ParseFloat: parsing \"\": invalid syntax

and

Fail to convert Cumulative Hitratio in float: strconv.ParseFloat: parsing \"\": invalid syntax
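
From the messages it looks like Solr 8 sometimes reports hitratio as an empty string, so the string-to-float conversion fails on "". A defensive sketch of the kind of handling we have in mind (hypothetical, not the exporter's current code):

package main

import (
    "fmt"
    "strconv"
)

// parseRatio treats an empty hitratio as 0 instead of letting
// strconv.ParseFloat("") fail with "invalid syntax".
func parseRatio(s string) (float64, error) {
    if s == "" {
        return 0, nil
    }
    return strconv.ParseFloat(s, 64)
}

func main() {
    for _, s := range []string{"0.93", ""} {
        v, err := parseRatio(s)
        fmt.Println(s, v, err)
    }
}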

Could you help with this? Thanks!