openiomon / hds2graphite

A utility to query metrics from Hitachi Vantara block storage and transfer them to graphite backend
http://www.openiomon.org
GNU General Public License v3.0

usage question #4

Closed vcfff closed 1 year ago

vcfff commented 4 years ago
Hi, I have made some progress and have a few questions.

In the config file /opt/hds2graphite/conf/hds2graphite.conf I edited only the HTNM connection settings:

# Specify the connection to your Tuning Manager / Infrastructure Analytics Advisor
# Specify the parameters for HTNM or HIAA access.
# realtime_application can be HTNM or HIAA
[realtime]
realtime_application = HTNM
realtime_api_host = 10.10.10.10
realtime_api_port = 22015
realtime_api_proto = http
realtime_api_user = user
realtime_api_passwd = password

Notes:
I used an IP address instead of an FQDN
I used http instead of https
>> Will this work with http specified?

I tried this only for HTNM:

[user@rhel ~]$ sudo /opt/hds2graphite/bin/hds2graphite.pl -register HTNM -realtime
Storagesystem HTNM cannot be found in the realtime configuration file /opt/hds2graphite/conf/hds2graphite-realtime.conf ! Please check storagename or configuration file!

-->> The file /opt/hds2graphite/conf/hds2graphite-realtime.conf does not exist on the system. Should it exist? If so, what should it contain?

These did not work:
[user@rhel ~]$ sudo /opt/hds2graphite/bin/hds2graphite.pl -register HTNM -realtime
[user@rhel ~]$ sudo /opt/hds2graphite/bin/hds2graphite.pl -register 10.10.10.10 -realtime

This worked:
[user@rhel ~]$ sudo /opt/hds2graphite/bin/hds2graphite.pl -register ALL -realtime
Registering realtime service for VSP1 (Type: g1500 / S/N: 12345)
Servicefile: /usr/lib/systemd/system/hds2graphite-rt-VSP1.service has been created!
Registering realtime service for VSP2 (Type: g1500 / S/N: 98765)
Servicefile: /usr/lib/systemd/system/hds2graphite-rt-VSP2.service has been created!
Registering realtime service for VSP3 (Type: g600 / S/N: 456123)
Servicefile: /usr/lib/systemd/system/hds2graphite-rt-VSP3.service has been created!
Reloading systemctl daemon...
Reload was done successful!

But how do I register only HTNM and feed data only from HTNM?

thank you

munokar commented 4 years ago

Hello,

Even if you want to use only HTNM or HIAA, you need to specify a storage system:

[12345]
subsystem_type = g1500
subsystem_name = VSP1
svp_ip = 10.1.1.10
cci_instance_number = 10
metric_configruation = /opt/hds2graphite/conf/metrics/g1500_metrics.conf
exporttool_template = /opt/hds2graphite/conf/templates/g1500_template.txt
# If virtual devices with GAD should be monitored on top of the physical devices, specify the VSM for virtual serial number 99999 and VSM name GAD_VSM_1 and the GAD Resource-Group-ID
# gad_vsm = serial, type, vsm name, resource group id
gad_vsm = 99999,g1500,GAD_VSM_1,1

If you don't want to use exporttool at all to gather metrics, you can omit svp_ip, cci_instance_number, exporttool_template and gad_vsm. The serial numbers in the configuration should be correct, and the configuration should contain only existing arrays. You can also register arrays that are not present, but then the service will die when trying to run...
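
The earlier -register HTNM call failed because HTNM is the realtime application, not a configured storage system. A minimal sketch for listing the names that are configured, assuming that -register matches the subsystem_name values (as the VSP1/VSP2/VSP3 output above suggests) and that the file uses the key = value layout of the sample section. list_storage_names is a hypothetical helper, not part of hds2graphite:

```shell
# Hypothetical helper: print every subsystem_name value found in a config file.
# Assumes "key = value" lines as in the sample section above.
list_storage_names() {
    awk -F'=' '
        /^[ \t]*subsystem_name[ \t]*=/ {
            name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim surrounding whitespace
            print name
        }
    ' "$1"
}
```

It could be called as list_storage_names /opt/hds2graphite/conf/hds2graphite.conf; commented-out lines are skipped because the pattern only allows whitespace before the key.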

Regards Muno

vcfff commented 4 years ago

Hi, okay, I did so. In the logs I see:

2020/08/04 23:50:26 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PLS
2020/08/04 23:50:36 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 14724 for unit RAID_PI_PRCS is not running! It ended with returncode 111
2020/08/04 23:50:36 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PRCS
2020/08/04 23:50:46 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 14799 for unit RAID_PI_PTS is not running! It ended with returncode 111
2020/08/04 23:50:46 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PTS
2020/08/04 23:50:56 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 14813 for unit RAID_PI_RGS is not running! It ended with returncode 111
2020/08/04 23:50:56 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_RGS

The conf file now contains only the section below for this S/N, plus the HTNM settings. After registering, enabling, and starting the service, it throws the errors above. That S/N is registered in HTNM.

[25554]
subsystem_type = g1000
subsystem_name = HDS-B25

munokar commented 4 years ago

Hi, it looks like the service is not able to retrieve the data from HTNM. First you should change the log level to DEBUG. This should give you more detailed messages about what might be going wrong. You should see in the log that the general login to HTNM via the REST API is working. You should also see a list of all agents registered in your HTNM installation in the log.

Be aware that the log now contains some curl commands that can be used for troubleshooting.

Regards Muno

vcfff commented 4 years ago

Hi, I did some checks. It looks like it is querying the URL with "&" instead of "%26". I am using HTNM 8.7.1-00, so I assume this affects all other queries as well.

curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PD_LDC?hostName=HTNM&agentInstanceName=HDS-BB1
2020/08/05 14:00:23 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25288 for unit RAID_PI_PRCS is not running! It ended with returncode 111
2020/08/05 14:00:23 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PRCS
2020/08/05 14:00:33 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25299 for unit RAID_PI_PTS is not running! It ended with returncode 111
2020/08/05 14:00:33 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PTS
2020/08/05 14:00:43 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25308 for unit RAID_PI_RGS is not running! It ended with returncode 111
2020/08/05 14:00:43 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_RGS
2020/08/05 14:00:54 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25345 for unit RAID_PI is not running! It ended with returncode 111
2020/08/05 14:00:54 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI
2020/08/05 14:01:04 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25362 for unit RAID_PI_CHS is not running! It ended with returncode 111
2020/08/05 14:01:04 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_CHS
2020/08/05 14:01:14 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25382 for unit RAID_PI_LDA is not running! It ended with returncode 111
2020/08/05 14:01:14 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_LDA

curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PD_LDC?hostName=HTNM%26agentInstanceName=HDS-BB1
"LDC",2020-08-05 11:01:16,7200,2020-08-05 11:01:16,2020-08-05 11:01:16,"00:FE:B1",3603,"13-6",6,"RAID6(6D+2P)","OPEN-V",12345,"0","Internal","","",0,"","","","","POOL","11","MPB0","","","",0
"LDC",2020-08-05 11:01:16,7200,2020-08-05 11:01:16,2020-08-05 11:01:16,"00:FE:B2",3603,"13-7",6,"RAID6(6D+2P)","OPEN-V",12345,"0","Internal","","",0,"","","","","POOL","11","MPB0","","","",0
"LDC",2020-08-05 11:01:16,7200,2020-08-05 11:01:16,2020-08-05 11:01:16,"00:FE:B3",3603,"13-7",6,"RAID6(6D+2P)","OPEN-V",12345,"0","Internal","","",0,"","","","","POOL","11","MPB0","","","",0

But after I replaced that value in the script, it still does not collect anything, as shown below. If I run the same command in a terminal, it returns data.

2020/08/05 14:48:48 [DEBUG] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:302) main::http_get > curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PD_LDC?hostName=HTNM%26agentInstanceName=HDS-BB1
2020/08/05 14:48:57 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 8968 for unit RAID_PI_PRCS is not running! It ended with returncode 111
2020/08/05 14:48:57 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PRCS
2020/08/05 14:49:07 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 8967 for unit RAID_PI_PTS is not running! It ended with returncode 111
2020/08/05 14:49:07 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PTS

munokar commented 4 years ago

Hi,

the "&" versus "%26" is not the issue... The curl command is only logged for debugging; the script does not use curl at all. You can also use the curl commands with "&" if you put single quotes (') before and after the URL.
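
To illustrate the quoting point: in an unquoted URL the shell treats & as the "run in the background" operator, so the command only receives the part up to hostName=HTNM, and the rest is parsed as a separate variable assignment. A small sketch, where show_args is a hypothetical stand-in for curl that only prints the arguments it receives:

```shell
# Hypothetical stand-in for curl: print each received argument on its own line.
show_args() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# URL taken from the commands in this thread; single quotes keep
# "&agentInstanceName=..." inside the string.
url='http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PD_LDC?hostName=HTNM&agentInstanceName=HDS-BB1'

# Quoted: the whole URL arrives as exactly one argument.
show_args "$url"

# Unquoted, the shell would split the same line at "&":
#   show_args http://...?hostName=HTNM &     (command started in the background)
#   agentInstanceName=HDS-BB1                (a plain variable assignment)
```

With the quoted form there is no need to replace "&" with "%26".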

Can you please query some performance metrics? LDC = Device Configuration. Would be nice to execute:

curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i 'http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PI_PRCS?hostName=HTNM&agentInstanceName=HDS-BB1'

So we can verify that actual performance data is delivered.
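
A sketch for running the same check against several record types, assuming the URL layout seen in the curl commands above and the connection values quoted in this thread. htnm_url and htnm_query are hypothetical helper names, not part of hds2graphite:

```shell
# Hypothetical helper: build the Tuning Manager REST URL for one record type.
# Arguments: proto host port record hostName agentInstanceName
htnm_url() {
    printf '%s://%s:%s/TuningManager/v1/objects/%s?hostName=%s&agentInstanceName=%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Hypothetical helper: fetch one record type with the curl options used above.
# Host, port, credentials and agent names are the example values from this thread.
htnm_query() {
    curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i \
        "$(htnm_url http 10.10.10.10 22015 "$1" HTNM HDS-BB1)"
}
```

For example, htnm_query RAID_PI_PRCS, htnm_query RAID_PI_PTS and htnm_query RAID_PI_RGS would cover three of the units reported as restarting in the log. Because the URL is passed inside double quotes, the "&" reaches curl unchanged.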

Perhaps you can also stop the service, delete the logfile, restart the service and attach the log file here, if this is OK for you...

Regards Muno

vcfff commented 4 years ago

I did it from scratch; logs attached: hds2graphite.zip

[user@rhel75 opt]$ curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i 'http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PI_PRCS?hostName=HTNM&agentInstanceName=HDS-BB3'
HTTP/1.1 200 OK
Date: Wed, 05 Aug 2020 12:47:53 GMT
Server: Cosminexus HTTP Server
X-Frame-Options: SAMEORIGIN
Cache-Control: no-store, no-cache
X-Content-Type-Options: nosniff
Last-Modified: Wed, 05 Aug 2020 12:47:53 GMT
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
Content-Type: text/csv;charset=utf-8

INPUT_RECORD_TYPE,DATETIME,GMT_ADJUST,RECORD_TIME,ADAPTOR_ID,PROCESSOR_ID,INTERVAL,CONTROLLER,PROCESSOR_TYPE,PROCESSOR_BUSY_RATE,MAX_PROCESSOR_BUSY_RATE,MAX_BUFFER_LENGTH,BUFFER_IO_COUNT,MAX_BUFFER_IO_COUNT,BUFFER_IO_RATE,MAX_BUFFER_IO_RATE
string(8),time_t,long,time_t,string(16),string(16),ulong,string(8),string(8),float,float,float,float,float,float,float
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","00",60,"","MP",1.405617E+01,1.405617E+01,6.553500E+04,1.000000E+01,1.000000E+01,1.525902E-02,1.525902E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","01",60,"","MP",1.421692E+01,1.421692E+01,6.553500E+04,1.000000E+01,1.000000E+01,1.525902E-02,1.525902E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","02",60,"","MP",1.427556E+01,1.427556E+01,6.553500E+04,9.000000E+00,9.000000E+00,1.373312E-02,1.373312E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","03",60,"","MP",1.402110E+01,1.402110E+01,6.553500E+04,1.000000E+01,1.000000E+01,1.525902E-02,1.525902E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","04",60,"","MP",1.385232E+01,1.385232E+01,6.553500E+04,9.000000E+00,9.000000E+00,1.373312E-02,1.373312E-02

munokar commented 4 years ago

OK... Interesting... So before we add some debug output to identify what's going wrong can you please try the following:

  1. Stop all services with hds2graphite -stop ALL -realtime
  2. Run the data collection manually with perl -I /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3

It would be interesting to know whether anything is dumped to the console while it is running.

Regards Muno

vcfff commented 4 years ago

Hi, nothing is dumped to the console:

[user@rhel75 ~]$ sudo hds2graphite -stop ALL -realtime
Trying to stop realtime service for storagesystem HDS-BB3...
Realtime service stoped for storagesystem HDS-BB3!
[user@rhel75 ~]$ perl -i /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3
[user@rhel75 ~]$

But if I run it like this I get the following, which is probably not related:

perl /opt/hds2graphite/bin/hds2graphite-realtime.pl
Can't locate Systemd/Daemon.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /opt/hds2graphite/bin/hds2graphite-realtime.pl line 21.
BEGIN failed--compilation aborted at /opt/hds2graphite/bin/hds2graphite-realtime.pl line 21.

munokar commented 4 years ago

Hi,

the last message is normal because the systemd Perl module is not in the default path; that is why we specify the path with perl -I <path to module> in the command. I wonder whether the command I gave you returned on its own, or did you press CTRL-C after some time? If it stops by itself, can you please issue echo $? directly after the command exits to show the return / exit code? It would also be nice if you could run the command from a root shell to be sure that this is not a permission issue.

Regards Muno

vcfff commented 4 years ago

[user@rhel75 ~]$ perl -i /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3; echo $?
0

-- I am not pressing CTRL-C, only executing the command, which results in the above

munokar commented 4 years ago

If you start it manually, does it write to the log file? Does it return to the shell immediately or after some seconds?

vcfff commented 4 years ago

[user@rhel75 ~]$ time perl -i /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3; echo $?

real    0m0.004s
user    0m0.001s
sys     0m0.003s
0
---

It returns to the shell immediately and nothing happens. No logs or outputs are created for the given HDS-BB3 instance, only the main log about starting/stopping daemons.

munokar commented 4 years ago

Hm, that's strange... The main process should not end so quickly. As which user are you logged on to the system? Have you tried running the script with root privileges? It shouldn't be necessary, but it might help if we have permission problems...

Since it is at least doing something when running as a service: can you provide the output of systemctl status hds2graphite-rt-* and journalctl -a -u hds2graphite

Cheers Muno

vcfff commented 4 years ago

This time I ran it as root; before, it was run as a regular user.

[root@rhel7]# systemctl status hds2graphite-rt-*
● hds2graphite-rt-HDS-BB3.service - HDS2GRAPHITE Realtime Service for HDS-BB3 (Type: g1000 / S/N: 59846)
   Loaded: loaded (/usr/lib/systemd/system/hds2graphite-rt-HDS-BB3.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-08-07 12:57:16 EEST; 33s ago
     Docs: http://www.openiomon.org
 Main PID: 892 (hds2graphite-re)
   Status: "Running reporters..."
    Tasks: 8
   Memory: 14.8M
   CGroup: /system.slice/hds2graphite-rt-HDS-BB3.service
           └─892 /bin/perl /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3

Aug 07 12:57:16 rhel7 systemd[1]: Starting HDS2GRAPHITE Realtime Service for HDS-BB3 (Type: g1000 / S/N: 59846)...
Aug 07 12:57:16 rhel7 systemd[1]: Started HDS2GRAPHITE Realtime Service for HDS-BB3(Type: g1000 / S/N: 59846).
[root@rhel7]# journalctl -a -u hds2graphite
-- No entries --

munokar commented 4 years ago

In your screenshot the service has only been running for 33s. If you check again, is it showing a longer time? I made a mistake in the journalctl command. The command needs to be: journalctl -a -u hds2graphite* It would be nice to see the output of that command as root.

vcfff commented 4 years ago

Logs attached: logs.zip. It ran for a few hours, but the odd part is "3min 38s ago"; it looks like it is restarting/aborting for some reason:
Active: active (running) since Wed 2020-08-12 02:39:27 EEST; 3min 38s ago

mamoep commented 1 year ago

I guess the issue is solved by now. The new release v0.4.0 should write to the log file why the process died.