Closed vcfff closed 1 year ago
Hello,
Also, if you only want to use HTNM or HIAA, you still need to specify a storage system:
[12345]
subsystem_type = g1500
subsystem_name = VSP1
svp_ip = 10.1.1.10
cci_instance_number = 10
metric_configruation = /opt/hds2graphite/conf/metrics/g1500_metrics.conf
exporttool_template = /opt/hds2graphite/conf/templates/g1500_template.txt
If virtual devices with GAD should be monitored on top of the physical devices, specify the VSM with the virtual serial number 99999,
the VSM name GAD_VSM_1, and the GAD resource group ID:
gad_vsm = serial, type, vsm name, resource group id
gad_vsm = 99999,g1500,GAD_VSM_1,1
If you don't want to use exporttool at all to gather metrics, you can omit svp_ip, cci_instance_number, exporttool_template and gad_vsm. The serial numbers in the configuration must be correct, and it should contain only existing arrays. You can also register arrays that are not present, but then the service will die when trying to run...
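A reduced HTNM/HIAA-only stanza might then look like this sketch, reusing the example values from above and simply dropping the exporttool-related keys:

```ini
# HTNM/HIAA only: svp_ip, cci_instance_number, exporttool_template
# and gad_vsm are omitted
[12345]
subsystem_type = g1500
subsystem_name = VSP1
metric_configruation = /opt/hds2graphite/conf/metrics/g1500_metrics.conf
```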
Regards Muno
Hi, okay, I did so. In the logs I see:
2020/08/04 23:50:26 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PLS
2020/08/04 23:50:36 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 14724 for unit RAID_PI_PRCS is not running! It ended with returncode 111
2020/08/04 23:50:36 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PRCS
2020/08/04 23:50:46 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 14799 for unit RAID_PI_PTS is not running! It ended with returncode 111
2020/08/04 23:50:46 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PTS
2020/08/04 23:50:56 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 14813 for unit RAID_PI_RGS is not running! It ended with returncode 111
2020/08/04 23:50:56 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_RGS
The conf file then contains only the stanza below for the S/N with HTNM settings, but after register/enable/start of the agent it begins throwing the errors above. That S/N is registered in HTNM:
[25554]
subsystem_type = g1000
subsystem_name = HDS-B25
Hi, it looks like the service is not able to retrieve the data from HTNM. First you should change the log level to DEBUG. This should give you some more detailed messages about what might be going wrong. You should see in the log that the general login to HTNM via the REST API is working. You should also see a list of all agents registered in your HTNM installation in the log.
Be aware that the log now contains some curl commands that can be used for troubleshooting.
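If the option is a plain key in hds2graphite.conf, the change might look like the line below; the key name here is an assumption, so check the sample config shipped with your installation:

```ini
# hypothetical key name; verify against your hds2graphite.conf
log_level = DEBUG
```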
Regards Muno
Hi, I did some checks. It looks like it is querying the URL with "&" instead of "%26". I am using HTNM 8.7.1-00, so I assume that affects all other queries too:
curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PD_LDC?hostName=HTNM&agentInstanceName=HDS-BB1
2020/08/05 14:00:23 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25288 for unit RAID_PI_PRCS is not running! It ended with returncode 111
2020/08/05 14:00:23 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PRCS
2020/08/05 14:00:33 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25299 for unit RAID_PI_PTS is not running! It ended with returncode 111
2020/08/05 14:00:33 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PTS
2020/08/05 14:00:43 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25308 for unit RAID_PI_RGS is not running! It ended with returncode 111
2020/08/05 14:00:43 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_RGS
2020/08/05 14:00:54 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25345 for unit RAID_PI is not running! It ended with returncode 111
2020/08/05 14:00:54 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI
2020/08/05 14:01:04 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25362 for unit RAID_PI_CHS is not running! It ended with returncode 111
2020/08/05 14:01:04 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_CHS
2020/08/05 14:01:14 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 25382 for unit RAID_PI_LDA is not running! It ended with returncode 111
2020/08/05 14:01:14 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_LDA
curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PD_LDC?hostName=HTNM%26agentInstanceName=HDS-BB1
"LDC",2020-08-05 11:01:16,7200,2020-08-05 11:01:16,2020-08-05 11:01:16,"00:FE:B1",3603,"13-6",6,"RAID6(6D+2P)","OPEN-V",12345,"0","Internal","","",0,"","","","","POOL","11","MPB0","","","",0
"LDC",2020-08-05 11:01:16,7200,2020-08-05 11:01:16,2020-08-05 11:01:16,"00:FE:B2",3603,"13-7",6,"RAID6(6D+2P)","OPEN-V",12345,"0","Internal","","",0,"","","","","POOL","11","MPB0","","","",0
"LDC",2020-08-05 11:01:16,7200,2020-08-05 11:01:16,2020-08-05 11:01:16,"00:FE:B3",3603,"13-7",6,"RAID6(6D+2P)","OPEN-V",12345,"0","Internal","","",0,"","","","","POOL","11","MPB0","","","",0
But even after I replaced that value in the script, it is still not collecting anything, as shown below. If I run just the command in a terminal, it does return data.
2020/08/05 14:48:48 [DEBUG] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:302) main::http_get > curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PD_LDC?hostName=HTNM%26agentInstanceName=HDS-BB1
2020/08/05 14:48:57 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 8968 for unit RAID_PI_PRCS is not running! It ended with returncode 111
2020/08/05 14:48:57 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PRCS
2020/08/05 14:49:07 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:580) main::checkreporter > Looks like PID: 8967 for unit RAID_PI_PTS is not running! It ended with returncode 111
2020/08/05 14:49:07 [ERROR] (/opt/hds2graphite/bin/hds2graphite-realtime.pl:581) main::checkreporter > Restarting for Unit: RAID_PI_PTS
Hi,
the "&" and "%26" ist not the issue... The curl is just for debug. The script won't use curl at all. You can also you the curl commands with "&" when adding singleticks (') before and after the URL.
Can you please query some performance metrics? LDC is only the device configuration. It would be nice to execute:
curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i 'http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PI_PRCS?hostName=HTNM&agentInstanceName=HDS-BB1'
So we can verify that actual performance data is delivered.
Perhaps you can also stop the service, delete the logfile, restart the service, and attach the new log file here, if that is OK for you...
Regards Muno
Did it from scratch; logs attached: hds2graphite.zip
[user@rhel75 opt]$ curl -ks -X GET -H "Content-Type: application/json" -u system:manager -i 'http://10.10.10.10:22015/TuningManager/v1/objects/RAID_PI_PRCS?hostName=HTNM&agentInstanceName=HDS-BB3'
HTTP/1.1 200 OK
Date: Wed, 05 Aug 2020 12:47:53 GMT
Server: Cosminexus HTTP Server
X-Frame-Options: SAMEORIGIN
Cache-Control: no-store, no-cache
X-Content-Type-Options: nosniff
Last-Modified: Wed, 05 Aug 2020 12:47:53 GMT
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
Content-Type: text/csv;charset=utf-8
INPUT_RECORD_TYPE,DATETIME,GMT_ADJUST,RECORD_TIME,ADAPTOR_ID,PROCESSOR_ID,INTERVAL,CONTROLLER,PROCESSOR_TYPE,PROCESSOR_BUSY_RATE,MAX_PROCESSOR_BUSY_RATE,MAX_BUFFER_LENGTH,BUFFER_IO_COUNT,MAX_BUFFER_IO_COUNT,BUFFER_IO_RATE,MAX_BUFFER_IO_RATE
string(8),time_t,long,time_t,string(16),string(16),ulong,string(8),string(8),float,float,float,float,float,float,float
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","00",60,"","MP",1.405617E+01,1.405617E+01,6.553500E+04,1.000000E+01,1.000000E+01,1.525902E-02,1.525902E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","01",60,"","MP",1.421692E+01,1.421692E+01,6.553500E+04,1.000000E+01,1.000000E+01,1.525902E-02,1.525902E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","02",60,"","MP",1.427556E+01,1.427556E+01,6.553500E+04,9.000000E+00,9.000000E+00,1.373312E-02,1.373312E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","03",60,"","MP",1.402110E+01,1.402110E+01,6.553500E+04,1.000000E+01,1.000000E+01,1.525902E-02,1.525902E-02
"PRCS",2020-08-05 12:47:00,7200,2020-08-05 12:47:01,"MPB0","04",60,"","MP",1.385232E+01,1.385232E+01,6.553500E+04,9.000000E+00,9.000000E+00,1.373312E-02,1.373312E-02
OK... Interesting... Before we add some debug output to identify what's going wrong, can you please try the following:
hds2graphite -stop ALL -realtime
perl -I /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3
It would be interesting to know if anything is dumped to the console while it is running.
Regards Muno
Hi, nothing is dumped to the console:
[user@rhel75 ~]$ sudo hds2graphite -stop ALL -realtime
Trying to stop realtime service for storagesystem HDS-BB3...
Realtime service stoped for storagesystem HDS-BB3!
[user@rhel75 ~]$ perl -I /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3
[user@rhel75 ~]$
But if I do this (probably not related to it somehow):

perl /opt/hds2graphite/bin/hds2graphite-realtime.pl
Can't locate Systemd/Daemon.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /opt/hds2graphite/bin/hds2graphite-realtime.pl line 21.
BEGIN failed--compilation aborted at /opt/hds2graphite/bin/hds2graphite-realtime.pl line 21.
Hi,
the last message is normal, because the systemd Perl module is not in the default path. That is why we specify the path with perl -I <path to module> in the command.
I wonder if the command I gave you came back on its own, or did you press CTRL-C after some time?
If it "stops" by itself can you please issue a echo $?
directly after the exit of the command to show the return / exit code?
It would also be nice if you could run the command out of a root-shell to be sure that this is not a permission issue.
Regards Muno
[user@rhel75 ~]$ perl -I /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3; echo $?
0

I am not pressing CTRL-C, only executing the command, which results in the above.
If you start it manually, does it write to the log file? Does it come back to the shell immediately or after some seconds?
[user@rhel75 ~]$ time perl -I /opt/hds2graphite/lib/perl5/ /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3; echo $?
real 0m0.004s
user 0m0.001s
sys 0m0.003s
0
---
It's immediately back in the shell, and nothing happens. Even the logs/outputs for the given HDS-BB3 instance are not created; there is only the main one about starting/stopping daemons.
Hm, that's strange... The main process shouldn't end so quickly. As which user are you logged on to the system? Have you tried running the script with root privileges? It shouldn't be necessary, but might help if we have a permission problem...
Since it is at least doing something when running as a service:
Can you provide the output of systemctl status hds2graphite-rt-*
and journalctl -a -u hds2graphite
Cheers Muno
Now under root (before it was under a regular user):
[root@rhel7]# systemctl status hds2graphite-rt-*
● hds2graphite-rt-HDS-BB3.service - HDS2GRAPHITE Realtime Service for HDS-BB3 (Type: g1000 / S/N: 59846)
   Loaded: loaded (/usr/lib/systemd/system/hds2graphite-rt-HDS-BB3.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-08-07 12:57:16 EEST; 33s ago
     Docs: http://www.openiomon.org
 Main PID: 892 (hds2graphite-re)
   Status: "Running reporters..."
    Tasks: 8
   Memory: 14.8M
   CGroup: /system.slice/hds2graphite-rt-HDS-BB3.service
           └─892 /bin/perl /opt/hds2graphite/bin/hds2graphite-realtime.pl -conf /opt/hds2graphite/conf/hds2graphite.conf -storagesystem HDS-BB3

Aug 07 12:57:16 rhel7 systemd[1]: Starting HDS2GRAPHITE Realtime Service for HDS-BB3 (Type: g1000 / S/N: 59846)...
Aug 07 12:57:16 rhel7 systemd[1]: Started HDS2GRAPHITE Realtime Service for HDS-BB3 (Type: g1000 / S/N: 59846).

[root@rhel7]# journalctl -a -u hds2graphite
-- No entries --
In your screenshot the service has only been running for 33s.
If you check again, is it showing a longer time?
I made a mistake in the journalctl command. It needs to be: journalctl -a -u hds2graphite*
It would be nice to see the output of that command as root.
Logs attached: logs.zip. It ran for a few hours, but the weird part is the "3min 38s ago"; it looks like it is restarting/aborting for some reason:

Active: active (running) since Wed 2020-08-12 02:39:27 EEST; 3min 38s ago
I guess the issue is solved by now. The new release v0.4.0 should write to the log file why the process died.
This one did not work:

[user@rhel ~]$ sudo /opt/hds2graphite/bin/hds2graphite.pl -register HTNM -realtime
[user@rhel ~]$ sudo /opt/hds2graphite/bin/hds2graphite.pl -register 10.10.10.10 -realtime
This worked:

[user@rhel ~]$ sudo /opt/hds2graphite/bin/hds2graphite.pl -register ALL -realtime
Registering realtime service for VSP1 (Type: g1500 / S/N: 12345)
Servicefile: /usr/lib/systemd/system/hds2graphite-rt-VSP1.service has been created!
Registering realtime service for VSP2 (Type: g1500 / S/N: 98765)
Servicefile: /usr/lib/systemd/system/hds2graphite-rt-VSP2.service has been created!
Registering realtime service for VSP3 (Type: g600 / S/N: 456123)
Servicefile: /usr/lib/systemd/system/hds2graphite-rt-VSP3.service has been created!
Reloading systemctl daemon...
Reload was done successful!
But how do I register only HTNM, and feed data only from HTNM, then?
thank you