microsoft / OMS-Agent-for-Linux


On-premise custom logs are not read #740

Closed alert101 closed 5 years ago

alert101 commented 6 years ago

OS: CentOS 7.5
OMSAgent package: 1.6.0-42

We have hosts both in Azure and on-premise. The on-premise hosts are behind a proxy, which was configured on the agent when it was onboarded. The hosts run various Spring Boot applications that write logs to /var/log/[app]/[app].json.log. We have configured performance metrics via the OMS portal, and these show up fine in Log Analytics from both Azure and on-premise hosts. In addition, we have configured logging for the apps via Ansible, which adds the following configuration files to the omsagent (these custom logs are NOT configured via the OMS portal):

/etc/opt/microsoft/omsagent/[workspace-id]/conf/omsagent.d/json_output_plugin.conf

<match oms.api.**>
  type out_oms_api
  log_level info
  num_threads 5
  omsadmin_conf_path /etc/opt/microsoft/omsagent/[workspace-id]/conf/omsadmin.conf
  cert_path /etc/opt/microsoft/omsagent/[workspace-id]/certs/oms.crt
  key_path /etc/opt/microsoft/omsagent/[workspace-id]/certs/oms.key
  buffer_chunk_limit 10m
  buffer_type file
  buffer_path /var/opt/microsoft/omsagent/[workspace-id]/state/out_oms_api*.buffer
  buffer_queue_limit 10
  buffer_queue_full_action drop_oldest_chunk
  flush_interval 30s
  retry_limit 10
  retry_wait 30s
  max_retry_wait 9m
</match>

/etc/opt/microsoft/omsagent/[workspace-id]/conf/omsagent.d/[app].conf

<source>
  type sudo_tail
  path /var/log/[app]/*.log
  pos_file /var/opt/microsoft/omsagent/[workspace-id]/state/[app]_CL.pos
  read_from_head true
  run_interval 10s
  # This tag matches to json_output_plugin.conf
  tag oms.api.[app]
  format json
</source>
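Since the source above relies on a file glob and `format json`, one way to rule out input problems on a host is to check that the glob matches files and that every line parses as a JSON object. This is only a rough diagnostic sketch, not part of the agent: the function names and the `myapp` path are hypothetical, and it assumes one JSON object per line.

```python
import glob
import json


def find_invalid_json_lines(lines):
    """Return (line_number, reason) for each non-empty line that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored by this check
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


def check_glob(pattern):
    """Map each file matching the glob pattern to its list of invalid lines."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            results[path] = find_invalid_json_lines(handle)
    return results


if __name__ == "__main__":
    # Hypothetical path: substitute the real application name for "myapp".
    results = check_glob("/var/log/myapp/*.log")
    if not results:
        print("glob matched no files")
    for path, problems in results.items():
        print(path, "OK" if not problems else problems)
```

An empty result means the glob matched nothing for the user running the script, which may differ from what the agent sees, so it is worth running it under the same account the agent uses.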

This configuration works fine on all Azure hosts: both the application logs and the metrics configured in the OMS portal appear in Log Analytics. On the on-premise hosts, however, the application logs are not read, although the metrics configured in the OMS portal still appear. Comparing omsagent.log between the two, I noticed that rows like the following are missing on the on-premise agents:

2018-07-05 13:28:56 +0300 [info]: Following tail of /var/log/[app]/[app].json.log
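To make this comparison repeatable across hosts, a small script can list which files the agent reported tailing. The sketch below assumes the log line format shown above; the function name is hypothetical, and the log path is passed in as an argument rather than hard-coded.

```python
import re
import sys

# Matches lines such as:
# 2018-07-05 13:28:56 +0300 [info]: Following tail of /var/log/[app]/[app].json.log
TAIL_LINE = re.compile(r"\[info\]: Following tail of (?P<path>\S+)")


def tailed_paths(log_lines):
    """Return the file paths the agent reported it started tailing, in order."""
    paths = []
    for line in log_lines:
        match = TAIL_LINE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tailed_paths.py <path to omsagent.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found = tailed_paths(handle)
    if found:
        for path in found:
            print(path)
    else:
        print("no 'Following tail of' lines found")
```

Running it against omsagent.log on an Azure host and on an on-premise host should show the difference described here at a glance.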

Logs from an on-premise host after restarting the agent, from /var/opt/microsoft/omsagent/[workspace-id]/log/omsagent.log:

2018-07-05 14:13:19 +0300 [info]: listening syslog socket on 127.0.0.1:25224 with udp
2018-07-05 14:18:19 +0300 [info]: Sending OMS Heartbeat succeeded at 2018-07-05T11:18:19.240Z
2018-07-05 14:23:19 +0300 [info]: Sending OMS Heartbeat succeeded at 2018-07-05T11:23:19.243Z
2018-07-05 14:28:19 +0300 [info]: Sending OMS Heartbeat succeeded at 2018-07-05T11:28:19.245Z
2018-07-05 14:33:19 +0300 [info]: Sending OMS Heartbeat succeeded at 2018-07-05T11:33:19.245Z
2018-07-05 14:33:20 +0300 [info]: OMS agent management service topology request success

/var/opt/microsoft/omsconfig/omsconfig.log

2018/07/06 09:00:01: WARNING: null(0): EventId=2 Priority=WARNING Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 : Starting PerformRequiredConfigurationChecks DSC operation.
2018/07/06 09:00:32: WARNING: null(0): EventId=2 Priority=WARNING Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 :
Displaying messages from built-in DSC resources:
         WMI channel 1
         ResourceID:
         Message : []:                            [] Starting consistency engine.
2018/07/06 09:00:32: WARNING: null(0): EventId=2 Priority=WARNING Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 :
Displaying messages from built-in DSC resources:
         WMI channel 1
         ResourceID:
         Message : []:                            [] A pending configuration exists. DSC will process a set request on the pending configuration.
2018/07/06 09:00:32: ERROR: null(0): EventId=1 Priority=ERROR Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 :
This event indicates that failure happens when LCM is processing the configuration. ErrorId is 5. ErrorDetail is The SendConfigurationApply function did not succeed.. ResourceId is [MSFT_nxFileInventoryResource]Inventory and SourceInfo is null. ErrorMessage is The specified class does not exist..
2018/07/06 09:00:32: INFO: Scripts/nxOMSPlugin.pyc(114):
OMSAgent is multi-homed and resource is updating workspace [workspace-id]
2018/07/06 09:00:32: ERROR: null(0): EventId=1 Priority=ERROR Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 :
This event indicates that failure happens when LCM is processing the configuration. ErrorId is 5. ErrorDetail is The SendConfigurationApply function did not succeed.. ResourceId is [nxOMSPerfCounterResource]nodeperfcounter and SourceInfo is C:\temp\MicrosoftOperationsManagementLinuxConfiguration.ps1::20::1::nxOMSPerfCounterResource. ErrorMessage is The specified class does not exist..
2018/07/06 09:00:33: INFO: Scripts/nxOMSAuditdPlugin.pyc(298):
auoms conf does not match desired conf
2018/07/06 09:00:33: ERROR: null(0): EventId=1 Priority=ERROR Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 :
DSC Engine Error :
         Error Message Failed to apply the configuration.  These resources produced errors: [MSFT_nxFileInventoryResource]Inventory, [nxOMSPerfCounterResource]nodeperfcounter
        Error Code : 5
2018/07/06 09:00:33: WARNING: null(0): EventId=2 Priority=WARNING Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 :
Displaying messages from built-in DSC resources:
         WMI channel 1
         ResourceID:
         Message : []:                            [] Consistency check completed.
2018/07/06 09:00:43: WARNING: null(0): EventId=2 Priority=WARNING Job EDA00B7F-8424-41A3-A38D-50B3DE97A876 : PerformRequiredConfigurationChecks DSC operation completed in 41.4862 seconds.

shpimpal commented 6 years ago

@amal-khalaf, can you take a look?

alert101 commented 6 years ago

Any update on this?

alert101 commented 5 years ago

Updating the agent to 1.6.0-163 seems to have fixed the problem, whatever it was. Closing the issue.