microsoft / omi-script-provider

Script Provider for the OMI Project
Other
8 stars 7 forks source link

[Linux->Local] Enumerate multiple times will fail #44

Closed JumpingYang001 closed 7 years ago

JumpingYang001 commented 7 years ago

Currently we just check CentOS7x64 has this issue, will check more OS and add more information for this tomorrow. Build: omi path: \\redmond\wsscfs\OSTCData\Builds\omi\develop\1.4.0-4\Linux_ULINUX_1.0_x64_64_Release\openssl_1.0.0 Python script provider: \\redmond\wsscfs\OSTCData\Builds\omiscriptprovider\develop\1.1.1-18

Repro steps:

  1. register provider1
  2. register provider 2
  3. Enumerate provider1 with localhost pass
  4. Enumerate provider2 with localhost pass
  5. Enumerate provider1 with localhost fail
root@py64-cent7-01  # /opt/omi/bin/omicli ei root/cimv2 Hosts
instance of Hosts
{
    FQDN=localhost localhost.localdomain localhost4 localhost4.localdomain4
    IPAddress=127.0.0.1
}
instance of Hosts
{
    FQDN=localhost localhost.localdomain localhost6 localhost6.localdomain6
    IPAddress=::1
}

~ 
root@py64-cent7-01  # /opt/omi/bin/omicli ei  root/cimv2 OMI_Datetime
instance of OMI_Datetime
{
    Key=Dates
    timestamp=1993-04-25T02:43:17.117405Z
    interval=P926DT6H59M59.236864S
    timestamps={1968-03-24T03:28:10.109198Z, 1940-02-09T19:47:38.410294Z, 1965-05-08T15:30:51.308553Z, 1948-09-04T15:18:47.541379Z}
    intervals={P57DT37H10M59.073292S, P893DT51H47M27.305474S, P530DT57H12M47.314663S, P427DT29H43M12.563559S}
    mixed={2011-11-25T01:52:29.156213Z, P724DT56H15M57.024335S, 1949-05-14T11:12:14.791517Z, P613DT57H39M29.774801S, 1938-11-12T11:24:58.387507Z, 2004-06-16T21:19:04.848601Z, P265DT54H13M42.719885S, P835DT18H11M19.208838S}
} 
You have new mail in /var/spool/mail/root

~
root@py64-cent7-01  # /opt/omi/bin/omicli ei root/cimv2 Hosts
/opt/omi/bin/omicli: result: MI_RESULT_FAILED
/opt/omi/bin/omicli: result: A general error occurred, not covered by a more specific error code.
Wayenyin commented 7 years ago

This Python provider's issue happens on following supported platforms. ENV

Build

JumpingYang001 commented 7 years ago

here is some log:

[15623,15623] INFO: null(0): EventId=40032 Priority=INFO Selector_AddHandler: selector=0x5b27f8, handler=0x5b2760, name=PROVMGR_TIMEOUT_MANAGER
2017/10/22 22:44:31 [15623,15623] INFO: null(0): EventId=40003 Priority=INFO agent started; fd 8
2017/10/22 22:44:31 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xde9268:15:BinProtocolNotification:1)
2017/10/22 22:44:31 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdea018:4099:EnumerateInstancesReq:2)
2017/10/22 22:44:32 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdedad8:4099:EnumerateInstancesReq:3)
2017/10/22 22:44:34 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdedad8:4099:EnumerateInstancesReq:4)
2017/10/22 22:44:34 [15623,15623] WARNING: null(0): EventId=30066 Priority=WARNING failed to open the provider xyz_frog for class XYZ_Frog
2017/10/22 22:44:34 [15623,15623] ERROR: null(0): EventId=20001 Priority=ERROR Agent _RequestCallback: ProvMgr_NewRequest failed with result 1 !
EMumau commented 7 years ago

The error appears to be within MI_Datetime processing. I am investigating.

JumpingYang001 commented 7 years ago

@EMumau , from log "2017/10/22 22:44:34 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdedad8:4099:EnumerateInstancesReq:4) 2017/10/22 22:44:34 [15623,15623] WARNING: null(0): EventId=30066 Priority=WARNING failed to open the provider xyz_frog for class XYZ_Frog", it shows prov value is null here in the second query for Hosts or OMI_Datetime: https://github.com/Microsoft/omi/blob/7008e85c03deaa2510417b2672791c986de99841/Unix/provmgr/provmgr.c#L502

You might need to debug into the function _OpenProvider(lib, cn, request); to check the issue. I am not sure this issue is related to my fix #52 or not.

JumpingYang001 commented 7 years ago

I guess the omiagent can't find python class XYZ_Frog in the second query for Hosts or OMI_Datetime here: https://github.com/Microsoft/omi/blob/7008e85c03deaa2510417b2672791c986de99841/Unix/provmgr/provmgr.c#L394

394 p->classDecl = SchemaDecl_FindClassDecl(self->module->schemaDecl, 
395             className); 
JumpingYang001 commented 7 years ago

We have tried c/c++ provider with OMI doesn't have this issue, so it should be python provider issue.

JumpingYang001 commented 7 years ago

one possible reason is that self->module->schemaDecl is changed after querying the 2nd provider, then it cannot get the 1st provider module again, I mean the 2nd self->module overwrite the 1st self->module. then after you query the 1st provider again, it shows the library exist but can't find the class, because the module now is the 2nd provider module, it(the library) really exist for _OpenLibrary(self, proventry); which only verify p->libraryName but not p->module.

JumpingYang001 commented 7 years ago

For why querying 1 provider will always pass, 1 provider has p->classDecl->name in the cache, so it pass:

365     for (p = self->head; p; p = p->next) 
366     { 
367         if (Tcscasecmp(p->classDecl->name, className) == 0) 
368         { 
369             Provider_Addref(p); 
370             return p; 
371         } 
372     } 

When we try to query 2 providers, the 2nd time to query the 1st provider it shows the p->classDecl->name doesn't compare equal to className, that should be the problem we need to investigate why it isn't in the cache now.

JumpingYang001 commented 7 years ago

Possible this release the schemaDecl object (just remove this code might fix the issue): https://github.com/Microsoft/omi-script-provider/blob/fa3db167ec0401f8df0b7674b7d7065fad5a256c/provider/server_protocol.cpp#L1704

be called here: https://github.com/Microsoft/omi-script-provider/blob/9e815b269dea1a5d4785485acba58020e189ce78/provider/mi_main.cpp#L79

be called here: https://github.com/Microsoft/omi/blob/7008e85c03deaa2510417b2672791c986de99841/Unix/provmgr/provmgr.c#L279

JumpingYang001 commented 7 years ago

just guess but I am not sure.

JumpingYang001 commented 7 years ago

How can we enable SCX_BOOKEND debug in production environment? then I can gdb attach in production environment to see the python logs. hope we can provide this. that's why I filed #50.

EMumau commented 7 years ago

I have determined the problem with subsequent queries while running multiple providers during the period before the OMI Server times them out and unloads them. I am working on a solution.

SCX_BOOKEND is not a production tool. It is a development tool for use in a lab. It is not intended for external use. Changelist #45 altered several components that affect SCX_BOOKEND. Those changes essentially broke the SCX_BOOKEND functionality. I am restoring that functionality in my current work.

Please be patient. I know that you are trying to help. In this case, I have full understanding of the issue causing this and I am implementing a fix.

JumpingYang001 commented 7 years ago

@EMumau , got it, thanks. :)

JumpingYang001 commented 7 years ago

will verify it today.

JumpingYang001 commented 7 years ago

Verified on \redmond\wsscfs\OSTCData\Builds\omi\develop\1.4.0-37 \redmond\wsscfs\OSTCData\Builds\omiscriptprovider\develop\1.1.1-35 , it is fixed and close it.