Closed: JumpingYang001 closed this issue 7 years ago.
This Python provider issue occurs on the following supported platforms.
ENV
Build
Here is part of the log:
[15623,15623] INFO: null(0): EventId=40032 Priority=INFO Selector_AddHandler: selector=0x5b27f8, handler=0x5b2760, name=PROVMGR_TIMEOUT_MANAGER
2017/10/22 22:44:31 [15623,15623] INFO: null(0): EventId=40003 Priority=INFO agent started; fd 8
2017/10/22 22:44:31 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xde9268:15:BinProtocolNotification:1)
2017/10/22 22:44:31 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdea018:4099:EnumerateInstancesReq:2)
2017/10/22 22:44:32 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdedad8:4099:EnumerateInstancesReq:3)
2017/10/22 22:44:34 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdedad8:4099:EnumerateInstancesReq:4)
2017/10/22 22:44:34 [15623,15623] WARNING: null(0): EventId=30066 Priority=WARNING failed to open the provider xyz_frog for class XYZ_Frog
2017/10/22 22:44:34 [15623,15623] ERROR: null(0): EventId=20001 Priority=ERROR Agent _RequestCallback: ProvMgr_NewRequest failed with result 1 !
The error appears to be within MI_Datetime processing. I am investigating.
@EMumau, the log ("2017/10/22 22:44:34 [15623,15623] INFO: null(0): EventId=40011 Priority=INFO done with receiving msg(0xdedad8:4099:EnumerateInstancesReq:4) 2017/10/22 22:44:34 [15623,15623] WARNING: null(0): EventId=30066 Priority=WARNING failed to open the provider xyz_frog for class XYZ_Frog") shows that prov is null at this point on the second query for Hosts or OMI_Datetime: https://github.com/Microsoft/omi/blob/7008e85c03deaa2510417b2672791c986de99841/Unix/provmgr/provmgr.c#L502
You might need to step into _OpenProvider(lib, cn, request) in a debugger to check the issue. I am not sure whether this issue is related to my fix #52.
My guess is that omiagent cannot find the Python class XYZ_Frog on the second query for Hosts or OMI_Datetime, here: https://github.com/Microsoft/omi/blob/7008e85c03deaa2510417b2672791c986de99841/Unix/provmgr/provmgr.c#L394
394 p->classDecl = SchemaDecl_FindClassDecl(self->module->schemaDecl,
395 className);
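To make this failure point concrete, here is a minimal C sketch of a case-insensitive class lookup in a schema. It assumes hypothetical types ClassDeclSketch and SchemaDeclSketch that stand in for OMI's class and schema declarations; it is not the real SchemaDecl_FindClassDecl. What it illustrates: if the schema passed in belongs to a different provider, the lookup returns NULL even though XYZ_Frog is declared in some other schema.

```c
#define _POSIX_C_SOURCE 200809L /* expose strcasecmp in <strings.h> */
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for a class declaration (not OMI's MI_ClassDecl). */
typedef struct {
    const char *name;
} ClassDeclSketch;

/* Hypothetical stand-in for a schema: a flat array of class declarations. */
typedef struct {
    const ClassDeclSketch *classDecls;
    size_t numClassDecls;
} SchemaDeclSketch;

/* Returns the class whose name matches className, ignoring case,
 * or NULL if this schema does not declare it. */
static const ClassDeclSketch *FindClassDeclSketch(const SchemaDeclSketch *schema,
                                                  const char *className)
{
    if (schema == NULL || className == NULL)
        return NULL;
    for (size_t i = 0; i < schema->numClassDecls; i++) {
        if (strcasecmp(schema->classDecls[i].name, className) == 0)
            return &schema->classDecls[i];
    }
    return NULL;
}
```

For example, against a schema whose only class is XYZ_Frog, FindClassDeclSketch(&schema, "xyz_frog") returns that entry, while the same call against a schema that only declares a hypothetical XYZ_Hosts class returns NULL.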
We have tried a C/C++ provider with OMI and it does not have this issue, so this should be a Python provider issue.
One possible reason is that self->module->schemaDecl is changed after querying the 2nd provider, so the 1st provider's module can no longer be found; that is, the 2nd self->module overwrites the 1st self->module.
Then, when you query the 1st provider again, the library is reported as existing but the class cannot be found, because the module is now the 2nd provider's module. The library really does exist as far as _OpenLibrary(self, proventry) is concerned,
because that function only verifies p->libraryName, not p->module.
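The hypothesis above can be modeled with a small C sketch. Everything in it (LibrarySketch, ModuleSketch, LibraryExists, ModuleHasClass) is hypothetical and greatly simplified; it only assumes, as described above, that the library check compares the library name while the class lookup goes through whatever module a shared pointer currently refers to.

```c
#define _POSIX_C_SOURCE 200809L /* expose strcasecmp in <strings.h> */
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical provider module: the class names its schema declares. */
typedef struct {
    const char *const *classNames;
    size_t numClasses;
} ModuleSketch;

/* Hypothetical loaded-library entry. The suspected bug is that `module`
 * can be repointed at another provider's module while `libraryName`
 * stays the same. */
typedef struct {
    const char *libraryName;
    const ModuleSketch *module;
} LibrarySketch;

/* Mirrors the described _OpenLibrary check: compares the library name
 * only and never inspects which module is attached. */
static int LibraryExists(const LibrarySketch *lib, const char *libraryName)
{
    return lib != NULL && strcmp(lib->libraryName, libraryName) == 0;
}

/* Class lookup goes through lib->module, whatever it currently points at. */
static int ModuleHasClass(const LibrarySketch *lib, const char *className)
{
    if (lib == NULL || lib->module == NULL)
        return 0;
    for (size_t i = 0; i < lib->module->numClasses; i++) {
        if (strcasecmp(lib->module->classNames[i], className) == 0)
            return 1;
    }
    return 0;
}
```

If lib.module is repointed at the second provider's module, LibraryExists(&lib, "xyz_frog") still returns 1 but ModuleHasClass(&lib, "XYZ_Frog") returns 0, which matches the "library exists but class not found" symptom. In this model, keeping a separate module per library entry would avoid the mismatch.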
As for why querying a single provider always passes: with one provider, p->classDecl->name is found in the cache, so the lookup succeeds:
for (p = self->head; p; p = p->next)
{
    if (Tcscasecmp(p->classDecl->name, className) == 0)
    {
        Provider_Addref(p);
        return p;
    }
}
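Below is a self-contained C approximation of the cache loop above, using a hypothetical ProviderSketch struct and POSIX strcasecmp in place of OMI's Tcscasecmp. A hit takes a reference and returns the cached entry, as Provider_Addref does above; a miss returns NULL, which in this model is what forces the provider to be opened again.

```c
#define _POSIX_C_SOURCE 200809L /* expose strcasecmp in <strings.h> */
#include <stddef.h>
#include <strings.h>

/* Hypothetical, simplified provider cache entry; the real OMI Provider
 * struct has more fields. */
typedef struct ProviderSketch {
    const char *className;        /* stands in for p->classDecl->name */
    int refCount;
    struct ProviderSketch *next;
} ProviderSketch;

/* Walks the singly linked cache; on a case-insensitive class-name match,
 * takes a reference (like Provider_Addref) and returns the entry.
 * Returns NULL on a miss. */
static ProviderSketch *FindCachedProvider(ProviderSketch *head, const char *className)
{
    for (ProviderSketch *p = head; p != NULL; p = p->next) {
        if (strcasecmp(p->className, className) == 0) {
            p->refCount++;
            return p;
        }
    }
    return NULL;
}
```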
When we query 2 providers, the 2nd query of the 1st provider finds that p->classDecl->name does not compare equal to className, so we need to investigate why the class is no longer in the cache.
It is possible that this code releases the schemaDecl object (simply removing it might fix the issue): https://github.com/Microsoft/omi-script-provider/blob/fa3db167ec0401f8df0b7674b7d7065fad5a256c/provider/server_protocol.cpp#L1704
which is called here: https://github.com/Microsoft/omi-script-provider/blob/9e815b269dea1a5d4785485acba58020e189ce78/provider/mi_main.cpp#L79
which in turn is called here: https://github.com/Microsoft/omi/blob/7008e85c03deaa2510417b2672791c986de99841/Unix/provmgr/provmgr.c#L279
This is just a guess; I am not sure.
How can we enable SCX_BOOKEND debugging in a production environment? Then I could attach gdb in production to see the Python logs. I hope we can provide this; that is why I filed #50.
I have determined the cause of the problem with subsequent queries when multiple providers are running, during the period before the OMI server times them out and unloads them. I am working on a solution.
SCX_BOOKEND is not a production tool. It is a development tool for use in a lab. It is not intended for external use. Changelist #45 altered several components that affect SCX_BOOKEND. Those changes essentially broke the SCX_BOOKEND functionality. I am restoring that functionality in my current work.
Please be patient. I know that you are trying to help. In this case, I have full understanding of the issue causing this and I am implementing a fix.
@EMumau, got it, thanks. :)
I will verify it today.
Verified on \\redmond\wsscfs\OSTCData\Builds\omi\develop\1.4.0-37 and \\redmond\wsscfs\OSTCData\Builds\omiscriptprovider\develop\1.1.1-35; it is fixed, so I am closing this issue.
So far we have only confirmed this issue on CentOS7x64; we will check more OSes and add more information tomorrow.
Build:
omi path:
\\redmond\wsscfs\OSTCData\Builds\omi\develop\1.4.0-4\Linux_ULINUX_1.0_x64_64_Release\openssl_1.0.0
Python script provider: \\redmond\wsscfs\OSTCData\Builds\omiscriptprovider\develop\1.1.1-18
Repro steps: