Open vivekrnv opened 2 years ago
@prgeor Can you please provide an ETA for the fix?
@dgsudharsan there is an inherent issue where the mlnx platform makes several ethtool command calls via process calls, which makes sfputil much slower on mlnx platforms. Do you still see the issue after this fix?
That fix significantly reduces the response time but the current approach still involves making multiple file open and read calls. I think SfpBase and the others can be optimized to reduce read_eeprom calls.
@andywongarista let's discuss the fix for this issue, which was introduced by the SFP refactor.
Description
The inefficiency is in SfpBase, xcvr_mem_maps, etc. It also affects xcvrd, since both xcvrd and sfputil use the same SfpBase APIs, such as get_transceiver_info, get_transceiver_bulk_status and get_transceiver_threshold_info. On a device with 30 front-panel ports and 30 QSFP-DD xcvrs, I've seen pmon CPU usage reach up to 35% with a period of 10-20 sec. pmon usage can get progressively worse with more front-panel ports.
Steps to reproduce the issue:
Describe the results you received:
In comparison:
Triage
A single get_transceiver_info() call results in 31 calls to read_eeprom, and on many platforms read_eeprom uses either a subprocess call or file open/read operations, which makes it extremely slow. Calling get_transciever_domI() can add 40+ more calls to read_eeprom. Note: These stats were taken on the MSN4700 platform.
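For anyone who wants to reproduce the call counts on another platform, here is a rough sketch of how the read_eeprom calls could be tallied. FakeSfp and count_calls are hypothetical names made up for this illustration; FakeSfp only imitates a getter that reads each field separately and is not the real SfpBase.

```python
from collections import Counter
from functools import wraps


def count_calls(obj, method_name, counter):
    """Wrap obj.<method_name> so every invocation is tallied in counter."""
    original = getattr(obj, method_name)

    @wraps(original)
    def wrapper(*args, **kwargs):
        counter[method_name] += 1
        return original(*args, **kwargs)

    # Shadow the bound method on this instance only.
    setattr(obj, method_name, wrapper)
    return original


class FakeSfp:
    """Hypothetical stand-in for a platform Sfp object (assumption)."""

    def read_eeprom(self, offset, num_bytes):
        # A real platform would do a file read or a subprocess call here.
        return bytearray(num_bytes)

    def get_transceiver_info(self):
        # Imitates a getter that issues one read_eeprom per field.
        return {"field_%d" % i: self.read_eeprom(i, 1) for i in range(31)}


calls = Counter()
sfp = FakeSfp()
count_calls(sfp, "read_eeprom", calls)
sfp.get_transceiver_info()
print(calls["read_eeprom"])
```

On a real device the same wrapper could be applied to the platform's Sfp instance before calling the getter, assuming the getter reaches read_eeprom through the instance.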
SfpBase, Xcvr_Api, MemMap and the associated classes must be optimized. The ideal optimization target is to drastically reduce the number of calls to read_eeprom.
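One possible direction, sketched here only as an illustration and not as the agreed fix: fetch a whole block of EEPROM once and serve the individual field reads from memory. CachedEepromReader, raw_read and the 128-byte page size are assumptions for this sketch; they are not existing sonic-platform-common names.

```python
class CachedEepromReader:
    """Serve small reads from page-sized buffers, one raw read per page.

    raw_read is assumed to behave like read_eeprom(offset, num_bytes):
    it returns a bytes-like object, or None on failure.
    """

    def __init__(self, raw_read, page_size=128):
        self._raw_read = raw_read
        self._page_size = page_size
        self._pages = {}      # page index -> cached bytes
        self.raw_reads = 0    # number of underlying reads issued

    def read(self, offset, num_bytes):
        out = bytearray()
        end = offset + num_bytes
        while offset < end:
            page = offset // self._page_size
            if page not in self._pages:
                data = self._raw_read(page * self._page_size, self._page_size)
                self.raw_reads += 1
                if data is None:
                    return None
                self._pages[page] = bytes(data)
            start = offset % self._page_size
            take = min(end - offset, self._page_size - start)
            out += self._pages[page][start:start + take]
            offset += take
        return out

    def invalidate(self):
        """Drop cached pages, e.g. before each poll of live DOM values."""
        self._pages.clear()


# Hypothetical 256-byte EEPROM image used only for the demonstration.
eeprom = bytes(range(256))


def fake_raw_read(offset, num_bytes):
    return bytearray(eeprom[offset:offset + num_bytes])


reader = CachedEepromReader(fake_raw_read)
fields = [reader.read(i, 1) for i in range(31)]   # 31 field reads
reads_after_fields = reader.raw_reads             # a single raw read
spanning = reader.read(120, 16)                   # crosses a page boundary
```

Static identification fields are a natural fit for this kind of cache. Monitoring values change over time, so the cache would need to be invalidated on every poll, and real modules may need page-select handling that this sketch ignores.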