openhpi2 / Open-HPI

Open HPI is an open source implementation of the SA Forum's Hardware Platform Interface (HPI). HPI provides an abstracted interface to managing computer hardware, typically for chassis and rack based servers
Other
3 stars 1 forks source link

accept() failed in openhpi daemon #2535

Closed openhpi2 closed 8 years ago

openhpi2 commented 10 years ago

When openhpi service is restarted, the below errors are observed in syslog: "openhpid: transport: strmsock.cpp:558: accept failed." and "openhpid: transport: strmsock.cpp:296: error while sending message"

The error messages go away if attached changes are applied to the sources.

Attachments: some.diff.txt

Reported by: vladimir-pavlov

openhpi2 commented 10 years ago

Original comment by: dr_mohan

openhpi2 commented 10 years ago

The above error messages seems to be coming from different location than where the code changes are. Some of the error messages are not showing up when the process is killed. There are few questions on the patch. Bugs are usually fixed on the trunk.

  1. When the accept call failed, it used to give an error message and break out of the loop. Now it does not give the error message and continuous with the loop after a sleep. Why? Could the error message be avoided the first time? Could we continue only few times and break out of the loop?
  2. Generally, trying to understand what the patch is trying to accomplish
  3. The vaiable, cStreamSock::eWaitCc wc, in server.cpp is not getting used.
  4. If the error message is appearing only once but continuing without problem, is there a way to avoid the error message only the first time.

Original comment by: dr_mohan

openhpi2 commented 10 years ago

I wouldn't like call my changes in the source as "patch". It is rather workaround. The problem is persistent appearance of the error messages during openhpid restart. So I need patch which doesn't show error messages only when openhpi daemon is stopping. To manage it I added error recognition in cStreamSock::ReadMsg() from strmsock.cpp if peer closed connection in ahead of server finished reading (via payload_len and such decision is an ugly hack). Also I removed critical error if the one appears because of timeout. Timeout error processing was moved to server.cpp. I hope it is answer for first, second and partly fourth questions. Regarding question 3: O, yes, in service_thread(). I forgot to delete unused declaration. But in oh_server_run() this variable is used.

Original comment by: vladimir-pavlov

openhpi2 commented 10 years ago

1) Tested the patch by running the clients on different machine rather than where the openhpid is running. Without the patch we get the two errors transport: CRIT: strmsock.cpp:558: accept failed. and transport: CRIT: strmsock.cpp:296: error while sending message. 2) When I applied the patch and tested it, the first error(transport: CRIT: strmsock.cpp:558: accept failed) goes off but the second one(transport: CRIT: strmsock.cpp:296: error while sending message) appears sometime. Second one does not appear all the time but it comes sometime. 3) Variable 'wc' is declared in service_thread() but not used in this function, we can remove this declaration from the patch. 4) The message [CRIT( "select failed" );] has been removed , I feel we can retain this one.

Original comment by: openhpi2

openhpi2 commented 10 years ago

We will checkin the attached patch, if it is ok. Please review 1834_July_04_2014.patch.txt

Original comment by: dr_mohan

openhpi2 commented 10 years ago

Thanks for the patch! Several comments and questions:

o) Just whining: let's have {} even for a single line if construct!

o) Are we trying to handle a transient accept error?

o) What if a client drops connection attempt between Wait and Accept? Looks like a race condition.

o) "Select failed" logging in server.cpp may be confusing. We call sock->Wait() and wait potentially can be implemented with another mechanism.

o) Assigning "payload_len = (uint32_t)(-1)" looks dangerous. Can we assign it to zero instead?

Original comment by: avpak

openhpi2 commented 10 years ago

Thanks Anton for reviewing the patch and comments.

We will take of care of first and last point in the patch.

For second point, Indeed Vladimir is trying to avoid two error messages as described in the bug.

Regarding third point, we have not seen failure of Accept() call so far. We will have a check with Vladimir if he has faced any time Accept() call failure. And in case Aceept() call does not fail there will be no race condition. In case of Accept() call fails, could you please suggest us that either should we continue in the loop or break out of the loop.

On the fourth point, could you please give us some idea on implementation of different mechanism for Wait() instead of calling sock->Wait() and wait?

Original comment by: openhpi2

openhpi2 commented 10 years ago

Anton,

The attached is the modified patch for the defect. Looks like the higher level error message could be completely removed. We have not seen the accept call failure though 1834_accept_failed_jul_29.patch.txt

Original comment by: dr_mohan

openhpi2 commented 10 years ago

Mohan, thanks!

My points:

Original comment by: avpak

openhpi2 commented 10 years ago

Anton,

Please find the attached modified patch which has addressed most of your above mentioned points. 1834_accept_failed_aug_14.patch.txt

Original comment by: openhpi2

openhpi2 commented 10 years ago

Thanks! I have only few points:

Original comment by: avpak

openhpi2 commented 10 years ago

Anton,

I have attached the modified patch(which addresses all of your above points). 1834_accept_failed_aug_22.patch.txt

Original comment by: openhpi2

openhpi2 commented 10 years ago

Cool! Now I have only two comments.

1) please use spaces instead of tabs for indentation in "Error accepting server socket." block.

2) In the block: g_usleep( 1000000 ); if (stop) { break; }

let's have "stop" check first and then g_usleep.

Original comment by: avpak

openhpi2 commented 10 years ago

Thanks Anton for comments.

I have addressed the above two points in the attached patch. 1834_accept_failed_aug_25.patch.txt

Original comment by: openhpi2

openhpi2 commented 10 years ago

Great! Looks perfect for me.

Original comment by: avpak

openhpi2 commented 10 years ago

Original comment by: hemanthreddy

openhpi2 commented 10 years ago

Fixed in revision 7588.

Original comment by: hemanthreddy

openhpi2 commented 10 years ago

Original comment by: dr_mohan