intel-cloud / cosbench

A benchmark tool for cloud object storage services

It appears that Cosbench does not properly close driver sockets. #275

Closed: ywang19 closed this 9 years ago

ywang19 commented 9 years ago

For some reason (sometimes, not always!), despite the kernel settings, the kernel also fails to recycle disconnected sockets (I guess this is because the Java garbage collector does not destroy the local socket objects). At some point the server ends up with 28717 sockets in CLOSE_WAIT state and fails to create any additional connections; the driver is marked offline and the story stops there.

mratner commented 9 years ago

I've been constantly experiencing this issue with 0.4.2.c2 and the latest 0.4.2.0, on both RHEL 7.1 and Ubuntu 14.04. In my case, the number of connections in CLOSE_WAIT state would gradually increase, in increments of 4, up to a little over 4000, at which point the Java process hits its open file descriptor limit (4096) and writes scores of "java.net.SocketException: Too many open files" errors to the libs.log file. After that there is some kind of reset and the same process starts again.
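One way to watch the gradual CLOSE_WAIT growth described above on Linux is to read the kernel's TCP socket table directly. The sketch below is a minimal, hypothetical Java helper (the class and method names are illustrative, not part of COSBench). It assumes the standard /proc/net/tcp layout, where the fourth column (`st`) holds the TCP state in hex and `08` means CLOSE_WAIT. Note that it counts sockets for the whole host network namespace, not just the COSBench JVM, so it is a rough indicator, roughly what `netstat -ant | grep CLOSE_WAIT` would show.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CloseWaitCounter {
    // In /proc/net/tcp the 4th column ("st") is the TCP state in hex;
    // 0x08 corresponds to CLOSE_WAIT in the Linux TCP state numbering.
    private static final int TCP_CLOSE_WAIT = 0x08;

    /** Counts CLOSE_WAIT entries in the lines of a /proc/net/tcp-style table. */
    static long countCloseWait(List<String> lines) {
        return lines.stream()
                .skip(1)                          // first line is the column header
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split("\\s+"))
                .filter(cols -> cols.length > 3)
                .filter(cols -> Integer.parseInt(cols[3], 16) == TCP_CLOSE_WAIT)
                .count();
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        // IPv4 and IPv6 sockets are listed in separate files.
        for (String name : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path p = Paths.get(name);
            if (Files.exists(p)) {
                total += countCloseWait(Files.readAllLines(p));
            }
        }
        System.out.println("CLOSE_WAIT sockets: " + total);
    }
}
```

Running it periodically (for example, once a minute while COSBench sits idle) would show whether the count climbs steadily toward the 4096 file descriptor limit.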

Sounds like this can severely impact the test results... IMHO this issue is pretty critical. Is there anything I can try? I'm willing to experiment with it in my local Eclipse instance... would appreciate any pointers.

ywang19 commented 9 years ago

What storage system are you experimenting with?

mratner commented 9 years ago

Hitachi Content Platform object store (Hitachi Content Platform Fundamentals) using the Amazon S3 interface. I should also mention that this issue occurs in an idle state, i.e. when no test is running, right after I [re]start the COSBench processes.

ywang19 commented 9 years ago

Thanks for the clue. It's really quite strange to see CLOSE_WAIT connections increasing even while idle; I'll look into it.

Thanks, -yaguang

From: mratner [mailto:notifications@github.com] Sent: Tuesday, August 04, 2015 6:19 PM To: intel-cloud/cosbench Cc: Wang, Yaguang Subject: Re: [cosbench] It appears that Cosbench does not properly close driver sockets. (#275)

— Reply to this email directly or view it on GitHubhttps://github.com/intel-cloud/cosbench/issues/275#issuecomment-127557421.

ywang19 commented 9 years ago

See the upcoming 0.4.2.c3 for the fix.

mratner commented 9 years ago

Hi Yaguang,

Just wondering - this issue was closed but I haven't seen any fix for it. 0.4.2.c3 doesn't seem to contain any commits pertaining to this problem.

I presume you weren't able to reproduce the issue in your environment? If so, could you please confirm and also let me know what environment you tested it in - perhaps I can try this too.

As I mentioned earlier, I'm experiencing the issue in two different environments (RHEL 7.1 and Ubuntu 14.04), with identical symptoms and very consistently (it happens all the time), so I'm very curious to hear what you have to say about this.

Thanks, -Michael

ywang19 commented 9 years ago

Hi Michael,

This issue is caused by the heartbeat: the previous logic didn't close the socket and also had some logic errors. #279 is related to this issue as well.

The following commits contain the fix:
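The heartbeat code and the fix commits are not shown in this thread, so the following is only a generic sketch of the pattern described above: a periodic probe must close its socket on every path, otherwise each heartbeat can leave another connection stuck in CLOSE_WAIT until the process runs out of file descriptors. The class name `HeartbeatProbe`, the `ping` method, and the one-line `PING` exchange are all hypothetical; COSBench's real controller/driver heartbeat may use a different protocol entirely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HeartbeatProbe {
    /**
     * Hypothetical heartbeat: connects, sends a one-line ping, waits for one
     * reply byte, and reports whether the peer answered. try-with-resources
     * calls close() on every path (success, timeout, or error), so the socket
     * is released immediately instead of waiting for the garbage collector.
     */
    static boolean ping(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);          // bound the wait for a reply
            OutputStream out = socket.getOutputStream();
            out.write("PING\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            return in.read() != -1;                   // any reply byte counts as "alive"
        } catch (IOException e) {
            return false;                             // refused or timed out: peer considered down
        }
    }
}
```

A caller would invoke something like `HeartbeatProbe.ping(host, port, 5000)` on a timer and mark the driver offline when it returns `false`; since the socket is closed in every case, repeated heartbeats should not accumulate open descriptors.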

regards, -yaguang

From: mratner [mailto:notifications@github.com] Sent: Thursday, August 13, 2015 1:24 PM To: intel-cloud/cosbench Cc: Wang, Yaguang Subject: Re: [cosbench] It appears that Cosbench does not properly close driver sockets. (#275)

mratner commented 9 years ago

Awesome, thanks. Just to confirm: the fixes were pushed to 0.4.2.0, and 0.4.2.c3 doesn't include them, correct?

ywang19 commented 9 years ago

The fixes were pushed to the 0.4.2.0 branch, and 0.4.2.c3 does include them ☺.

From: mratner [mailto:notifications@github.com] Sent: Thursday, August 13, 2015 3:14 PM To: intel-cloud/cosbench Cc: Wang, Yaguang Subject: Re: [cosbench] It appears that Cosbench does not properly close driver sockets. (#275)

mratner commented 9 years ago

Sorry... I was looking at the 0.4.2.c3 tag, which didn't show any commits, and the 0.4.2.c3 source didn't include any updates either. I can see now that the actual build zip is updated. This is confusing, but I guess that's only because I'm new to GitHub... I'll learn :-) Going to test the new build now... thanks!

mratner commented 9 years ago

Confirming - the issue I've seen before is gone.

ywang19 commented 9 years ago

Cool! Thanks for your confirmation.

From: mratner [mailto:notifications@github.com] Sent: Wednesday, August 19, 2015 6:59 AM To: intel-cloud/cosbench Cc: Wang, Yaguang Subject: Re: [cosbench] It appears that Cosbench does not properly close driver sockets. (#275)