Open ddeflyer opened 9 years ago
It would be helpful if you could paste the workload XML, the corresponding log/system.log, and workload.log. Also, if allowed, you could try the latest v0.4.1.0 beta1.
workload-config.xml:
<?xml version="1.0" encoding="UTF-8"?>
workload-log.txt is 1.3 MB, so I've uploaded it and the XML to a shared Google Drive folder at: https://drive.google.com/folderview?id=0B1exXslkMeyJczRLaTdSSWdXeFU&usp=sharing
I can't run the test again for a few weeks as the cluster is in use for other purposes. Once the cluster is freed up I will give it a try.
Understood. We will look at your log to see if there are any clues.
Ok, I have access to the cluster for a few days. I am going to try running the test in a bit. If there is anything else that I should try or data to gather then please let me know before too long as I might not have access after a few days.
In a highly degenerate case where the hardware is overwhelmed and many writes are failing, we are seeing one or more drivers get stuck past the time-based termination point of a test. These drivers are not doing any work; they just sit idle without ever finishing.
This was observed in v0.4.0.0 using the s3 connector. Please let me know what information would be useful to track down the issue.
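For context on the kind of fix this usually needs: a common way to keep workers from outliving a time-based termination point is to bound the wait and then force-interrupt any stragglers. The sketch below is a hypothetical illustration of that pattern using `ExecutorService` (it is not COSBench's actual driver code; the class and method names are invented for this example).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hedged sketch, not COSBench source: run worker tasks but never let them
// block past a deadline. Workers still running when the deadline passes
// are interrupted via shutdownNow() instead of being left idle forever.
public class BoundedRuntime {
    public static boolean runWithDeadline(Runnable task, int workers, long deadlineMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(task);
        }
        pool.shutdown(); // no new tasks accepted; existing ones keep running
        if (!pool.awaitTermination(deadlineMillis, TimeUnit.MILLISECONDS)) {
            pool.shutdownNow(); // interrupt workers stuck past the deadline
            return false;       // termination had to be forced
        }
        return true; // all workers finished within the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a "stuck" driver thread, e.g. one blocked on endless retries.
        Runnable stuck = () -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // interrupted by shutdownNow(); exit cleanly
            }
        };
        boolean finished = runWithDeadline(stuck, 2, 200);
        System.out.println(finished ? "finished" : "forced");
    }
}
```

If the drivers in this report are blocked on I/O or retry loops that ignore interrupts, a forced shutdown like this would still need the connector to honor `InterruptedException` (or a cancellation flag) for the workers to actually exit.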