Open M-Pixel opened 3 years ago
Hello. For ECLIBEC63D1 you need 9 rawx services. The internal load balancer will not place 2 chunks on the same service. Look for "no service polled from" in the logs of oioswift-proxy-server, oio-proxy or oio-meta2-server to confirm.
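The arithmetic behind that answer can be sketched in a few lines of Python. This is only an illustration, not OpenIO code: the function names are hypothetical, and the 6 data + 3 parity split is an assumption read off the policy name ECLIBEC63D1 (it matches the 9 services mentioned above).

```python
def chunks_per_object(data_chunks: int, parity_chunks: int) -> int:
    """Total number of chunks an erasure-coded object is split into."""
    return data_chunks + parity_chunks


def can_place(data_chunks: int, parity_chunks: int, available_services: int) -> bool:
    """True if every chunk can land on a different service.

    Mirrors the rule stated above: no two chunks on the same service.
    """
    return available_services >= chunks_per_object(data_chunks, parity_chunks)


# Assumption: ECLIBEC63D1 means 6 data chunks + 3 parity chunks.
DATA_CHUNKS, PARITY_CHUNKS = 6, 3

print(chunks_per_object(DATA_CHUNKS, PARITY_CHUNKS))  # chunks needed per object
print(can_place(DATA_CHUNKS, PARITY_CHUNKS, 3))       # cluster with 3 rawx
print(can_place(DATA_CHUNKS, PARITY_CHUNKS, 9))       # cluster with 9 rawx
```

With 3 rawx services the check fails, which is consistent with the 503 described in this issue.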
This can indeed be found in my /var/log/oio/sds/OPENIO/oioproxy-0/oioproxy-0.log file:
warning 1341 42D4 log WRN oio.core no service polled from [rawx], 3/9 services polled
In retrospect, the warning makes sense, but it could definitely be clearer (for example, "Request for 9 services from pool [rawx] could not be fulfilled because only 3 services currently exist in that pool"). In addition to the warning, I would have hoped that at least one of the services would log an error. It does make sense that this particular message is a warning (a client asking for something impossible is not a server malfunction). By comparison, though, when I had configured a non-existent storage policy, I did get an explicit error logged by one of the services (I don't remember which).
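For anyone hitting the same thing, here is a rough Python sketch that turns the terse warning into something closer to the wording suggested above. The regular expression is derived only from the single log line quoted in this thread, so other variants of the message may not match, and reading "3/9" as "3 selected out of 9 requested" is my interpretation rather than documented behaviour. The function name is hypothetical.

```python
import re
from typing import Optional

# Pattern based solely on the one warning line quoted in this issue.
POLL_WARNING = re.compile(
    r"no service polled from \[(?P<pool>[^\]]+)\], "
    r"(?P<polled>\d+)/(?P<wanted>\d+) services polled"
)


def explain_poll_warning(line: str) -> Optional[str]:
    """Return a more explicit message for a poll warning, or None if no match."""
    match = POLL_WARNING.search(line)
    if match is None:
        return None
    pool = match.group("pool")
    polled = int(match.group("polled"))
    wanted = int(match.group("wanted"))
    return (
        f"Request for {wanted} services from pool [{pool}] could not be "
        f"fulfilled: only {polled} could be selected"
    )


line = (
    "warning 1341 42D4 log WRN oio.core no service polled from [rawx], "
    "3/9 services polled"
)
print(explain_poll_warning(line))
```

Running it over the oioproxy log line by line and printing the non-None results would surface every failed placement.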
ISSUE TYPE
COMPONENT NAME
oioswift (maybe?)
SDS VERSION
CONFIGURATION
OS / ENVIRONMENT
SUMMARY
When trying to upload a file that is large enough to match an erasure-code storage policy, a 503 error is returned.
I confirmed that files small enough to match a simple (replication) storage policy do not experience the same issue. I confirmed that the issue is not unique to a particular EC implementation (ISA-L vs libEC).
I could not find any useful diagnostic information in any of the logs I checked. Before going through the effort of providing an exact repro, I was hoping to get guidance on how to identify the problem more precisely.
I do have one suspicion: my cluster has 3 RAWX at the moment. Perhaps the inability to locate 6 unique RAWX for ECLIBEC63D1's 6 data chunks is causing a timeout that results in the 503?