PeteW opened this issue 4 years ago
Unfortunately, the Spark scheduler extender currently doesn't support launching client-mode applications on Kubernetes. It assumes that a driver will be launched in the cluster, which then proceeds to request executors.
That being said, I think your executor pods are failing to be scheduled before the extender is even consulted, as the message says "4 Insufficient pods", which is kube-scheduler's way of telling you that all 4 nodes in your cluster are over their pod count limit. If you have fixed that by increasing your pod limit or killing existing pods, then I would expect your pods to still be stuck at pending, but with a message telling you something like "failed to get resource reservations", as the extender will be looking for the space that the driver reserved, which doesn't happen in client mode.
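If it helps, a couple of quick checks for both of those (the ResourceReservation resource name below is an assumption about this extender's CRD and may need adjusting):

```shell
# Check whether the nodes really are at their pod-count limit.
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAX_PODS:.status.capacity.pods
kubectl describe nodes | grep -A 2 "Non-terminated Pods"

# If the extender is what's blocking scheduling, its reservations are worth a look.
# The resource name is an assumption; use whatever CRD your extender installs.
kubectl get resourcereservations -n spark
```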
For your second question, I think it has to do with a network problem between the health probe and your container, because the message for the stuck pod indicates that kube-scheduler did consider that pod, hence the scheduler itself is operational.
> Unfortunately, the Spark scheduler extender currently doesn't support launching client-mode applications on Kubernetes. It assumes that a driver will be launched in the cluster, which then proceeds to request executors.
That nuance wasn't clear to me, but now that it is, I think I can work with this. Good to know, thanks.
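This obviously doesn't apply to the Thrift Server itself, but for completeness, the mode the extender does expect is cluster mode, where the driver itself runs as a pod; a minimal sketch (every value below is a placeholder, not taken from this issue):

```shell
# Cluster-mode submission: the driver runs in the cluster, so the extender can
# reserve space for it before executors come up. Depending on the Spark version,
# the scheduler name for driver/executor pods may also need to be set so the
# pods are routed to the extender.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --name example-app \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --class org.example.Main \
  local:///opt/spark/jars/example-app.jar
```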
> That being said, I think your executor pods are failing to be scheduled before the extender is even consulted, as the message says "4 Insufficient pods", which is kube-scheduler's way of telling you that all 4 nodes in your cluster are over their pod count limit. If you have fixed that by increasing your pod limit or killing existing pods, then I would expect your pods to still be stuck at pending, but with a message telling you something like "failed to get resource reservations", as the extender will be looking for the space that the driver reserved, which doesn't happen in client mode.
This is actually how AWS Fargate works as a resource negotiator. Hardware is allocated on demand, always one node per pod. For example, say Spark requests resources for a new executor. That of course begets a request to Kubernetes for an executor pod. In the case of Fargate this begets a request to allocate a new VM just in time for the lifetime of the executor, billed by the second. In 60-90 seconds (usually) Fargate returns a new VM with Kubernetes tooling pre-installed/configured, sized to the request plus some extra RAM for the kubelet.
When running kubectl get nodes I can see the new node for the requested pod provisioned as expected. But there's something about this new node/VM that the scheduler extender rejects. I can go into more detail, even a step-by-step demonstration, if that helps. The key point I want to make is that there might be something about this cloud-based node-allocation behavior that doesn't play well with the scheduler extender, at least not without customization.
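A quick way to compare what the extender sees on the just-in-time node versus a node it accepts (node names below are placeholders):

```shell
# List all nodes with their labels, including the newly provisioned Fargate node.
kubectl get nodes --show-labels

# Look at one node in detail: labels, taints, capacity, and conditions are all
# inputs a scheduler (or extender) can filter on.
kubectl describe node <fargate-node-name>
```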
> For your second question, I think it has to do with a network problem between the health probe and your container, because the message for the stuck pod indicates that kube-scheduler did consider that pod, hence the scheduler itself is operational.
I don't have a good response to this point. Within the VLAN containing the nodes there are currently no restrictions on cross-node communication. It seems the "connection refused" errors come from requests where the client and server are the same IP. This might be an oversight I can find by looking closer.
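A couple of hedged checks for the same-IP theory (pod name, port, and path below are placeholders):

```shell
# Read the probe configuration the kubelet is actually using for the scheduler pod.
kubectl -n spark get pod <spark-scheduler-pod> -o yaml | grep -iE -A 5 "livenessProbe|readinessProbe"

# Reproduce the probe from inside the container itself (the same-IP case),
# assuming the image ships curl.
kubectl -n spark exec <spark-scheduler-pod> -- curl -sS http://localhost:<probe-port>/<probe-path>
```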
I'm attempting to run spark-thriftserver using this scheduler extender. If you're not familiar, spark-thriftserver runs in client mode (local driver, remote executors). The Thrift Server exposes a JDBC endpoint which receives queries and turns them into Spark jobs.
The command to run this looks like:
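A representative invocation (placeholders throughout, not the exact command used here) would be along the lines of:

```shell
# Client-mode Thrift Server launch against Kubernetes; all values are placeholders.
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<your-spark-image>
```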
spark-defaults.conf looks like:
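Again as a sketch with placeholder values rather than the actual file, the relevant kind of settings would be:

```shell
# Placeholder values only; appended to spark-defaults.conf for illustration.
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   1
spark.executor.memory                  2g
spark.executor.cores                   1
spark.kubernetes.namespace             spark
spark.kubernetes.container.image       <your-spark-image>
EOF
```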
So far, I've applied the extender.yaml file as-is without any modifications. This instantiates two new pods under the spark namespace both in Running state with names starting with "spark-scheduler-".
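For anyone reproducing, that state can be confirmed with something like:

```shell
kubectl apply -f extender.yaml
kubectl -n spark get pods                      # two spark-scheduler-* pods, both Running
kubectl -n spark logs <spark-scheduler-pod>    # extender logs, useful for the failures below
```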
describe pod XXX yields some troubling information about them.

When I attempt to run the driver above (which launches properly), because spark.dynamicAllocation.minExecutors is set to 1, the driver immediately requests a single executor pod at startup. The pod itself remains indefinitely in a pending state. describe pod XXX seems to suggest that no nodes satisfy the pod's scheduling criteria.

What I'm having trouble figuring out is why every node is rejected: none of my nodes have instance-group labels, nor any custom labels, and all the nodes accept the spark namespace. Sorry to ask, but I am struggling to find the proper steps to take to narrow down the issue.

If it helps, this is using AWS Fargate as the compute resource behind Kubernetes, but based on what I know so far that shouldn't be an issue.
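In case it helps with triage, these are the kinds of first steps that could narrow it down (instance-group as a label key is taken from the discussion above and may not match the extender's actual configuration; pod names are placeholders):

```shell
# Which nodes carry an instance-group label at all?
kubectl get nodes -L instance-group

# Watch the extender while the executor pod is pending, to see whether it is
# consulted and why it filters every node out.
kubectl -n spark logs -f <spark-scheduler-pod>

# Re-check the pending executor pod's events for the exact failure reason.
kubectl describe pod <pending-executor-pod>
```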