stackabletech / hdfs-operator

Apache Hadoop HDFS operator for Stackable

Improve data locality by considering Kubernetes topology #595

Open lfrancke opened 2 weeks ago

lfrancke commented 2 weeks ago

Description

As users of the HDFS operator and of a Stackable-deployed HDFS, we want HDFS to improve data locality by having a client talk first to a DataNode on the same Kubernetes node, if one exists.

Value

HDFS tries to store the first copy of a block on a "local" machine before shipping data to remote machines over the network. This relies on a simple IP address comparison in the HDFS code, which breaks on Kubernetes because pods do not share an IP address even when they run on the same Kubernetes node.

I believe we can improve this situation by changing the HDFS code to consider the Kubernetes node while looking for a "local" machine.

We already have precedent with the hdfs-topology-provider which does something similar. I believe we can plug this logic into the chooseLocalOrFavoredStorage method of BlockPlacementPolicyDefault.
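A minimal sketch of the idea in Java, assuming a lookup from pod IP to Kubernetes node name is available (for example, built from the same information the hdfs-topology-provider resolves). The class and method names are hypothetical, and this is not the actual BlockPlacementPolicyDefault code; it only illustrates comparing Kubernetes nodes instead of IP addresses:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of HDFS or the hdfs-topology-provider.
final class KubernetesNodeLocality {

    // Assumed lookup: pod IP -> name of the Kubernetes node the pod runs on.
    private final Map<String, String> podIpToNode;

    KubernetesNodeLocality(Map<String, String> podIpToNode) {
        this.podIpToNode = podIpToNode;
    }

    /**
     * Returns the first DataNode pod IP that shares a Kubernetes node with
     * the client, or an empty Optional if there is none.
     */
    Optional<String> chooseLocalDataNode(String clientIp, List<String> dataNodeIps) {
        String clientNode = podIpToNode.get(clientIp);
        if (clientNode == null) {
            // Unknown client: the caller would fall back to the default placement.
            return Optional.empty();
        }
        return dataNodeIps.stream()
                .filter(ip -> clientNode.equals(podIpToNode.get(ip)))
                .findFirst();
    }

    public static void main(String[] args) {
        KubernetesNodeLocality locality = new KubernetesNodeLocality(Map.of(
                "10.0.1.5", "worker-1",   // client pod
                "10.0.2.9", "worker-2",   // DataNode pod on another node
                "10.0.1.7", "worker-1")); // DataNode pod on the client's node
        // The pod IPs differ, so a plain IP comparison finds no local DataNode;
        // comparing Kubernetes node names does.
        System.out.println(
                locality.chooseLocalDataNode("10.0.1.5", List.of("10.0.2.9", "10.0.1.7")));
    }
}
```

In a real implementation the result would feed into the existing "local" branch of the placement policy, and an empty result would leave the current behaviour unchanged.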

We want this because it will probably benefit all workloads that use HDFS with locally attached storage and run tools such as Spark or HBase, where processing can happen on the same Kubernetes node as the storage. The expected benefit is less network traffic and better performance.

Dependencies

It probably makes sense to reuse code from the hdfs-topology-provider project.

Tasks

Acceptance Criteria

Release Notes

The HDFS NameNodes will now look at the Kubernetes topology when deciding whether a client request is local. This means a client is considered "local" if it is hosted on the same Kubernetes node as a DataNode.

adwk67 commented 1 week ago

Summary of topology provider logic

The StackableTopologyProvider class implements DNSToSwitchMapping and uses the entry point public List<String> resolve(List<String> names) to return a topology for a given pod name/IP.

The topology is in the form: /{resolved-label-1}/{resolved-label-2}/.... For example, in the integration test the labels:

rackAwareness:
  - nodeLabel: kubernetes.io/hostname
  - podLabel: app.kubernetes.io/role-group

are used to check that a topology has been created for a specific DataNode role group. Internally, StackableTopologyProvider first resolves the name (which could be an IP or a pod name) to an IP and then maps that IP to the labels of the Kubernetes node where the pod is running.
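To illustrate the /{resolved-label-1}/{resolved-label-2}/... form, here is a rough Java sketch that builds such a path from a rackAwareness-style list of label references. The class, the LabelRef type and the fallback to /default-rack when a label is missing are assumptions made for this example and are not taken from the StackableTopologyProvider source:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the StackableTopologyProvider implementation.
final class TopologyPathSketch {

    // Assumed fallback when a configured label cannot be resolved.
    static final String DEFAULT_RACK = "/default-rack";

    /** One rackAwareness entry: a label name read either from the node or from the pod. */
    static final class LabelRef {
        final boolean fromNode;
        final String name;

        LabelRef(boolean fromNode, String name) {
            this.fromNode = fromNode;
            this.name = name;
        }
    }

    /** Joins the resolved label values into a path such as /worker-1/default. */
    static String topologyFor(List<LabelRef> rackAwareness,
                              Map<String, String> nodeLabels,
                              Map<String, String> podLabels) {
        StringBuilder path = new StringBuilder();
        for (LabelRef ref : rackAwareness) {
            String value = (ref.fromNode ? nodeLabels : podLabels).get(ref.name);
            if (value == null) {
                return DEFAULT_RACK;
            }
            path.append('/').append(value);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Mirrors the rackAwareness example: one node label, then one pod label.
        List<LabelRef> rackAwareness = List.of(
                new LabelRef(true, "kubernetes.io/hostname"),
                new LabelRef(false, "app.kubernetes.io/role-group"));
        System.out.println(topologyFor(
                rackAwareness,
                Map.of("kubernetes.io/hostname", "worker-1"),
                Map.of("app.kubernetes.io/role-group", "default")));
    }
}
```

With kubernetes.io/hostname as the first label, two pods on the same Kubernetes node share the first path segment, which is the property this ticket needs.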

The steps are as follows (ignoring caching logic):

We can use the information here 🟢 for this ticket.