dominiklohmann closed this issue 5 months ago.
It's important to note that HDFS is also a way to access Azure Data Lake Storage (ADLS). This issue is deeply linked to https://github.com/tenzir/public-roadmap/issues/82.
Hadoop has a dedicated module that exposes ADLS via a URL of the form `adl://<Account Name>.azuredatalakestore.net/`. During the design of the `hdfs` loader and connector, we should think about whether we want to provide an `adl` shim that sets things up for a seamless ADLS experience through HDFS.
Note that support for Azure in Arrow's filesystem abstraction is currently being worked on. Not sure when it'll be there, but that may soon be an option as well.
We do not see a need for this currently.
As with our S3 and GCS connectors, we can build on Apache Arrow's filesystem abstraction, which also includes an HDFS filesystem. We can use it to implement an `hdfs` connector.
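To make the connector idea concrete, here is a minimal sketch of the first thing such a connector has to do: turn a user-supplied `hdfs://` URI into the NameNode host, port, and path that Arrow's HDFS filesystem needs. The names `HdfsTarget` and `parse_hdfs_uri` are hypothetical, and the fallback port 8020 is an assumption (it matches the default of `pyarrow.fs.HadoopFileSystem`, but a real cluster may use a different port).

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Assumption: fall back to the common NameNode RPC port when the URI omits one.
DEFAULT_NAMENODE_PORT = 8020


class HdfsTarget(NamedTuple):
    """Hypothetical connection target for an hdfs connector."""
    host: str
    port: int
    path: str


def parse_hdfs_uri(uri: str) -> HdfsTarget:
    """Split an hdfs://host[:port]/path URI into its components."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing NameNode host in {uri!r}")
    # parts.port is None when no port is given; it raises ValueError if malformed.
    port = parts.port if parts.port is not None else DEFAULT_NAMENODE_PORT
    # An empty path means the filesystem root.
    path = parts.path or "/"
    return HdfsTarget(parts.hostname, port, path)
```

For example, `parse_hdfs_uri("hdfs://namenode.example:9000/logs/app.json")` yields host `namenode.example`, port `9000`, and path `/logs/app.json`; host and port could then be passed to `pyarrow.fs.HadoopFileSystem(host, port)`. Arrow can also parse `hdfs://` URIs on its own (for example via `pyarrow.fs.FileSystem.from_uri`), so the explicit split mainly shows which pieces a connector would need to expose as options.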