tenzir / public-roadmap

The public roadmap of Tenzir
https://docs.tenzir.com/roadmap

ADLS Connector #82

Open mavam opened 9 months ago

mavam commented 9 months ago

Along the lines of our connectors for S3 and GCS, an abfs connector would provide access to Azure Data Lake Storage (ADLS). An ADLS URI looks as follows:

abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>
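To make the URI layout concrete, here is a minimal Python sketch that splits such a URI into its components. It is only an illustration of the format above, not a proposal for the connector's implementation; the names `AbfsUri` and `parse_abfs_uri` are hypothetical, and the sketch assumes the default `dfs.core.windows.net` endpoint.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Endpoint suffix taken from the URI template above.
HOST_SUFFIX = ".dfs.core.windows.net"


@dataclass(frozen=True)
class AbfsUri:
    """Components of an abfs[s] URI (hypothetical helper type)."""
    secure: bool        # True for abfss://, False for abfs://
    file_system: str    # the part before the '@'
    account_name: str   # the host without the endpoint suffix
    path: str           # <path>/<file_name>, without the leading slash


def parse_abfs_uri(uri: str) -> AbfsUri:
    """Split an abfs[s] URI into its parts; raise ValueError if malformed."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # urlsplit exposes the text before '@' in the authority as 'username'.
    if not parts.username:
        raise ValueError("missing <file_system>@ in URI")
    host = parts.hostname or ""
    if not host.endswith(HOST_SUFFIX) or host == HOST_SUFFIX:
        raise ValueError(f"unexpected host: {host!r}")
    return AbfsUri(
        secure=parts.scheme == "abfss",
        file_system=parts.username,
        account_name=host[: -len(HOST_SUFFIX)],
        path=parts.path.lstrip("/"),
    )
```

For example, `parse_abfs_uri("abfss://data@myaccount.dfs.core.windows.net/logs/events.json")` yields the file system `data`, the account `myaccount`, and the path `logs/events.json`.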

Microsoft provides a C++ client SDK that we can leverage. Before adding this dependency, we should investigate whether going via HDFS is in fact sufficient. Apache Arrow provides HDFS support, but it appears that ADLS support isn't that smooth yet. Vaex went through something similar. There's also a Python-native lift of ADLS via PyArrow.

tobim commented 9 months ago

IIRC this is actively being worked on in Arrow: https://github.com/apache/arrow/issues/18014#issuecomment-1700666053. I suggest we wait for that unless the item becomes urgent.

mavam commented 8 months ago

A user mentioned that this feature would be more important to them than AWS S3.