mavam opened 9 months ago
IIRC this is actively being worked on in Arrow: https://github.com/apache/arrow/issues/18014#issuecomment-1700666053. I suggest we wait for that unless the item becomes urgent.
A user mentioned that this feature would be more important to them than AWS S3 support.
Along the lines of our connectors for S3 and GCS, an `abfs` connector would provide access to Azure Data Lake Storage (ADLS). An ADLS URI looks as follows: `abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>`

Microsoft provides a C++ client SDK that we can leverage. Before adding this dependency, we should investigate whether going via HDFS is in fact sufficient. Apache Arrow provides HDFS support, but it appears that its ADLS support isn't that smooth yet. Vaex went through something similar. There's also a Python-native way to access ADLS via PyArrow.