treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

[Feature Request]: Support additional protocols syntax for Azure imports, e.g. `abfss://` #6131

Open MigQ2 opened 1 year ago

MigQ2 commented 1 year ago

Currently, importing from Azure Blob Storage or ADLS only supports paths that use the https:// protocol.

However, many Azure applications prefer the abfss:// protocol (or even wasbs://), which has a different path syntax.

It would be great to be able to use the abfss:// syntax for creating repositories, underlying Azure paths, or import paths.

Even if lakeFS still uses https:// underneath, it would be very convenient to copy-paste an abfss:// path into the import UI or lakectl without having to edit it manually. It's not just a matter of changing the protocol prefix: the path syntax is quite different.

Maybe a path translator functionality from abfss:// to https:// could make this work easily.

The only thing I can think of that could cause problems is the adls prefix needed for ADLS Gen2 imports, which I think is non-standard Azure syntax, and I don't know whether it can be expressed explicitly in an abfss:// path.
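
To illustrate the kind of path translation suggested above, here is a minimal sketch in Python. It is not lakeFS or lakectl code: the function name `abfss_to_https` and its `service` parameter are hypothetical. It assumes the usual `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>` form and rewrites it to `https://<account>.<service>.core.windows.net/<container>/<path>`. Passing `service="adls"` to stand in for the ADLS Gen2 import prefix mentioned above is an assumption about how that prefix would be applied.

```python
from urllib.parse import urlsplit

# Host suffix used by abfs[s]:// URIs.
DFS_SUFFIX = ".dfs.core.windows.net"


def abfss_to_https(uri: str, service: str = "blob") -> str:
    """Translate an abfs[s]:// URI into an https:// URL (hypothetical helper).

    `service` is the label placed between the account name and
    `core.windows.net`; "blob" and "adls" are illustrative values only.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs[s] URI: {uri!r}")

    # In abfs[s] URIs the container sits in the userinfo position,
    # before the "@", and the storage account is the first host label.
    container = parts.username
    host = parts.hostname or ""
    if not container or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"expected <container>@<account>{DFS_SUFFIX}: {uri!r}")

    account = host[: -len(DFS_SUFFIX)]
    if not account:
        raise ValueError(f"missing storage account name: {uri!r}")

    # parts.path keeps its leading "/" (or is empty when no path is given).
    return f"https://{account}.{service}.core.windows.net/{container}{parts.path}"
```

For example, `abfss_to_https("abfss://data@myaccount.dfs.core.windows.net/raw/2023/")` returns `https://myaccount.blob.core.windows.net/data/raw/2023/`, and the same call with `service="adls"` returns `https://myaccount.adls.core.windows.net/data/raw/2023/`. Note that the container moves from before the `@` into the first path segment, which is why swapping the scheme prefix alone is not enough.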


N-o-Z commented 1 year ago

@MigQ2 Thanks for the suggestion. We will definitely look into supporting the abfs[s] scheme. Please note that wasb was deprecated by Azure, so we are not planning to support that scheme. Regarding your important note about the import hint in lakeFS: if I understand the Azure documentation correctly, this scheme is only available for ADLS Gen2. If that's the case, we can assume that when given this scheme we are always dealing with an ADLS Gen2 account.

idanovo commented 1 year ago

@ozkatz please prioritize

github-actions[bot] commented 1 year ago

This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.

N-o-Z commented 1 year ago

@talSofer @ozkatz needs prioritization