lakehq / sail

LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
https://lakesail.com
Apache License 2.0
374 stars, 11 forks

HDFS Support #173

Closed: linhr closed this 1 month ago

linhr commented 2 months ago

HDFS support may be possible via https://github.com/datafusion-contrib/hdfs-native-object-store.

skewballfox commented 2 months ago

Should this be behind a feature flag, and if so, should it be enabled by default?

linhr commented 2 months ago

Yeah, it's a good idea to control external system integrations via feature flags. For commonly used features such as HDFS, we may enable the flag by default so that the Python client gets HDFS support out of the box.
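The thread doesn't say how the flag would be wired up. As a minimal sketch, assuming a hypothetical Cargo feature named `hdfs` that is listed in `default`, code could check it at compile time with `cfg!`. The function `is_scheme_supported` and the set of URL schemes below are illustrative only, not Sail's actual code.

```rust
// Hypothetical sketch: gate HDFS support behind a Cargo feature named "hdfs".
// In Cargo.toml this might look like (assumed, not taken from the Sail repo):
//   [features]
//   default = ["hdfs"]
//   hdfs = ["dep:hdfs-native-object-store"]

/// Returns true if this build can serve URLs with the given scheme.
/// The scheme list is illustrative only.
fn is_scheme_supported(scheme: &str) -> bool {
    match scheme {
        "file" => true,
        // HDFS URLs are only handled when compiled with the "hdfs" feature.
        "hdfs" | "viewfs" => cfg!(feature = "hdfs"),
        _ => false,
    }
}

fn main() {
    for scheme in ["file", "hdfs", "ftp"] {
        println!("{scheme}: {}", is_scheme_supported(scheme));
    }
}
```

Putting the feature in `default` means a plain build (such as the one shipped with the Python client) includes HDFS, while `--no-default-features` lets other builds drop the extra dependency.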

skewballfox commented 2 months ago

Open to PRs for this? Also, are there any relevant existing tests I should be aware of? I'm kinda new to both the format and Spark (though I'm used to jumping into unfamiliar projects).

shehabgamin commented 2 months ago

@skewballfox Absolutely, we'd love to see your PR for this! We're always open to community contributions, and your involvement would be greatly appreciated. If you'd like to contribute, feel free to comment "take" on this issue and link your draft PR to avoid duplicated effort.

For testing, I recommend checking the Developer Docs: https://docs.lakesail.com/sail/latest/development/. If anything is unclear or if you need assistance getting set up, feel free to reach out. We're happy to help!

skewballfox commented 2 months ago

take

skewballfox commented 2 months ago

Btw, I can make a separate draft PR for setting up a dev container for VS Code. I'm having trouble getting it set up on my system, but I'm not sure if that's because of a misconfiguration or a Podman thing.

shehabgamin commented 2 months ago

That would be great! We haven't used the Dockerfile (if you happen to be trying to use it) in a very long time, so I would be surprised if it was still functional.