Closed MinuraPunchihewa closed 1 month ago
Hey @noklam, @ankatiyar,
I've added the ExternalTableDataset
as an experimental dataset here. Is there anything else I need to do here, like tests etc.?
@MinuraPunchihewa Tests are super welcomed, although we don't have a databricks cluster to run these tests unless there are ways to run this locally.
@MinuraPunchihewa Tests are super welcomed, although we don't have a databricks cluster to run these tests unless there are ways to run this locally.
@noklam I just added some tests to cover the logic specific to ExternalTableDataset
(outside of BaseTableDataset
). I also made a few other changes to the existing tstssuch as removing some duplicate code and moving all of the fixtures to conftest.py
.
FAILED tests/databricks/test_base_table_dataset.py::TestBaseTableDataset::test_save_overwrite
Is this an expected fail?
FAILED tests/databricks/test_base_table_dataset.py::TestBaseTableDataset::test_save_overwrite
Is this an expected fail?
@noklam It has been my experience that when overwriting data to an external table of a format other than Delta, the location has to be provided. Since I am an external location fixture here which requires an environment variable called DATABRICKS_EXTERNAL_LOCATION
that does not exist in the test environment, yes, this is expected to fail.
I should have specified a different format though. I just made a commit with that change.
Can you give me an explanation of how these tests are running? There should be a Spark environment available for them to run at all, right?
Since this is an experimental dataset, test will go to here: https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets_experimental/tests
This mean tests are not going to be run per every commit in CI. If you look at other Spark tests we run them in a local mode. With Databricks it's unclear whether it possible to run it in a local mode, especially with the platform UnityCatalog (which is different from the open source version).
Since this is an experimental dataset, test will go to here: https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets_experimental/tests
This mean tests are not going to be run per every commit in CI. If you look at other Spark tests we run them in a local mode. With Databricks it's unclear whether it possible to run it in a local mode, especially with the platform UnityCatalog (which is different from the open source version).
Got it. I just moved the tests; I had to copy the conftest.py
from as well.
tests are now passing and looks good to me.
@merelcht I've made the requested changes. Do let me know if I've updated the RELEASE document correctly.
@merelcht I've made the requested changes. Do let me know if I've updated the RELEASE document correctly.
I've moved it to the section for upcoming release, because 5.1.0
is already out 🙂 Thank again! I'll make sure this gets merged.
Oh actually @MinuraPunchihewa I just realised this dataset wasn't added to https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/pyproject.toml#L176 or https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/pyproject.toml#L285. Can you update the pyproject.toml
with the necessary dependencies if any?
Oh actually @MinuraPunchihewa I just realised this dataset wasn't added to https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/pyproject.toml#L176 or https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/pyproject.toml#L285. Can you update the
pyproject.toml
with the necessary dependencies if any?
Also here: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/docs/source/api/kedro_datasets_experimental.rst :)
Hey guys, Apologies, I've missed this. I've made the changes now. @merelcht @ankatiyar
Description
This PR adds the
ExternalTableDataset
to support interactions with external tables in Databricks (Unity Catalog).Fixes https://github.com/kedro-org/kedro-plugins/issues/817
Development notes
The ExternalTableDataset has been implemented by extending the
BaseTableDataset
that was added here.These changes have been tested,
Checklist
RELEASE.md
file