Closed MinuraPunchihewa closed 1 week ago
Hey @noklam, I have to test this a little more thoroughly, but can you give me your opinion on the approach taken here?
Hey @noklam,
I've now had the opportunity to test out these changes and they seem to work fine. I've tested both `ManagedTableDataset` and `ExternalTableDataset` with the reduced dependencies without any issues.
I left some more comments, but I only just noticed they weren't sent properly 😅
Haha no problem. I've made the suggested improvements to the type hints, including a couple more involving `DBUtils`.
Thanks for this contribution @MinuraPunchihewa ! ⭐ Can you update the release notes and add your change + your name to contributors?
Thanks, @merelcht. I've just updated the release notes.
Description
This PR adds a `_utils` sub-package to house modules with common utility functions that are used across Spark-based datasets. This avoids the need for `pyspark` to be installed for datasets that will run on Databricks.
Fixes https://github.com/kedro-org/kedro-plugins/issues/849
Development notes
The new `_utils` package organizes the utility functions into three main modules. Additional modules can be added to this sub-package to house code that is used in multiple datasets.
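As a rough illustration of the idea (not the code in this PR), a shared utility module can defer the `pyspark` import until a function is actually called, so that importing the module never requires `pyspark` to be installed. The names `load_optional` and `get_spark_session` below are hypothetical and are not taken from the `_utils` package:

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, hint: str) -> ModuleType:
    """Import a module on demand and raise a readable error if it is missing.

    Hypothetical helper for illustration; not part of kedro-datasets.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this operation. {hint}"
        ) from exc


def get_spark_session():
    """Return a Spark session, importing pyspark only when this is called.

    Assumption: the runtime (for example a Databricks cluster) provides
    pyspark, even if it is not declared as a package dependency.
    """
    pyspark_sql = load_optional(
        "pyspark.sql",
        "Install pyspark or run on a cluster that provides it.",
    )
    return pyspark_sql.SparkSession.builder.getOrCreate()
```

With this pattern, a dataset module can import the shared helpers freely, and a missing `pyspark` only surfaces when a Spark session is actually requested.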
These changes have been tested with `ManagedTableDataset` and `ExternalTableDataset`.
Checklist
`RELEASE.md` file