Open FredrikBakken opened 2 months ago
I've noticed it too lately and would much appreciate this change ❤️
Hey @FredrikBakken, thanks for the suggestion, and apologies for the delayed response.
I’m in favour of this and am happy to implement it in the near future if no one else gets around to it.
Sorry for the delay getting back on this one. I haven't had much time to work on this lately, but I'm OK with this approach since Spark is often a provided dependency in many other setups.
@mitchstockdale I'll have some time this week to do this and want to bundle in another change that was raised, but if you get around to it first, let me know and I'll cut a new release.
Thanks for the suggestions and appreciate the patience!
@mitchelllisle - I haven't had a look at this yet, but I was thinking about it over the last week or so and have an additional proposal to enhance it: #536
Hi 👋
We are currently experimenting with using `sparkdantic` for our Spark schema definitions in our pipelines inside Databricks. However, based on our current configuration, we are bound to installing all dependencies inside notebook scopes rather than at the cluster level. This means we need to run the `!pip install` command for each dependency at the beginning of our notebooks.

We've noticed that the `sparkdantic` installation is taking a lot of time, as it also installs `pyspark` as part of its dependencies, even though Spark is already available inside the Databricks environment. A potential solution for this is to move `pyspark` to become an optional dependency, rather than a mandatory dependency.

Any thoughts on this suggestion?
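For illustration, this is roughly the pattern I had in mind on the library side. It's just a sketch, not `sparkdantic`'s actual code: the function name, error message, and extra name below are made up. The idea is to import `pyspark` lazily and only fail when Spark-specific functionality is actually called:

```python
# Sketch only: the import-guard pattern for an optional pyspark dependency.
# Names here are hypothetical, not sparkdantic's real API.
try:
    from pyspark.sql import types as spark_types
except ImportError:  # pyspark not installed; rely on a provided Spark runtime instead
    spark_types = None


def create_spark_schema(model) -> "spark_types.StructType":
    """Hypothetical entry point that needs pyspark at call time."""
    if spark_types is None:
        raise ImportError(
            "pyspark is required to generate a Spark schema but is not installed. "
            "Install it directly, or rely on the Spark runtime provided by your "
            "environment (e.g. Databricks)."
        )
    # ... translate the pydantic model's fields into Spark types here ...
    return spark_types.StructType([])
```

On the packaging side, `pyspark` could then sit behind an optional extra, so Databricks users install plain `sparkdantic` while everyone else opts in with something like `pip install "sparkdantic[pyspark]"` (exact extra name up for discussion).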