kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0
94 stars 90 forks source link

fix(datasets): share the cache of Ibis connections #941

Open deepyaman opened 4 days ago

deepyaman commented 4 days ago

Description

Close #935

Development notes

The ConnectionMixin can also be used by something like pandas.SQLQueryDataset and pandas.SQLTableDataset to share a single connection cache. While this has some value (prevent storing the connections twice, if the user is using both datasets), it's less functionally important as in this specific Ibis case.

Checklist

noklam commented 3 days ago

I haven't looked into this at all, but my intuition is that it stems from #842 (comment); because the connections are different, temporary tables are not shared.

I can try to look into creating a shared cache, but no other dataset follows this pattern (they are pretty independent the way they're currently designed); not sure if @merelcht @astrojuanlu @idanov @noklam any of you may have thoughts on this. https://github.com/kedro-org/kedro-plugins/issues/935#issuecomment-2486299226

Quoting this for future reference. This is not just a refactoring, but necessary as temp table are only accessible by the same connection.

deepyaman commented 1 day ago

Non blocking comment from the peanut gallery: Are there other ways of doing this without using Mixins?

I think we should do less inheritance, not more. Let alone multiple inheritance.

I'm not 100% sure what the best alternative would be, and would need to do some reading/experimentation to figure it out. That said, I think this can definitely be refactored down the road, as it doesn't update the public interface.

Edit: https://python-patterns.guide/gang-of-four/composition-over-inheritance/ also has some ideas? But I haven't looked into it sufficiently to see what I may like.

deepyaman commented 1 day ago

Blocked by #949