laminlabs / lamindb

A data framework for biology.
https://docs.lamin.ai
Apache License 2.0
105 stars 7 forks source link

✨ Added support for different join types in `QuerySet.df()` #1709

Closed insavchuk closed 1 week ago

insavchuk commented 2 weeks ago

The current 'include' parameter for QuerySet.df does not allow outer joins. In the context of our project (getting all the Biosamples and FeatureSets for a given Artifact) we need some more flexibility to allow the outer join. Therefore I propose the following changes to the .df method in the lnschema_core and lamindb repos (tested and working for our use case, joining Artifact with FeatureSet and Biosamples)

falexwolf commented 2 weeks ago

Oh, wow! Awesome! Just: can you please add a test for this?

Somewhere around here: https://github.com/laminlabs/lamindb/blob/2d32fd46fd7905b287f2ae972121630c1b33edab/tests/test_queryset.py#L19-L27

And you should bump the lnschema-core git submodule within lamindb so that it points to your changes there: https://github.com/laminlabs/lnschema-core/pull/392

falexwolf commented 2 weeks ago

@Koncopd, can you help Ivan to get tests to pass?

falexwolf commented 2 weeks ago

Or better, make a "CONTRIBUTING.md" doc that explains how to set up & install things to outsiders?

The important thing is to also explain how to checkout and install submodules.

falexwolf commented 2 weeks ago

The current error is a formatting error. Ivan, you need to run the below so that the linter is live:

pre-commit install
Koncopd commented 2 weeks ago

Yep, i will add the guide.

insavchuk commented 2 weeks ago

Thanks @falexwolf for bringing up these points! I tried to reformat the code with ruff but have not been able to figure out yet how to inspect the formatting unfortunately. I also added unit tests in a new commit/bumped to the changed lnschema-core.

falexwolf commented 2 weeks ago

This is how you do it, Ivan:

(py310) falexwolf@mbpalex lamindb % pre-commit run --all-files
prettier.................................................................Passed
nbstripout...............................................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Failed
- hook id: ruff-format
- files were modified by this hook

1 file reformatted, 88 files left unchanged

detect private key.......................................................Passed
check python ast.........................................................Passed
fix end of files.........................................................Passed
mixed line ending........................................................Passed
trim trailing whitespace.................................................Passed
check for case conflicts.................................................Passed
mypy.....................................................................
Passed

(py310) falexwolf@mbpalex lamindb % git diff
diff --git a/lamindb/_query_set.py b/lamindb/_query_set.py
index aee1957c..829b6b9d 100644
--- a/lamindb/_query_set.py
+++ b/lamindb/_query_set.py
@@ -97,7 +97,9 @@ class QuerySet(models.QuerySet, CanValidate):
     """

     @doc_args(Registry.df.__doc__)
-    def df(self, include: str | list[str] | None = None, join: str = "inner") -> pd.DataFrame:
+    def df(
+        self, include: str | list[str] | None = None, join: str = "inner"
+    ) -> pd.DataFrame:
         """{}."""
         # re-order the columns
         exclude_field_names = ["created_at"]
Submodule sub/lnschema-core c12081d0..1f64e123 (rewind):
  < Added support for different join types in QuerySet.df()
Submodule sub/wetlab 88613bea..d7aea9ab:
  > ✅ Add system check

(py310) falexwolf@mbpalex lamindb % git add .
(py310) falexwolf@mbpalex lamindb % git commit -m "Fix linting"
prettier.................................................................Passed
nbstripout...........................................(no files to check)Skipped
ruff.....................................................................Passed
ruff-format..............................................................Passed
detect private key.......................................................Passed
check python ast.........................................................Passed
fix end of files.........................................................Passed
mixed line ending........................................................Passed
trim trailing whitespace.................................................Passed
check for case conflicts.................................................Passed
mypy.....................................................................Passed
? Choose a gitmoji: ♻️  - Refactor code.
? Enter the commit title [11/48]: Fix linting
? Enter the commit message: 
[savchuki/dev 70aaf936] ♻️ Fix linting
 3 files changed, 5 insertions(+), 3 deletions(-)

I pushed these changes directly here on the branch.

falexwolf commented 1 week ago

I'm just seeing that you added the test! Thank you! Will merge this after polishing.

github-actions[bot] commented 1 week ago

🚀 Deployed on https://667c00cc7bb2cc32d79997ec--lamindb-qnwk.netlify.app

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.10%. Comparing base (74636ae) to head (a364a17). Report is 13 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1709 +/- ## ========================================== + Coverage 92.06% 92.10% +0.03% ========================================== Files 48 50 +2 Lines 5485 5496 +11 ========================================== + Hits 5050 5062 +12 + Misses 435 434 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.