laminlabs / lamindb

A data framework for biology.
https://docs.lamin.ai
Apache License 2.0

Loading dataset from a different instance that does not have all schemas of the current instance errors #2117

Closed Zethson closed 2 weeks ago

Zethson commented 3 weeks ago

Report

lamin init --storage ./test-perturbation --schema bionty,wetlab,findrefs

import lamindb as ln

ln.track("HIRTYxL3aZc70000")

adata = ln.Artifact.using("laminlabs/lamindata").get(uid="Xk7Qaik9vBLV4PKf0001").load()

results in

{
    "name": "ProgrammingError",
    "message": "relation \"findrefs_reference\" does not exist
LINE 1: SELECT 1 AS \"a\" FROM \"findrefs_reference\" INNER JOIN \"findre...
                             ^
",
    "stack": "---------------------------------------------------------------------------
UndefinedTable                            Traceback (most recent call last)
File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:105, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
    104 else:
--> 105     return self.cursor.execute(sql, params)

UndefinedTable: relation \"findrefs_reference\" does not exist
LINE 1: SELECT 1 AS \"a\" FROM \"findrefs_reference\" INNER JOIN \"findre...
                             ^

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
Cell In[5], line 1
----> 1 adata = ln.Artifact.using(\"laminlabs/lamindata\").get(uid=\"Xk7Qaik9vBLV4PKf0001\").load()
      2 adata.obs.head(3)

File ~/PycharmProjects/lamindb/lamindb/_artifact.py:994, in load(self, is_run_input, **kwargs)
    992     access_memory = load_to_memory(cache_path, **kwargs)
    993 # only call if load is successfull
--> 994 _track_run_input(self, is_run_input)
    995 return access_memory

File ~/PycharmProjects/lamindb/lamindb/core/_data.py:465, in _track_run_input(data, is_run_input, run)
    458             is_valid = True
    459         return (
    460             data.run_id != run.id
    461             and not data._state.adding  # this seems duplicated with data._state.db is None
    462             and is_valid
    463         )
--> 465     input_data = [data for data in data_iter if is_valid_input(data)]
    466     input_data_ids = [data.id for data in input_data]
    467 if input_data:

File ~/PycharmProjects/lamindb/lamindb/core/_data.py:457, in _track_run_input.<locals>.is_valid_input(data)
    450 else:
    451     # record is on another db
    452     # we have to save the record into the current db with
    453     # the run being attached to a transfer transform
    454     logger.important(
    455         f\"completing transfer to track {data.__class__.__name__}('{data.uid[:8]}') as input\"
    456     )
--> 457     data.save()
    458     is_valid = True
    459 return (
    460     data.run_id != run.id
    461     and not data._state.adding  # this seems duplicated with data._state.db is None
    462     and is_valid
    463 )

File ~/PycharmProjects/lamindb/lamindb/_artifact.py:1107, in save(self, upload, **kwargs)
   1104     # ensure that the artifact is uploaded
   1105     self._to_store = True
-> 1107 self._save_skip_storage(**kwargs)
   1109 from lamindb._save import check_and_attempt_clearing, check_and_attempt_upload
   1111 using_key = None

File ~/PycharmProjects/lamindb/lamindb/_artifact.py:1138, in _save_skip_storage(file, **kwargs)
   1136 def _save_skip_storage(file, **kwargs) -> None:
   1137     save_feature_sets(file)
-> 1138     super(Artifact, file).save(**kwargs)
   1139     save_feature_set_links(file)

File ~/PycharmProjects/lamindb/lamindb/_record.py:618, in save(self, *args, **kwargs)
    616     self_on_db.features = FeatureManager(self_on_db)
    617     self.features._add_from(self_on_db, transfer_logs=transfer_logs)
--> 618     self.labels.add_from(self_on_db, transfer_logs=transfer_logs)
    619 for k, v in transfer_logs.items():
    620     if k != \"run\":

File ~/PycharmProjects/lamindb/lamindb/core/_label_manager.py:208, in LabelManager.add_from(self, data, transfer_logs)
    204 for related_name, (_, labels) in get_labels_as_dict(
    205     data, instance=self._host._state.db
    206 ).items():
    207     labels = labels.all()
--> 208     if not labels.exists():
    209         continue
    210     # look for features

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/query.py:1288, in QuerySet.exists(self)
   1284 \"\"\"
   1285 Return True if the QuerySet would have any results, False otherwise.
   1286 \"\"\"
   1287 if self._result_cache is None:
-> 1288     return self.query.has_results(using=self.db)
   1289 return bool(self._result_cache)

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/sql/query.py:660, in Query.has_results(self, using)
    658 q = self.exists(using)
    659 compiler = q.get_compiler(using=using)
--> 660 return compiler.has_results()

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/sql/compiler.py:1542, in SQLCompiler.has_results(self)
   1537 def has_results(self):
   1538     \"\"\"
   1539     Backends (e.g. NoSQL) can override this in order to use optimized
   1540     versions of \"query has any results.\"
   1541     \"\"\"
-> 1542     return bool(self.execute_sql(SINGLE))

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/models/sql/compiler.py:1574, in SQLCompiler.execute_sql(self, result_type, chunked_fetch, chunk_size)
   1572     cursor = self.connection.cursor()
   1573 try:
-> 1574     cursor.execute(sql, params)
   1575 except Exception:
   1576     # Might fail for server-side cursors (e.g. connection closed)
   1577     cursor.close()

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:79, in CursorWrapper.execute(self, sql, params)
     78 def execute(self, sql, params=None):
---> 79     return self._execute_with_wrappers(
     80         sql, params, many=False, executor=self._execute
     81     )

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:92, in CursorWrapper._execute_with_wrappers(self, sql, params, many, executor)
     90 for wrapper in reversed(self.db.execute_wrappers):
     91     executor = functools.partial(wrapper, executor)
---> 92 return executor(sql, params, many, context)

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:100, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
     98     warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning)
     99 self.db.validate_no_broken_transaction()
--> 100 with self.db.wrap_database_errors:
    101     if params is None:
    102         # params default might be backend specific.
    103         return self.cursor.execute(sql)

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/utils.py:91, in DatabaseErrorWrapper.__exit__(self, exc_type, exc_value, traceback)
     89 if dj_exc_type not in (DataError, IntegrityError):
     90     self.wrapper.errors_occurred = True
---> 91 raise dj_exc_value.with_traceback(traceback) from exc_value

File ~/miniconda3/envs/pertpy/lib/python3.12/site-packages/django/db/backends/utils.py:105, in CursorWrapper._execute(self, sql, params, *ignored_wrapper_args)
    103     return self.cursor.execute(sql)
    104 else:
--> 105     return self.cursor.execute(sql, params)

ProgrammingError: relation \"findrefs_reference\" does not exist
LINE 1: SELECT 1 AS \"a\" FROM \"findrefs_reference\" INNER JOIN \"findre...
                             ^
"
}

The get() call works but load() errors, and only when ln.track() has been called beforehand.

sunnyosun commented 2 weeks ago

Do you mean if ln.track() wasn't run, load() works?

Zethson commented 2 weeks ago

Correct

sunnyosun commented 2 weeks ago

@falexwolf I think this is related to the inter-instance tracking. Why is the artifact being saved in this case?

falexwolf commented 2 weeks ago

You can only load an artifact if it's saved; otherwise, there is no way to track lineage.

The bug here is independent of data lineage; it comes from not being able to save the artifact. I thought we were by now able to transfer artifacts across instances with mismatching schemas, so I'm surprised this doesn't work.

I know this is hard to test, but we should add a test for a target instance whose set of schema modules is neither a strict superset nor a strict subset of the source instance's.
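To make the "neither superset nor subset" case concrete, here is a minimal sketch using plain Python sets. The function name `compare_schema_modules` and the module sets are hypothetical and chosen only for illustration; this is not lamindb API and not the test that was eventually added.

```python
def compare_schema_modules(source: set[str], target: set[str]) -> str:
    """Classify how the target's schema modules relate to the source's."""
    if target == source:
        return "equal"
    if target < source:
        return "strict subset"
    if target > source:
        return "strict superset"
    # Each side has at least one module the other lacks.
    return "neither"


# Hypothetical module sets, chosen only to illustrate the edge case.
source_modules = {"bionty", "wetlab"}
target_modules = {"bionty", "findrefs"}
relation = compare_schema_modules(source_modules, target_modules)
print(relation)
```

A transfer test for the "neither" case would need one instance pair like this, where both sides own a module the other one is missing.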

falexwolf commented 2 weeks ago

Looking at the line below from the traceback, I believe we in fact don't have a general problem, just a coverage problem for edge cases:

208     if not labels.exists():

Likely, this case isn't covered in the tests and this leads to the bug.
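One way to picture the uncovered case: the label transfer queries every label relation known to the current instance, including relations whose tables do not exist in the source instance. The sketch below filters relations by the modules the source actually has before anything is queried. It is an assumption-laden illustration, not the actual fix: `relations_to_transfer`, the `relation_modules` mapping, and all names in it are hypothetical.

```python
def relations_to_transfer(
    relation_modules: dict[str, str], source_modules: set[str]
) -> list[str]:
    """Keep only the relations whose owning schema module exists in the source.

    relation_modules maps a related name to the schema module that owns its table.
    """
    return [
        name
        for name, module in relation_modules.items()
        if module in source_modules
    ]


# Hypothetical relations as seen from the current (target) instance.
relation_modules = {
    "ulabels": "core",
    "cell_types": "bionty",
    "references": "findrefs",
}
# Hypothetical source instance that lacks the findrefs module.
source_modules = {"core", "bionty"}

safe = relations_to_transfer(relation_modules, source_modules)
print(safe)
```

With such a guard, a call like `labels.exists()` would never run against a table such as `findrefs_reference` on a database that does not define it.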

sunnyosun commented 2 weeks ago

But saving an artifact works without ln.track(), so I think the issue is in the tracking.

sunnyosun commented 2 weeks ago

This should be fixed here, with tests added: https://github.com/laminlabs/lamindb/pull/2132