Closed jkobject closed 4 months ago
the path seems to already exist in "/home/ml4ig1/.cache/lamindb/"
@falexwolf would you know how to solve this?
Also, sometimes I would like to directly access an h5ad file in the cache, but sc.read_h5ad doesn't seem to work on these h5ads...
Are you sure you didn't upload on the files? What is the full error text?
Yes, we need the full traceback, Jeremie.
sc.read_h5ad should work on the cached h5ads; they are unchanged.
Yes, I figure I did something wrong. I only get this for a dozen files. Here is the full stack trace:
IntegrityError Traceback (most recent call last)
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/utils.py:98, in DatabaseErrorWrapper.__call__.<locals>.inner(*args, **kwargs)
97 with self:
---> 98 return func(*args, **kwargs)
IntegrityError: FOREIGN KEY constraint failed
The above exception was the direct cause of the following exception:
IntegrityError Traceback (most recent call last)
Cell In[58], line 1
----> 1 preprocessed_dataset = do_preprocess(cx_dataset, start_at=104)
File ~/Documents code/scPRINT/scprint/dataset/preprocess.py:177, in Preprocessor.__call__(self, data, name, description, start_at)
170 print("the old file is already in the local")
171 myfile = ln.File(
172 adata,
173 is_new_version_of=ln.File.filter(uid=file.uid)[0],
174 description="preprocessed by scprint",
175 )
--> 177 myfile.save()
178 files.append(myfile)
179 dataset = ln.Dataset(files, name=name, description=description)
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/lamindb/_file.py:922, in save(self, *args, **kwargs)
921 def save(self, *args, **kwargs) -> None:
--> 922 self._save_skip_storage(*args, **kwargs)
923 from lamindb._save import check_and_attempt_clearing, check_and_attempt_upload
925 exception = check_and_attempt_upload(self)
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/lamindb/_file.py:936, in _save_skip_storage(file, *args, **kwargs)
934 def _save_skip_storage(file, *args, **kwargs) -> None:
935 save_feature_sets(file)
--> 936 super(File, file).save(*args, **kwargs)
937 save_feature_set_links(file)
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/lamindb/_registry.py:465, in save(self, *args, **kwargs)
463 init_self_from_db(self, result)
464 else:
--> 465 super(Registry, self).save(*args, **kwargs)
466 if db is not None and db != "default":
467 if hasattr(self, "labels"):
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/models/base.py:814, in Model.save(self, force_insert, force_update, using, update_fields)
811 if loaded_fields:
812 update_fields = frozenset(loaded_fields)
--> 814 self.save_base(
815 using=using,
816 force_insert=force_insert,
817 force_update=force_update,
818 update_fields=update_fields,
819 )
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/models/base.py:877, in Model.save_base(self, raw, force_insert, force_update, using, update_fields)
875 if not raw:
876 parent_inserted = self._save_parents(cls, using, update_fields)
--> 877 updated = self._save_table(
878 raw,
879 cls,
880 force_insert or parent_inserted,
881 force_update,
882 using,
883 update_fields,
884 )
885 # Store the database on which the object was saved
886 self._state.db = using
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/models/base.py:1020, in Model._save_table(self, raw, cls, force_insert, force_update, using, update_fields)
1017 fields = [f for f in fields if f is not meta.auto_field]
1019 returning_fields = meta.db_returning_fields
-> 1020 results = self._do_insert(
1021 cls._base_manager, using, fields, returning_fields, raw
1022 )
1023 if results:
1024 for value, field in zip(results[0], returning_fields):
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/models/base.py:1061, in Model._do_insert(self, manager, using, fields, returning_fields, raw)
1056 def _do_insert(self, manager, using, fields, returning_fields, raw):
1057 """
1058 Do an INSERT. If returning_fields is defined then this method should
1059 return the newly created data for the model.
1060 """
-> 1061 return manager._insert(
1062 [self],
1063 fields=fields,
1064 returning_fields=returning_fields,
1065 using=using,
1066 raw=raw,
1067 )
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/models/manager.py:87, in BaseManager._get_queryset_methods.<locals>.create_method.<locals>.manager_method(self, *args, **kwargs)
85 @wraps(method)
86 def manager_method(self, *args, **kwargs):
---> 87 return getattr(self.get_queryset(), name)(*args, **kwargs)
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/models/query.py:1805, in QuerySet._insert(self, objs, fields, returning_fields, raw, using, on_conflict, update_fields, unique_fields)
1798 query = sql.InsertQuery(
1799 self.model,
1800 on_conflict=on_conflict,
1801 update_fields=update_fields,
1802 unique_fields=unique_fields,
1803 )
1804 query.insert_values(fields, objs, raw=raw)
-> 1805 return query.get_compiler(using=using).execute_sql(returning_fields)
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/models/sql/compiler.py:1833, in SQLInsertCompiler.execute_sql(self, returning_fields)
1830 elif self.connection.features.can_return_columns_from_insert:
1831 assert len(self.query.objs) == 1
1832 rows = [
-> 1833 self.connection.ops.fetch_returned_insert_columns(
1834 cursor,
1835 self.returning_params,
1836 )
1837 ]
1838 else:
1839 rows = [
1840 (
1841 self.connection.ops.last_insert_id(
(...)
1846 )
1847 ]
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/backends/base/operations.py:213, in BaseDatabaseOperations.fetch_returned_insert_columns(self, cursor, returning_params)
208 def fetch_returned_insert_columns(self, cursor, returning_params):
209 """
210 Given a cursor object that has just performed an INSERT...RETURNING
211 statement into a table, return the newly created data.
212 """
--> 213 return cursor.fetchone()
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/utils.py:97, in DatabaseErrorWrapper.__call__.<locals>.inner(*args, **kwargs)
96 def inner(*args, **kwargs):
---> 97 with self:
98 return func(*args, **kwargs)
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/utils.py:91, in DatabaseErrorWrapper.__exit__(self, exc_type, exc_value, traceback)
89 if dj_exc_type not in (DataError, IntegrityError):
90 self.wrapper.errors_occurred = True
---> 91 raise dj_exc_value.with_traceback(traceback) from exc_value
File ~/miniconda3/envs/scprint/lib/python3.10/site-packages/django/db/utils.py:98, in DatabaseErrorWrapper.__call__.<locals>.inner(*args, **kwargs)
96 def inner(*args, **kwargs):
97 with self:
---> 98 return func(*args, **kwargs)
IntegrityError: FOREIGN KEY constraint failed
The files are in .cache but not in my instance folder (INSTANCENAME/.lamindb/files.h5ad).
So, you'll get a foreign key error if you haven't yet saved a dependent record. Because it errors directly on file save, it has to be a storage, a user, a transform, or a run; the File record doesn't depend on anything else. 🤔
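To illustrate the point about dependent records, here is a minimal sketch using only Python's built-in sqlite3 module. The `storage` and `file` tables and their columns are hypothetical stand-ins, not the actual lamindb schema; the sketch only shows how inserting a row whose parent row has not been saved yet produces the same error message.

```python
import sqlite3


def try_insert_file(save_storage_first: bool) -> str:
    """Insert a child row and report whether the foreign key check passed."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical parent and child tables (not the real lamindb schema).
    conn.execute("CREATE TABLE storage (id INTEGER PRIMARY KEY, root TEXT)")
    conn.execute(
        "CREATE TABLE file ("
        "id INTEGER PRIMARY KEY, "
        "storage_id INTEGER NOT NULL REFERENCES storage(id))"
    )
    if save_storage_first:
        conn.execute("INSERT INTO storage (id, root) VALUES (1, '/tmp/store')")
    try:
        # Fails when storage row 1 does not exist yet.
        conn.execute("INSERT INTO file (id, storage_id) VALUES (1, 1)")
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()


print(try_insert_file(save_storage_first=True))
print(try_insert_file(save_storage_first=False))
```

The second call reports the foreign key failure because the parent `storage` row is missing, which is the situation described above for a record that was never saved on the target instance.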
I would say that it is not safe to save the same file from different processes. What probably happened is that several files were corrupted when being written to the same cache from memory.
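As a generic illustration of how concurrent writers can avoid leaving half-written files in a shared directory, here is a sketch of the write-to-temp-then-rename pattern. This is not what lamindb does internally; `atomic_write_bytes` is a hypothetical helper, and the pattern assumes the temporary file and the target live on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, payload: bytes) -> None:
    """Write payload to target so readers never observe a partial file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        # The rename swaps the complete file into place in a single step.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this pattern, two processes writing the same target still race, but the loser's content is replaced as a whole rather than interleaved with the winner's.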
I am getting files from cellxgene's instance and processing them, then creating a file record and saving them locally on my instance. I had hoped to do it in parallel, giving a different chunk to each process, but it seems that dataset.file.all() returns a list with a different order in different processes...
If it were a corruption I would have expected only one file to be corrupted, but for now I have around 5 problematic files, meaning that after I process them, I do a file.save, and it raises this error...
But it is not for all my files. I have restarted a run and for example, files 1,2,3 gave an issue, the 4,5 worked, then 6 gave an issue. Now I am doing 7.
The files that gave an issue are in ~/.cache/lamin but, like all lamin-saved h5ads, I cannot open them with scanpy...
@falexwolf this looks like a problem with inter-instance transfer.
dataset.file.all()
This returns a QuerySet, which isn't ordered.
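Since the QuerySet has no guaranteed order, each process would need to impose the same order itself before picking its chunk. A minimal sketch in plain Python follows; `chunk_for_worker` is a hypothetical helper, and the keys are made-up strings. On the database side, Django's `QuerySet.order_by(...)` (for example on a unique field such as `uid`) should give the same effect, assuming the lamindb QuerySet exposes it.

```python
from typing import Iterable, List


def chunk_for_worker(keys: Iterable[str], worker_index: int, n_workers: int) -> List[str]:
    """Return the keys assigned to one worker, independent of input order."""
    if not 0 <= worker_index < n_workers:
        raise ValueError("worker_index must be in [0, n_workers)")
    # Sorting first makes the assignment identical in every process.
    ordered = sorted(keys)
    # Round-robin split: worker 0 takes items 0, n, 2n, ...; worker 1 takes 1, n+1, ...
    return ordered[worker_index::n_workers]


# Two processes may see the same keys in different orders.
seen_by_process_a = ["c.h5ad", "a.h5ad", "d.h5ad", "b.h5ad"]
seen_by_process_b = ["b.h5ad", "d.h5ad", "a.h5ad", "c.h5ad"]
print(chunk_for_worker(seen_by_process_a, 0, 2))
print(chunk_for_worker(seen_by_process_b, 1, 2))
```

Because both processes sort before slicing, their chunks are disjoint and together cover every key, regardless of the order in which each process received them.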
If it were a corruption I would have expected only one file to be corrupted but for now I have around 5 problematic files. meaning that after I process them, I do a file.save, and it gives off this error... But it is not for all my files. I have restarted a run and for example, files 1,2,3 gave an issue, the 4,5 worked, then 6 gave an issue. Now I am doing 7. The files that gave an issue are in ~/.cache/lamin but like all lamin saved h5ads I cannot open them with scanpy...
I agree with Sergei that this seems all due to inter-instance transfer.
The foreign key issue is likely due to a bug that we haven't somehow covered in tests.
Could you privately share the script that you're running to transfer the data? I'll debug it.
h5ads I cannot open them with scanpy...
I don't understand this one as we don't do anything to the h5ads, but I'm happy to debug.
I don't understand this one as we don't do anything to the h5ads, but I'm happy to debug.
Here is the issue when loading an h5ad with scanpy..
Let me know if you're free to get on a call - ping me on Slack! This looks like a file not found error. 😅
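Before calling sc.read_h5ad on a cached path, a quick sanity check can separate "file not found" from "file exists but is not a readable HDF5 file". The sketch below uses only the standard library; `diagnose_cached_file` is a hypothetical helper, and it assumes the HDF5 signature sits at byte offset 0, which holds for files without a user block. An "ok" result only means the signature is present, not that the file is complete.

```python
from pathlib import Path

# The 8-byte signature that starts an HDF5 file (when there is no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def diagnose_cached_file(path) -> str:
    """Classify a cached file as 'missing', 'empty', 'not-hdf5', or 'ok'."""
    p = Path(path).expanduser()
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    with p.open("rb") as handle:
        header = handle.read(len(HDF5_SIGNATURE))
    if header != HDF5_SIGNATURE:
        return "not-hdf5"
    return "ok"
```

A "missing" result would point to a wrong path rather than to a damaged file, while "empty" or "not-hdf5" would be consistent with an interrupted or corrupted write.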
Here is the code:
if do_cache:
    for i in ln.File.filter(description=MYDESC):
        all_ready_processed_keys.add(i.initial_version.key)
for i, file in enumerate(cx_dataset.files.all()):
    # use the counts matrix
    print(i)
    if file.key in all_ready_processed_keys:
        print(f"{file.key} is already processed")
        continue
    print(file)
    if file.backed().obs.is_primary_data.sum() == 0:
        print(f"{file.key} only contains non primary cells")
        continue
    adata = file.load(stream=True)
    print(adata)
    adata = some_preprocess(adata)
    myfile = ln.File(
        adata,
        is_new_version_of=file,
        description=MYDESC,
    )
    myfile.save()
    files.append(myfile)
dataset = ln.Dataset(files, name=NAME, description=DESC)
dataset.save()
I'm pretty sure we simply forgot to close this.
Getting this error when trying to save a file
IntegrityError: FOREIGN KEY constraint failed
It seems I initially had 2 processes saving files when I got this error.
Now I have stopped one of them but still get it.