Not directly related, but have you been getting errors when running `force-rebuild-data-loaded-images`? It could be that a stale version of the test databases is running properly with my local tests but not accurately reflecting the newest changes to the DB.
```
11.14 Fetched 101 MB in 10s (10.4 MB/s)
11.14 E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/m/mysql-8.0/libmysqlclient21_8.0.34-0ubuntu0.22.04.1_amd64.deb 404 Not Found [IP: 185.125.190.39 80]
11.14 E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/m/mysql-8.0/libmysqlclient-dev_8.0.34-0ubuntu0.22.04.1_amd64.deb 404 Not Found [IP: 185.125.190.39 80]
11.14 E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
------
development_prereqs.Dockerfile:28
--------------------
  26 | COPY pyproject.toml.unversioned .
  27 | RUN python -m pip install toml
  28 | >>> RUN apt install libgdal-dev -y
  29 | RUN python -c 'import toml; c = toml.load("pyproject.toml.unversioned"); print("\n".join(c["project"]["dependencies"]))' | python -m pip install -r /dev/stdin
  30 | RUN python -c 'import toml; c = toml.load("pyproject.toml.unversioned"); print("\n".join(c["project"]["optional-dependencies"]["all"]))' | python -m pip install -r /dev/stdin
--------------------
ERROR: failed to solve: process "/bin/sh -c apt install libgdal-dev -y" did not complete successfully: exit code: 100
```

```
0.917 After this operation, 13.2 MB of additional disk space will be used.
0.917 Get:1 http://deb.debian.org/debian-security bookworm-security/main amd64 libssl-dev amd64 3.0.11-1~deb12u2 [2,430 kB]
1.189 Err:2 http://apt.postgresql.org/pub/repos/apt bookworm-pgdg/main amd64 libpq-dev amd64 16.0-1.pgdg120+1
1.189   404 Not Found [IP: 217.196.149.55 80]
1.192 E: Failed to fetch http://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/libpq-dev_16.0-1.pgdg120%2b1_amd64.deb 404 Not Found [IP: 217.196.149.55 80]
1.192 E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
1.192 Fetched 2,430 kB in 0s (7,762 kB/s)
------
Dockerfile:9
--------------------
   7 | RUN apt install python3-venv -y
   8 | RUN apt install python3-pip -y
   9 | >>> RUN apt install -y libpq-dev
  10 | RUN apt install -y libgdal-dev
  11 | RUN python3 -m pip install --break-system-packages psycopg2==2.9.6
--------------------
ERROR: failed to solve: process "/bin/sh -c apt install -y libpq-dev" did not complete successfully: exit code: 100
```
Regarding `force-rebuild-data-loaded-images`:
This does not error for me, but I think I know what is going on.
When `apt install` fails with HTTP 404, it typically means that `apt update` has not been run in a while, so the locally cached package index is out of date (this is what the suggestion in the error message says). In the internal universe of this docker image, `apt update` might have been run a long time ago, since the `RUN apt update` layer is probably cached.
Yesterday Grigoriy and I added `libgdal-dev` as an explicit dependency in a few places, because some upstream Python package dependencies of squidpy failed to install automatically under certain environmental conditions, which we chalked up to ARM architecture. This is why a new layer is now detected in your docker build context, which triggered the appearance of this error in your development environment.
It is relatively easy to overcome this issue on a one-time basis by temporarily adding `--no-cache` to the `docker build` commands.
However, it would waste a lot of time if `--no-cache` were used for every build. The package servers are not updated all that frequently, so it is not normally necessary. Every couple of weeks I use `make clean-docker-images` to start fresh, and this may be the best practice for now. Note that issue #231 will hopefully relieve us permanently from the use of docker in the development context (docker for tests will be done remotely), making many issues like this one obsolete.
For the feature matrix extraction error, I think a little more information is needed.
The error says that at the time of creating the pandas DataFrame for the expression matrix, the format of the provided rows does not match the stipulated column format. Thus the row format, the column format, or both are incorrect. We should try to determine which of these three possibilities is occurring, by independently determining the correct expected format for the rows and for the columns.
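For example, one could tally the row lengths against the stipulated column count before handing anything to pandas. This is only a diagnostic sketch; `rows` and `columns` are hypothetical stand-ins for whatever the extractor actually passes to the DataFrame constructor:

```python
from collections import Counter

def diagnose_row_column_mismatch(rows: list[list[float]], columns: list[str]) -> None:
    """Report how many rows match the stipulated column count, and how many do not."""
    length_counts = Counter(len(row) for row in rows)
    print(f'Stipulated number of columns: {len(columns)}')
    for length, count in sorted(length_counts.items()):
        status = 'matches' if length == len(columns) else 'MISMATCH'
        print(f'{count} rows of length {length} ({status})')
```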
It is most likely a bug introduced by PR #230, which we should try to fix.
Will try reloading the docker images.
`scstudies` is a deprecated name; it was the former name of the monolithic database inside the database cluster which contained all of our studies. My question was really about which database "cluster" is searched (i.e. where the running instance of the PostgreSQL server is located).
(I now think that the answer is the RDS database, but you can confirm.)
It is the RDS database (unless there are multiple RDS databases, of course).
The debugging procedure I'm using is to install a local build of the spatialprofilingtoolbox wheel from main (i.e. what is produced in `dist/` after `make development-image`), then run an attempted-reproduction snippet:
```python
from spatialprofilingtoolbox.db.feature_matrix_extractor import FeatureMatrixExtractor
extractor = FeatureMatrixExtractor('.spt_db.config.aws')
x = extractor.extract(specimen='lesion 0_1')
```
I am able to reproduce this issue, so the strategy is working. I am now checking the rows and columns, etc.
The rows being provided to the DataFrame constructor have many different lengths, including what seems to be the correct length of 28 (26 channels plus 2 for pixel coordinates), which matches the column format.
Here are some bespoke error messages customized to this issue:
```
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 36:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [5290.0, 6.0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 44:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [5295.0, 62.0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 52:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [5360.0, 285.0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 36:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [5283.0, 3091.0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 36:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [4280.0, 2949.0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 40:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [3892.0, 2947.0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 36:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [4438.0, 2955.0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 36:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [4579.0, 2965.0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 34:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [4100.0, 2953.0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:198: Unexpected length 34:
11-09 15:49:52 [ DEBUG ] db.feature_matrix_extractor:199: [4136.0, 2952.0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```
I added an "error guard" against this inconsistency in branch `issue243`.
The binary-encoded expression vectors in these `int`s are sometimes erroneous, overflowing the expected number of channels (e.g. 26 in this case).
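For concreteness, the overflow condition looks roughly like this, assuming each expression vector is packed into a single integer bitmask. This is a sketch, not the actual SPT parsing code, and the real bit order may differ:

```python
def unpack_expression_vector(encoded: int, channel_count: int) -> list[int]:
    """Unpack a binary-encoded expression integer into one 0/1 value per channel.

    An erroneous value has set bits beyond `channel_count`, i.e. it "overflows"
    the expected number of channels.
    """
    if encoded >> channel_count:
        raise ValueError(f'{encoded} overflows the expected {channel_count} channels.')
    return [(encoded >> position) & 1 for position in range(channel_count)]
```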
These originate in cache files on a persistent volume, which I am having trouble deleting and refreshing. After manual deletion they seem to "rise from the dead" and come right back. So, most likely there are outdated cache files here from who knows when. I'm still working on this.
Wait, what I said is not right: `FeatureMatrixExtractor` retrieves directly from the database (I was confusing this with the ondemand service).
(To answer your question, it does not pertain to removal of indexing because that has not taken place yet.)
It seems that at some point I accidentally partially uploaded a dataset twice. This isn't supposed to be possible, so that's a bug. I'm seeing 52 (= 26 × 2) expression values in many cases that are supposed to have 26.
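If a partial double upload is the cause, it should be visible directly in the database as cells with twice the expected number of expression rows. Something like the following check could confirm it, with the caveat that the table and column names here are my assumptions rather than the actual SPT schema:

```python
import psycopg2

EXPECTED_CHANNELS = 26

# Table and column names below are hypothetical stand-ins for the actual schema.
DUPLICATE_CHECK = '''
    SELECT histological_structure, COUNT(*) AS entry_count
    FROM expression_quantification
    GROUP BY histological_structure
    HAVING COUNT(*) <> %s
'''

with psycopg2.connect(host='...', dbname='...', user='...') as connection:
    with connection.cursor() as cursor:
        cursor.execute(DUPLICATE_CHECK, (EXPECTED_CHANNELS,))
        for structure, entry_count in cursor.fetchall():
            print(f'Cell {structure}: {entry_count} expression rows, expected {EXPECTED_CHANNELS}')
```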
GitHub automatically closed this issue with #245, even though I explicitly stated that that PR does not close this issue. Weird.
After #245, running the cggnn workflow on RDS "Melanoma intralesional IL2" worked without error, but "Urothelial ICI" throws a similar error.
```
[ DEBUG ] workflow.common.sparse_matrix_puller:377: Received 34123866 sparse entries total from DB.
Traceback (most recent call last):
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/cggnn/scripts/run.py", line 184, in <module>
    df_cell, df_label, label_to_result = extract_cggnn_data(
                                         ^^^^^^^^^^^^^^^^^^^
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/cggnn/extract.py", line 130, in extract_cggnn_data
    df_cell = _create_cell_df({
                              ^
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/cggnn/extract.py", line 131, in <dictcomp>
    specimen: extractor.extract(specimen=specimen, retain_structure_id=True)[specimen].dataframe
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/db/feature_matrix_extractor.py", line 80, in extract
    extraction = self._extract(
                 ^^^^^^^^^^^^^^
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/db/feature_matrix_extractor.py", line 98, in _extract
    data_arrays = self._retrieve_expressions_from_database(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/db/feature_matrix_extractor.py", line 129, in _retrieve_expressions_from_database
    puller.pull(
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py", line 235, in pull
    self._retrieve_data_arrays(
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py", line 255, in _retrieve_data_arrays
    self._fill_data_arrays_for_study(
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py", line 286, in _fill_data_arrays_for_study
    parsed = parse(sparse_entries, _specimen, continuous_also=continuous_also)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py", line 462, in _parse_data_arrays_by_specimen
    self._check_targets(list(df_group['target']), target_index_lookup)
  File "/home/liaoc2/miniconda3/envs/spt_cggnn/lib/python3.11/site-packages/spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py", line 476, in _check_targets
    raise ValueError(f'Got {len(targets)} expression values for some cell, expected {len(target_index_lookup)} or fewer.')
ValueError: Got 42 expression values for some cell, expected 14 or fewer.
```
The relation to the SPT codebase is marginal; the dataset integrity is the issue. Yesterday I re-uploaded the melanoma dataset, being careful not to accidentally interrupt or restart any import operations. But I haven't done the others yet.
The long time these dataset management tasks take is why I am prioritizing issues like #222 and #226 so highly. I want to unblock the other work.
This issue was reproduced and then fixed by cleaning the datasets in the db. Tested provisionally in live db.
Thank you for resolving the Nextflow-script-to-Bash issue. Resolving it has re-revealed a problem I noticed earlier.
It's interesting because this error doesn't occur when running the cggnn workflow test, so my first thought is a difference between the test database and `scstudies`. Does this have to do with the removal of indexing you mentioned earlier this week, @jimmymathews?
Originally posted by @CarlinLiao in https://github.com/nadeemlab/SPT/issues/241#issuecomment-1802251462