Closed Arkkienkeli closed 3 months ago
Turns out we had reported this issue internally but it was unresolved (ref: https://github.com/jump-cellpainting/aws/issues/75#issuecomment-1531518014). Thankfully the analysis
files are available so we can regenerate it.
Here are the steps to follow
https://cytomining.github.io/profiling-handbook/05-create-profiles.html#create-database-backend
PROJECT_NAME=cpg0016-jump
mkdir -p ~/ebs_tmp/${PROJECT_NAME}/workspace/software
cd ~/ebs_tmp/${PROJECT_NAME}/workspace/software
if [ -d pycytominer ]; then rm -rf pycytominer; fi
git clone https://github.com/cytomining/pycytominer.git
cd pycytominer
python3 -m pip install -e ".[collate]"
and then
BATCH_ID="JUMPCPE-20210730-Run14_20210731_000211"
PLATE="ATSJUM206"
python3 pycytominer/cyto_utils/collate_cmd.py ${BATCH_ID} pycytominer/cyto_utils/database_config/ingest_config.ini ${PLATE} \
--tmp-dir ~/ebs_tmp \
--aws-remote=s3://cellpainting-gallery/cpg0016-jump/source_5/workspace
I'll chat with @ashah03 about this and he will loop back
Done
Downloading CSVs from s3://cellpainting-gallery/cpg0016-jump/source_5/workspace/analysis/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/analysis to ../../analysis/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/analysis
Ingesting ../../analysis/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/analysis
Indexing database /home/ec2-user/ebs_tmp/backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.sqlite
Uploading /home/ec2-user/ebs_tmp/backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.sqlite to s3://cellpainting-gallery/cpg0016-jump/source_5/workspace/backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.sqlite
Removing analysis files from ../../analysis/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/analysis and /home/ec2-user/ebs_tmp/backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206
Renaming /home/ec2-user/ebs_tmp/backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.sqlite to ../../backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.sqlite
Aggregating sqlite:///../../backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.sqlite
Uploading ../../backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.csv to s3://cellpainting-gallery/cpg0016-jump/source_5/workspace/backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/ATSJUM206.csv
Removing backend files from ../../backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206
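For anyone repeating this on another plate: the log above shows the inputs and outputs sitting at predictable locations under the `--aws-remote` prefix. A minimal sketch of that layout, inferred only from the paths in this log; `backend_paths` is a hypothetical helper and not part of pycytominer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the analysis input and the two backend outputs
# for one plate, following the layout seen in the log above.
backend_paths() {
  local remote="$1" batch="$2" plate="$3"
  printf '%s\n' \
    "${remote}/analysis/${batch}/${plate}/analysis" \
    "${remote}/backend/${batch}/${plate}/${plate}.sqlite" \
    "${remote}/backend/${batch}/${plate}/${plate}.csv"
}

# Values taken from the commands earlier in this thread.
backend_paths \
  "s3://cellpainting-gallery/cpg0016-jump/source_5/workspace" \
  "JUMPCPE-20210730-Run14_20210731_000211" \
  "ATSJUM206"
```

Other sources may not follow exactly the same layout, so treat the output as a starting point to check with `aws s3 ls`.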
@ashah03 -- ATSJUM206.sqlite is ready
aws s3 ls s3://cellpainting-gallery/cpg0016-jump/source_5/workspace/backend/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/
2022-10-21 00:42:32 0
2024-03-14 05:17:09 54929099 ATSJUM206.csv
2024-03-14 04:22:44 44989165568 ATSJUM206.sqlite
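The check above was done by eye. A rough sketch of the same check as a script, assuming the `date time size name` column layout shown in the listing; `check_backend_listing` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `aws s3 ls` output on stdin and succeed only if
# both <plate>.sqlite and <plate>.csv are listed with a non-zero size.
check_backend_listing() {
  local plate="$1"
  awk -v plate="$plate" '
    $NF == plate ".sqlite" && $3 > 0 { sqlite = 1 }
    $NF == plate ".csv"    && $3 > 0 { csv = 1 }
    END { exit !(sqlite && csv) }
  '
}

# Demo on the listing pasted above (no network access needed).
if check_backend_listing ATSJUM206 <<'EOF'
2022-10-21 00:42:32 0
2024-03-14 05:17:09 54929099 ATSJUM206.csv
2024-03-14 04:22:44 44989165568 ATSJUM206.sqlite
EOF
then
  echo "backend complete"
else
  echo "backend incomplete"
fi

# Against the real bucket this would be used as:
#   aws s3 ls "s3://.../backend/<batch>/<plate>/" | check_backend_listing <plate>
```

This only confirms that the files exist and are non-empty. It does not validate the contents of the SQLite file.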
Thanks a lot @ashah03!
@Arkkienkeli – all set here
aws s3 cp s3://staging-cellpainting-gallery/cpg0016-jump/source_5/workspace/load_data_csv/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/load_data_with_illum_and_cell_location.parquet s3://cellpainting-gallery/cpg0016-jump/source_5/workspace/load_data_csv/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/load_data_with_illum_and_cell_location.parquet
aws s3 ls s3://staging-cellpainting-gallery/cpg0016-jump/source_5/workspace/load_data_csv/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/load_data_with_illum_and_cell_location.parquet
2024-03-14 19:22:48 14826148 load_data_with_illum_and_cell_location.parquet
@shntnu can we move this file to s3://cellpainting-gallery (not staging) since everything else is there?
Already done in https://github.com/jump-cellpainting/datasets/issues/102#issuecomment-1999453068
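If more files need to come out of staging later, the destination can be derived from the source rather than typed by hand. A small sketch, assuming the only difference between the two URIs is the bucket name, as in the `aws s3 cp` command in this thread; `to_production_uri` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a staging-bucket URI to the same key in the
# production bucket. Fails for URIs that are not in the staging bucket.
to_production_uri() {
  local src="$1"
  local staging="s3://staging-cellpainting-gallery/"
  local production="s3://cellpainting-gallery/"
  case "$src" in
    "$staging"*) printf '%s%s\n' "$production" "${src#"$staging"}" ;;
    *) echo "not a staging URI: $src" >&2; return 1 ;;
  esac
}

src="s3://staging-cellpainting-gallery/cpg0016-jump/source_5/workspace/load_data_csv/JUMPCPE-20210730-Run14_20210731_000211/ATSJUM206/load_data_with_illum_and_cell_location.parquet"
dst="$(to_production_uri "$src")"

# Dry run: print the copy command instead of executing it.
echo aws s3 cp "$src" "$dst"
```

Printing the command first makes it easy to review the destination before anything is written to the production bucket.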
From @ashah03
We are unable to create the cell locations files for some plates because we run out of memory. We don't know whether this has to do with the SQLite file or the load_data CSV file.
SQLite file I generated, for which the cell locations file does NOT get generated:
s3://staging-cellpainting-gallery/cpg0016-jump/source_3/workspace/backend/CP_25_all_Phenix1/C13443aW/C13443aW.sqlite
Corresponding load data Parquet file:
s3://cellpainting-gallery/cpg0016-jump/source_3/workspace/load_data_csv/CP_25_all_Phenix1/C13443aW/load_data_with_illum.parquet
SQLite file for which the cell locations file does get generated:
s3://cellpainting-gallery/cpg0016-jump/source_3/workspace/backend/CP60/BR5872b3/BR5872b3.sqlite
Corresponding load data Parquet file:
s3://cellpainting-gallery/cpg0016-jump/source_3/workspace/load_data_csv/CP60/BR5872b3/load_data_with_illum.parquet
@ashah03 to wrap this up, can you please do the following
We can return to this later so that you can keep moving for now (i.e. you are off the hook :D)
@Arkkienkeli - we seem to have hit a wall but there has to be a fix. I'll ask Cimini lab if they have any ideas on what could be wrong with the SQLite (if indeed it is the SQLite)
@shntnu moving to https://github.com/jump-cellpainting/datasets-private/issues/71 since this issue (source 5) is resolved
The cell locations file that is supposed to exist by url
does not exist.