jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Spaces in the plates' folder names in two batches of source_3 #97

Closed Arkkienkeli closed 3 months ago

Arkkienkeli commented 5 months ago

Hello, there is an inconsistency in the plate folder names in source_3 of the JUMP dataset, batches CP59 and CP60: there is a space between "Measurement" and the digit. This is not the case for the other plates in source_3, nor for source_4, which uses a similar plate naming format.

Fixing this would make automated processing of the dataset easier.

Example (output truncated):

$ aws s3 ls s3://cellpainting-gallery/cpg0016-jump/source_3/images/CP60/images/ --no-sign-request
                           PRE BR5872a3__2022-04-30T12_46_56-Measurement 1/
                           PRE BR5872b3__2022-04-30T13_39_22-Measurement 1/
                           PRE BR5872c3__2022-04-30T11_01_30-Measurement 1/
                           PRE BR5872d3__2022-04-30T02_03_20-Measurement 1/
                           PRE BR5873a3__2022-04-30T04_02_26-Measurement 1/
                           PRE BR5873b3__2022-04-30T10_09_45-Measurement 1/
                           PRE BR5873c3__2022-04-29T15_35_43-Measurement 1/
                           PRE BR5873d3W__2022-05-17T04_43_01-Measurement 2/
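
To check whether other batches are affected, the listing above can be scanned automatically. The following is a minimal sketch, not part of the dataset tooling: the helper names `parse_prefixes` and `folders_with_spaces` are hypothetical, and it assumes the `PRE name/` line format that `aws s3 ls` prints for folders.

```python
import re


def parse_prefixes(ls_output: str) -> list[str]:
    """Extract folder names from `aws s3 ls` lines of the form 'PRE name/'."""
    prefixes = []
    for line in ls_output.splitlines():
        match = re.match(r"\s*PRE (.+)/\s*$", line)
        if match:
            prefixes.append(match.group(1))
    return prefixes


def folders_with_spaces(prefixes: list[str]) -> list[str]:
    """Return only the folder names that contain a space."""
    return [name for name in prefixes if " " in name]


# Two lines copied from the CP60 listing in this issue.
listing = """\
                           PRE BR5872a3__2022-04-30T12_46_56-Measurement 1/
                           PRE BR5873d3W__2022-05-17T04_43_01-Measurement 2/
"""

flagged = folders_with_spaces(parse_prefixes(listing))
for name in flagged:
    print(repr(name))
```

Feeding the full output of `aws s3 ls` for each batch into `parse_prefixes` would show which plates carry the space.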

shntnu commented 3 months ago

Ugh – that is painful indeed. Note that we would need to recreate the load_data files as well. Please report back if you find this issue in more plates and we can decide whether to fix these or just let them be.

$ aws s3 cp s3://cellpainting-gallery/cpg0016-jump/source_3/workspace/load_data_csv/CP60/BR5872a3/load_data.csv.gz - | gunzip | head -4 | csvcut -c PathName_OrigAGP
PathName_OrigAGP
s3://cellpainting-gallery/cpg0016-jump/source_3/images/CP60/images/BR5872a3__2022-04-30T12_46_56-Measurement 1/Images/
s3://cellpainting-gallery/cpg0016-jump/source_3/images/CP60/images/BR5872a3__2022-04-30T12_46_56-Measurement 1/Images/
s3://cellpainting-gallery/cpg0016-jump/source_3/images/CP60/images/BR5872a3__2022-04-30T12_46_56-Measurement 1/Images/
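
To illustrate why the load_data files are coupled to the folder names, here is a rough sketch of what regenerating the paths would involve if the folders were renamed. It is not the maintainers' procedure: the function `rewrite_load_data` is hypothetical, and it assumes that the fix would simply drop the space and that every affected column name starts with `PathName_`, as `PathName_OrigAGP` does above.

```python
import csv
import io
import re

# Matches the spaced folder suffix, e.g. "-Measurement 1/".
SPACED = re.compile(r"-Measurement (\d+)/")


def rewrite_load_data(csv_text: str) -> str:
    """Drop the space after 'Measurement' in every PathName_* column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column, value in row.items():
            if column.startswith("PathName_"):
                row[column] = SPACED.sub(r"-Measurement\1/", value)
        writer.writerow(row)
    return out.getvalue()


# One row taken from the load_data.csv.gz excerpt in this comment.
sample = (
    "PathName_OrigAGP\n"
    "s3://cellpainting-gallery/cpg0016-jump/source_3/images/CP60/images/"
    "BR5872a3__2022-04-30T12_46_56-Measurement 1/Images/\n"
)
fixed = rewrite_load_data(sample)
print(fixed)
```

Any such rewrite would have to be applied to every load_data file of the affected plates at the same time as the folders are renamed, otherwise the paths would stop resolving.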

shntnu commented 3 months ago

Please report back if you find this issue in more plates and we can decide whether to fix these or just let them be.

I am closing this out now. Please reopen if you see more of this.