Open hanslovsky opened 11 months ago
Note: I originally stated that I found 1216 wells with at least one missing image, but this is incorrect. I found 1216 fields/sites with at least one missing image.
Thank you @hanslovsky for the detailed report!
These images in `source_11` are indeed missing (internal notes: https://github.com/jump-cellpainting/aws/issues/81#issuecomment-1266405250). I will keep this issue open so that we can think of ways to inform the users of the dataset that these files are missing.
I am trying to download all images for source_11 that I can find in the respective `load_data_with_illum.parquet` files. For these parquet files, I found 1216 fields/sites with at least one missing image, for a total of 6068 missing images, which I attached as a CSV in source_11-404.txt (I had to change the extension from csv to txt to attach it to this comment). This is what the CSV looks like:
For example, `aws s3 ls` on the first file in the above snippet exits with code `1`, i.e. the key does not exist:
When I use the same key but change the channel from `ch2` to `ch1`, that file exists:
I will double-check that I inferred the correct file names from the `parquet` files. The existence of `ch1` in this example suggests that I inferred the correct names, at least for that field/site.
To find the number of missing fields/sites, I removed the channel sub-string:
Subtract 1 for the CSV header.
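For anyone who wants to reproduce the field/site count, the channel-stripping step can be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not the command used for the report: the example keys are made up, the `ch<digits>` channel pattern is inferred only from the `ch2`/`ch1` example, and `strip_channel` and `count_sites` are hypothetical helper names.

```python
import re

# Assumption: the channel appears in the file name as "ch" followed by digits
# (e.g. "ch1", "ch2"). Adjust the pattern if the real keys differ.
CHANNEL_RE = re.compile(r"ch\d+")


def strip_channel(key):
    """Remove the channel sub-string from the file name part of a key.

    Only the part after the last "/" is modified, so directory names that
    happen to contain "ch<digits>" (e.g. "batch1") are left untouched.
    """
    head, sep, name = key.rpartition("/")
    return head + sep + CHANNEL_RE.sub("", name)


def count_sites(keys):
    """Count distinct fields/sites among keys that differ only by channel."""
    return len({strip_channel(key) for key in keys})


# Hypothetical keys for illustration only; not taken from the dataset.
missing = [
    "images/plate_a/r01c01f01-ch2.tiff",
    "images/plate_a/r01c01f01-ch3.tiff",
    "images/plate_a/r01c01f02-ch2.tiff",
]
print(count_sites(missing))  # prints 2
```

If the keys are read from the attached CSV, the header row should be skipped before calling `count_sites`, which plays the same role as subtracting 1 above.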