jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Are pixel sizes the same across all sources? #95

Open shntnu opened 5 months ago

shntnu commented 5 months ago

CW asked:

I observed that the images from different sources have slightly different image sizes. For example, source 1: 1080×1080; source 2: 996×996; source 3: 1080×1080; source 6: 970×970; source 8: 1024×1024; source 10: 1000×1000. Do they have the same pixel size, or are they also scaled in proportion?

Arkkienkeli commented 2 months ago

Hi @shntnu, was this information published anywhere yet? Thank you!

Arkkienkeli commented 2 months ago

I looked into the metadata:

- Sources 1, 3, 4, 11: Index.idx.xml, tags ImageResolutionX, ImageResolutionY
- Source 9: indexfile.txt table file, columns ImageSizeX, ImageSizeY
- Sources 2, 5, 6, 10, 13: MeasurementDetail.mrf, tags HorizontalPixelDimension, VerticalPixelDimension
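As a rough illustration of pulling those values out of an XML metadata file, here is a minimal sketch using only the Python standard library. The sample XML, its namespace, the `Unit="m"` attribute, and the helper names `local_name` and `find_values` are all hypothetical; the real layout of Index.idx.xml and MeasurementDetail.mrf is not described in this thread, so the helper simply matches names wherever they occur, as element text or as attributes.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Strip an ElementTree namespace prefix of the form '{uri}name'."""
    return name.rsplit("}", 1)[-1]


def find_values(xml_text, wanted):
    """Return the first value found for each wanted name.

    A name may match an element (its text is used) or an attribute,
    because the exact file layout is an assumption here.
    """
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name in wanted and name not in found and elem.text and elem.text.strip():
            found[name] = elem.text.strip()
        for attr, value in elem.attrib.items():
            attr_name = local_name(attr)
            if attr_name in wanted and attr_name not in found:
                found[attr_name] = value
    return found


# Hypothetical, heavily simplified stand-in for a real metadata file.
SAMPLE = """<Root xmlns="urn:example">
  <Image>
    <ImageResolutionX Unit="m">5.97976E-07</ImageResolutionX>
    <ImageResolutionY Unit="m">5.97976E-07</ImageResolutionY>
  </Image>
</Root>"""

values = find_values(SAMPLE, {"ImageResolutionX", "ImageResolutionY"})

# Assumes the stored value is in metres; convert to micrometres.
pixel_size_um = float(values["ImageResolutionX"]) * 1e6
print(f"pixel size: {pixel_size_um:.6f} um")
```

For a real file, the text passed to `find_values` would come from reading the file from disk, and the unit should be checked rather than assumed.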

Unfortunately I did not find metadata files for source 7 and 8.

(source 7 and 8 updated by @shntnu)

| source | pixel size (μm) |
|---|---|
| source_1 | 0.597976 |
| source_2 | 0.65 |
| source_3 | 0.597976 |
| source_4 | 0.597976 |
| source_5 | 0.64974 |
| source_6 | 0.646187 |
| source_7 | 0.65 |
| source_8 | 0.7032 |
| source_9 | 0.597976 |
| source_10 | 0.6516944 |
| source_11 | 0.597976 |
| source_13 | 0.65 |
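
To make the table easier to use programmatically, here is a small sketch that stores it as a dictionary and derives two quantities: the largest-to-smallest pixel size ratio and the approximate physical field of view for the sources whose image dimensions were quoted in the original question. The variable names are illustrative only, and the image dimensions are the ones reported by CW above, not values read from the files. With these table values the ratio comes out at roughly 1.18.

```python
# Pixel sizes in micrometres, copied from the table above.
PIXEL_SIZE_UM = {
    "source_1": 0.597976,
    "source_2": 0.65,
    "source_3": 0.597976,
    "source_4": 0.597976,
    "source_5": 0.64974,
    "source_6": 0.646187,
    "source_7": 0.65,
    "source_8": 0.7032,
    "source_9": 0.597976,
    "source_10": 0.6516944,
    "source_11": 0.597976,
    "source_13": 0.65,
}

# Image edge lengths in pixels, as quoted in the original question
# (only available for some sources; images are assumed square).
IMAGE_EDGE_PX = {
    "source_1": 1080,
    "source_2": 996,
    "source_3": 1080,
    "source_6": 970,
    "source_8": 1024,
    "source_10": 1000,
}

largest = max(PIXEL_SIZE_UM, key=PIXEL_SIZE_UM.get)
smallest = min(PIXEL_SIZE_UM, key=PIXEL_SIZE_UM.get)
ratio = PIXEL_SIZE_UM[largest] / PIXEL_SIZE_UM[smallest]
print(f"largest/smallest pixel size: {largest}/{smallest} = {ratio:.3f}")

# Physical field of view along one edge = pixels * micrometres per pixel.
field_of_view_um = {
    source: edge_px * PIXEL_SIZE_UM[source]
    for source, edge_px in IMAGE_EDGE_PX.items()
}
for source, fov in field_of_view_um.items():
    print(f"{source}: {fov:.1f} um per edge")
```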

shntnu commented 2 months ago

Thanks for documenting that @Arkkienkeli! I have updated your table.

I'll leave this issue open so that one of us can comment on what the implications are (briefly: this is one of the reasons normalization is important).

```
aws s3 cp s3://cellpainting-gallery/cpg0016-jump/source_8/images/J1/images/A1170383/Images/HTS_A01_s1_w20B4AE6FD-1ADC-4BA5-AB69-A459DD5C1532.tif ~/Desktop/ \
  && tiffinfo ~/Desktop/HTS_A01_s1_w20B4AE6FD-1ADC-4BA5-AB69-A459DD5C1532.tif
download: s3://cellpainting-gallery/cpg0016-jump/source_8/images/J1/images/A1170383/Images/HTS_A01_s1_w20B4AE6FD-1ADC-4BA5-AB69-A459DD5C1532.tif to Desktop/HTS_A01_s1_w20B4AE6FD-1ADC-4BA5-AB69-A459DD5C1532.tif
=== TIFF directory 0 ===
TIFF Directory at offset 0x200008 (2097160)
  Subfile Type: multi-page document (2 = 0x2)
  Image Width: 1024 Image Length: 1024
  Bits/Sample: 16
  Compression Scheme: None
  Photometric Interpretation: min-is-black
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 4
  Planar Configuration: single image plane
  ImageDescription:
  Software: MetaSeries
  DateTime: 20210317 00:49:35.868
```

ChenyuWang-Monica commented 2 months ago

To double-check my understanding, this means that we need to resize the images from different sources according to the table?

bethac07 commented 2 months ago

@ChenyuWang-Monica nope, definitely not! Will expand below.

bethac07 commented 2 months ago

@shntnu and @Arkkienkeli, thanks for digging in on this! Can we add this to the metadata CSV?

For normalized measurements, this will have almost no impact, since most things are in pixel units and, fundamentally, once we're normalized, they're in "mean cell units". A ~16% change in the pixel size (largest/smallest ~= 1.16) may change some precision, but not much else.

For non-normalized measurements, realistically we shouldn't be comparing across sites anyway, but even if we are, for things like area it would just mean applying the pixel conversion factor before comparing. The only place where things get a little dicey is textures and granularity, but we're typically looking at scales around 3/5/10 pixels, so a 1.16 multiplier still only changes things by one or two pixels, give or take (3->3.48, 5->5.8, 10->11.6). Realistically, since those metrics are intensity dependent, I'd expect other experimental details to dominate over the pixel size difference.
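
The conversions described above can be sketched in a few lines. This is only an illustration under the assumption of square pixels; the function names are hypothetical, and the pixel sizes used in the example are the source_1 and source_8 values from the table earlier in the thread. Areas scale with the square of the pixel size, while length-like scales (such as a texture or granularity scale) scale linearly.

```python
def area_px_to_um2(area_px, pixel_size_um):
    """Convert an area in pixels to square micrometres (square pixels assumed)."""
    return area_px * pixel_size_um ** 2


def scale_in_reference_px(scale_px, pixel_size_um, reference_pixel_size_um):
    """Express a length of `scale_px` pixels in pixels of a reference source."""
    return scale_px * pixel_size_um / reference_pixel_size_um


# Pixel sizes taken from the table in this thread.
SOURCE_1_UM = 0.597976
SOURCE_8_UM = 0.7032

# The same physical object covers a different number of pixels in each
# source; converting to square micrometres makes the areas comparable.
physical_area_um2 = 300.0  # arbitrary example value
area_px_source_1 = physical_area_um2 / SOURCE_1_UM ** 2
area_px_source_8 = physical_area_um2 / SOURCE_8_UM ** 2
print(f"pixels in source_1: {area_px_source_1:.1f}")
print(f"pixels in source_8: {area_px_source_8:.1f}")
print(f"back to um^2: {area_px_to_um2(area_px_source_1, SOURCE_1_UM):.1f}, "
      f"{area_px_to_um2(area_px_source_8, SOURCE_8_UM):.1f}")

# The 3/5/10 pixel scales from the comment, with its 1.16 multiplier.
for scale_px in (3, 5, 10):
    print(scale_px, "->", round(scale_in_reference_px(scale_px, 1.16, 1.0), 2))
```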

ChenyuWang-Monica commented 2 months ago

@bethac07 Thanks for the explanation! I have two follow-up questions:

  1. For non-normalized measurements like area, is the adjustment already applied to the well-level CellProfiler features in workspace/profiles? If not, is there a file describing whether each feature is normalized or non-normalized?

  2. If I want to use the raw images instead of the features, how should I adjust for such resolution differences? Would resizing the images according to the resolution work?

bethac07 commented 2 months ago

All measurements start off non-normalized and then get aggregated, then normalized, then feature selected. It should be pretty clear from the file name which is which. All measurements are in pixel units; nothing we report has a pixel-to-micron conversion applied.

I suspect the interpolation involved in resizing images will be more destructive than the pixel size difference, so I would personally suggest not doing that. Again, it's a relatively small difference (16%). I wouldn't try to correct for it, but rather just note that it might be the source of small discrepancies when looking across sources.