hci-unihd / antibodies-analysis-issues

Issue tracker for problems in the antibodies analysis workflow.

QC: Cell size #82

Open tischi opened 4 years ago

tischi commented 4 years ago

Distributions of the median infected cell size per well, across plates

image

Cell size distributions for some plates

image

image

image

tischi commented 4 years ago

@metavibor @constantinpape OK, this looks very intriguing! So intriguing that I almost wonder whether there is something fishy going on somewhere... Why would we have so reproducibly almost the exact same bimodal distribution? But maybe this is just the outlier fraction due to dirt in the serum channel? In the end those are not many cells (beware the log scale of the y-axis).

metavibor commented 4 years ago

Do we have cell sizes of 100000 pixels? Could you look at those cells? I think the biggest I measured was an order of magnitude smaller. The same goes for the other end of the distribution.

tischi commented 4 years ago

Could you look at those cells?

Would it help you if I try to find out the image name? Then you could look with the plate viewer.

Here are random locations on one plate that should contain cells that are

larger than 100000

 [1] "plate2rep3_20200507_094942_519 C05-0002"
 [2] "plate2rep3_20200507_094942_519 H10-0004"
 [3] "plate2rep3_20200507_094942_519 D06-0005"
 [4] "plate2rep3_20200507_094942_519 H01-0006"
 [5] "plate2rep3_20200507_094942_519 H09-0004"
 [6] "plate2rep3_20200507_094942_519 B01-0000"
 [7] "plate2rep3_20200507_094942_519 H09-0008"
 [8] "plate2rep3_20200507_094942_519 B12-0000"
 [9] "plate2rep3_20200507_094942_519 H01-0004"
[10] "plate2rep3_20200507_094942_519 E09-0000"

smaller than 100

 [1] "plate2rep3_20200507_094942_519 G04-0000"
 [2] "plate2rep3_20200507_094942_519 G04-0003"
 [3] "plate2rep3_20200507_094942_519 C10-0000"
 [4] "plate2rep3_20200507_094942_519 F09-0005"
 [5] "plate2rep3_20200507_094942_519 H05-0007"
 [6] "plate2rep3_20200507_094942_519 B09-0006"
 [7] "plate2rep3_20200507_094942_519 G03-0005"
 [8] "plate2rep3_20200507_094942_519 G09-0000"
 [9] "plate2rep3_20200507_094942_519 D04-0005"
[10] "plate2rep3_20200507_094942_519 G09-0000"

constantinpape commented 4 years ago

@tischi thanks for checking this. As we have discussed, I will export 'too small' and 'too large' masks now for these plates so we can inspect that visually. No need for image names; once we have the masks in the PlateViewer, this should be fast to see.

constantinpape commented 4 years ago

@metavibor @constantinpape OK, this looks very intriguing! So intriguing that I almost wonder whether there is something fishy going on somewhere... Why would we have so reproducibly almost the exact same bimodal distribution? But maybe this is just the outlier fraction due to dirt in the serum channel? In the end those are not many cells (beware the log scale of the y-axis).

Ok, very interesting, we need to check up on this... I will let you know as soon as I have exported the masks.

One thing to keep in mind is that @imagirom's code to compute these sizes is a bit non-standard. I have checked for a few examples that it works, but maybe there are corner cases where it fails.

tischi commented 4 years ago

yes, but still then @metavibor would know in which wells to find some :-)

Regarding the initial thresholds for this, based on looking at the distributions I would say

100 and 25000

...would be sensible?

I will later try to fit something to the distributions (maybe 4 gaussians) to see what that gives...

constantinpape commented 4 years ago

100 and 25000

...would be sensible?

25000 is huge... We should really check what these cells look like.

tischi commented 4 years ago

Ok, then let's wait until @metavibor has looked at some? (before we re-run...)

constantinpape commented 4 years ago

I wrote cell_size_mask now to all plates in /g/kreshuk/data/covid/data-processed. This image has 2 colors, one for small (< 100 pix) and one for large (> 25000 pix) segments. We can check it out after the meeting.
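For readers following along, here is a minimal sketch of how such a two-valued size mask could be derived from a label image. It is not the code used in the workflow: the function name, the class values 1 and 2, and the choice to never flag label 0 are assumptions made for illustration.

```python
import numpy as np

def cell_size_mask(labels, min_size=100, max_size=25000):
    """Hypothetical helper: 0 = unflagged, 1 = too small, 2 = too large."""
    # Pixel count per label id; index i holds the size of label i.
    sizes = np.bincount(labels.ravel())
    # Per-label class, looked up per pixel further down.
    label_class = np.zeros(sizes.shape, dtype=np.uint8)
    label_class[sizes < min_size] = 1
    label_class[sizes > max_size] = 2
    # Assumption: label 0 is background and must never be flagged.
    label_class[0] = 0
    return label_class[labels]

# Toy label image with one tiny and one larger segment.
labels = np.zeros((20, 20), dtype=np.int64)
labels[0:2, 0:2] = 1      # 4 pixels
labels[5:15, 5:15] = 2    # 100 pixels
mask = cell_size_mask(labels, min_size=10, max_size=50)
```

With the toy thresholds above, segment 1 is marked as too small and segment 2 as too large, while the background stays unmarked.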

constantinpape commented 4 years ago

Fyi, I double checked the size calculation, and it's correct.

constantinpape commented 4 years ago

I have computed the size masks now. I will double check that it worked now.

constantinpape commented 4 years ago

I checked this now and this makes total sense: all the large cells are segmentation errors caused by some image artifact. This is usually a local very bright spot in one of the channels; unfortunately this currently ruins segmentation for the whole image, because the network image normalizations are not robust to this. Eventually, we can fix this by using a more robust normalization procedure. For now, let's take maybe 15000 as size threshold and just kick these out.

Here are some examples: [screenshots from 2020-05-15: 15-52-04, 15-53-29, 15-54-25]
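To illustrate why a single bright spot can ruin the normalization of a whole image, here is a small sketch comparing min-max scaling with a percentile-based alternative. Both functions are illustrative assumptions; the thread does not show the normalization the network actually uses, nor which robust procedure was eventually chosen.

```python
import numpy as np

def normalize_minmax(image):
    # Scales by the global min and max, so one outlier pixel sets the range.
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo + 1e-6)

def normalize_percentile(image, lower=1.0, upper=99.0):
    # Scales by percentiles and clips, so a few extreme pixels are ignored.
    image = image.astype(np.float32)
    lo, hi = np.percentile(image, [lower, upper])
    return np.clip((image - lo) / (hi - lo + 1e-6), 0.0, 1.0)

# Synthetic image: uniform signal plus a small, very bright artifact.
rng = np.random.default_rng(0)
image = rng.uniform(100, 200, size=(64, 64))
image[10:12, 10:12] = 60000

minmax_median = float(np.median(normalize_minmax(image)))
robust_median = float(np.median(normalize_percentile(image)))
```

In this synthetic case the min-max version squeezes almost all of the signal close to zero, whereas the percentile version keeps it spread over the unit range.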

metavibor commented 4 years ago

I'm looking at one of the sites (H10-004) proposed by @tischi where there is supposed to be a large cell, but I don't see any abnormally large cell.

constantinpape commented 4 years ago

I'm looking at one of the sites (H10-004) proposed by @tischi where there is supposed to be a large cell but I don't see anything abnormal

Yes that looks normal. Maybe the site is wrong. Anyway, I am sure that this is caused by the segmentation errors due to imaging artifacts.

metavibor commented 4 years ago

@constantinpape you said you included another "layer" that can be looked at via PlateViewer, what is it? Is it "cell_size_mask"? What is that supposed to show? Nothing happens when I enable that in this image.

constantinpape commented 4 years ago

is it "cell_size_mask"?

yes exactly that's it.

what is that supposed to show? nothing happens when I enable that in this image

It shows a mask for the cells that are larger than 25000 pixels or smaller than 100 pixels. If you don't see anything, then there is no cell outside this size range in the image. A good way to use this is to zoom out a lot and just look for wells where you can see the mask:

E.g on plate 311:

Zoomed out: Screenshot from 2020-05-15 16-39-03

Zoomed in on Well D08, which has artifacts in the serum channel that screw up the segmentation: Screenshot from 2020-05-15 16-40-11

metavibor commented 4 years ago

this is exactly what I did and found nothing on the plate 519 reported by @tischi ... I looked at 311 in D06 and realized these are all images that are flagged in quality control by Severina. The question is why are they showing up in the cell statistics, why any computation is done on these?

constantinpape commented 4 years ago

this is exactly what I did and found nothing on the plate 519 reported by @tischi ...

I see. Maybe there is indeed an issue with Tischi's histograms.

The question is why are they showing up in the cell statistics, why any computation is done on these?

We still compute all the statistics even for the images that were marked as outliers. Then, we don't take the outliers into account when computing the scores later.

(The reason for this is that we need to combine the manual and automatically detected outliers at some point; and we need the stats for the automatic checks, so it's easier to calculate all statistics first and then filter for outliers later.)
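A rough sketch of this "compute everything first, filter later" order is given below. All names, the per-image statistic, and the automatic criterion are hypothetical and only serve to show the order of operations, not the actual workflow code.

```python
def combine_outliers(stats, manual_outliers, max_median_size=15000):
    """Union of manually flagged images and automatically detected ones."""
    # Hypothetical automatic check; it needs the per-image statistics,
    # which is why they are computed for every image up front.
    automatic = {
        image
        for image, s in stats.items()
        if s["median_cell_size"] > max_median_size
    }
    return automatic | set(manual_outliers)

def filter_for_scoring(stats, outliers):
    """Drop outlier images only at the scoring stage."""
    return {image: s for image, s in stats.items() if image not in outliers}

# Statistics exist for all images, including ones flagged later.
stats = {
    "A01-0000": {"median_cell_size": 900},
    "D08-0001": {"median_cell_size": 40000},
    "B02-0003": {"median_cell_size": 1100},
}
manual = {"B02-0003"}
outliers = combine_outliers(stats, manual)
kept = filter_for_scoring(stats, outliers)
```

Here `D08-0001` is caught by the automatic check and `B02-0003` by the manual flag, so only `A01-0000` would contribute to the scores.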

tischi commented 4 years ago

I'm looking at one of the sites (H10-004) proposed by @tischi where there is supposed to be a large cell but I don't see anything abnormal

This is strange indeed. Not sure. I can check the R code again...

tischi commented 4 years ago

@metavibor

Could you look in those wells? [EDIT: don't do it, see below]

 [1] "plate2rep3_20200507_094942_519 E01-0000" "plate2rep3_20200507_094942_519 F12-0000"
 [3] "plate2rep3_20200507_094942_519 A06-0001" "plate2rep3_20200507_094942_519 D07-0006"
 [5] "plate2rep3_20200507_094942_519 C01-0008" "plate2rep3_20200507_094942_519 H05-0000"
 [7] "plate2rep3_20200507_094942_519 D05-0006" "plate2rep3_20200507_094942_519 A06-0000"
 [9] "plate2rep3_20200507_094942_519 A06-0001" "plate2rep3_20200507_094942_519 B01-0006"

tischi commented 4 years ago

@constantinpape @imagirom @metavibor ....you guys are storing the background as a cell with label_id = 0, right?! 🍭

tischi commented 4 years ago

That was it: image

constantinpape commented 4 years ago

@constantinpape @imagirom @metavibor ....you guys are storing the background as a cell with label_id = 0, right?! 🍭

yes indeed

tischi commented 4 years ago

yes indeed

Those were our mysterious large cells.
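For anyone computing size statistics from the label images, a minimal sketch of excluding the background before building a histogram could look like this. The function name is made up for illustration, and the original analysis was done in R, so this is not the code that produced the plots above.

```python
import numpy as np

def cell_sizes(labels, background_id=0):
    """Pixel count per segment, skipping the background label."""
    ids, counts = np.unique(labels, return_counts=True)
    # Without this filter, the background shows up as one huge "cell".
    keep = ids != background_id
    return dict(zip(ids[keep].tolist(), counts[keep].tolist()))

# Toy label image: background plus two segments.
labels = np.zeros((10, 10), dtype=np.int64)
labels[0:2, 0:3] = 1   # 6 pixels
labels[5:9, 5:9] = 2   # 16 pixels
sizes = cell_sizes(labels)
```

In the toy image the background covers 78 of the 100 pixels, so leaving it in would add a single very large entry to the size distribution.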

metavibor commented 4 years ago

ok cool :) shall we say size limit 100-15000

constantinpape commented 4 years ago

ok cool :) shall we say size limit 100-15000

Will do. I am also estimating the values for the nuclei from the data now and will post them here later as well.