tischi opened this issue 4 years ago
@metavibor @constantinpape OK, this looks very intriguing! So intriguing that I almost wonder whether there is something fishy going on somewhere... Why would we have so reproducibly almost the exact same bimodal distribution? But maybe this is just the outlier fraction due to dirt in the serum channel? In the end those are not many cells (beware the log scale of the y-axis).
Do we have cell sizes of 100000 pixels? Could you look at those cells? I think the biggest I measured was an order of magnitude smaller. The same goes for the other end.
Would it help you if I try to find out the image names? Then you could look at them in the PlateViewer.
Here are random locations on one plate that should contain cells that are
[1] "plate2rep3_20200507_094942_519 C05-0002"
[2] "plate2rep3_20200507_094942_519 H10-0004"
[3] "plate2rep3_20200507_094942_519 D06-0005"
[4] "plate2rep3_20200507_094942_519 H01-0006"
[5] "plate2rep3_20200507_094942_519 H09-0004"
[6] "plate2rep3_20200507_094942_519 B01-0000"
[7] "plate2rep3_20200507_094942_519 H09-0008"
[8] "plate2rep3_20200507_094942_519 B12-0000"
[9] "plate2rep3_20200507_094942_519 H01-0004"
[10] "plate2rep3_20200507_094942_519 E09-0000"
[1] "plate2rep3_20200507_094942_519 G04-0000"
[2] "plate2rep3_20200507_094942_519 G04-0003"
[3] "plate2rep3_20200507_094942_519 C10-0000"
[4] "plate2rep3_20200507_094942_519 F09-0005"
[5] "plate2rep3_20200507_094942_519 H05-0007"
[6] "plate2rep3_20200507_094942_519 B09-0006"
[7] "plate2rep3_20200507_094942_519 G03-0005"
[8] "plate2rep3_20200507_094942_519 G09-0000"
[9] "plate2rep3_20200507_094942_519 D04-0005"
[10] "plate2rep3_20200507_094942_519 G09-0000"
@tischi thanks for checking this. As we have discussed, I will now export "too small" and "too large" masks for these plates so we can inspect them visually. No need for image names: once we have the masks in the PlateViewer, this should be fast to see.
OK, very interesting, we need to look into this... I will let you know as soon as I have exported the masks.
One thing to keep in mind is that @imagirom's code to compute these sizes is a bit non-standard. I have checked for a few examples that it works, but maybe there are corner cases where it fails.
yes, but still then @metavibor would know in which wells to find some :-)
Regarding the initial thresholds, based on looking at the distributions I would say 100 and 25000 would be sensible?
I will later try to fit something to the distributions (maybe 4 Gaussians) to see what that gives...
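A minimal sketch of what such a fit could look like, assuming the cell sizes are available as a 1D NumPy array of pixel counts. The fit is done on log10 sizes because the histograms span several orders of magnitude. `fit_gmm_1d` is a hypothetical helper implementing plain expectation-maximization for a 1D Gaussian mixture; it is not code from this project, and 4 components is only the guess from the comment above.

```python
import numpy as np


def fit_gmm_1d(x, n_components=4, n_iter=200):
    """Fit a 1D Gaussian mixture to x with expectation-maximization (EM)."""
    x = np.asarray(x, dtype=float)
    # Deterministic initialisation: means at evenly spaced quantiles,
    # a shared variance and equal mixture weights.
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample,
        # computed in log space for numerical stability.
        diff = x[:, None] - means[None, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * variances) + diff ** 2 / variances)
        log_resp = np.log(weights) + log_pdf
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    order = np.argsort(means)
    return weights[order], means[order], np.sqrt(variances[order])
```

Calling `fit_gmm_1d(np.log10(sizes))` returns mixture weights, means and standard deviations in log10 units; `10 ** means` converts the component centres back to pixels, and thresholds could then be placed between components.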
25,000 is huge... We should really check what these cells look like.
OK, then let's wait until @metavibor has looked at some (before we re-run...)?
I have now written cell_size_mask to all plates in /g/kreshuk/data/covid/data-processed.
This image has 2 colors: one for small (< 100 pixels) and one for large (> 25000 pixels) segments.
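The two-color size mask can be sketched as follows, assuming a 2D integer label image in which 0 is background and every other value is one cell. `size_outlier_mask` is a hypothetical name; this illustrates the idea and is not the code that produced cell_size_mask. It marks too-small segments with 1 and too-large segments with 2, using the 100 and 25000 pixel thresholds from this thread as defaults.

```python
import numpy as np


def size_outlier_mask(labels, min_size=100, max_size=25000, background=0):
    """Return a uint8 mask: 1 = segment < min_size, 2 = segment > max_size.

    Assumes a non-negative integer label image where `background` is not a cell.
    """
    labels = np.asarray(labels)
    sizes = np.bincount(labels.ravel())  # pixel count per label id
    too_small = sizes < min_size
    too_large = sizes > max_size
    # Never flag the background, however large it is.
    too_small[background] = False
    too_large[background] = False
    mask = np.zeros(labels.shape, dtype=np.uint8)
    # Index the per-label flags with the label image to get per-pixel flags.
    mask[too_small[labels]] = 1
    mask[too_large[labels]] = 2
    return mask
```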
We can check it out after the meeting.
FYI, I double checked the size calculation, and it's correct.
I have computed the size masks now and will double check that it worked.
I checked this now and it makes total sense: all the large cells are segmentation errors caused by some image artifact. This is usually a very bright local spot in one of the channels; unfortunately, this currently ruins the segmentation for the whole image, because the network's image normalization is not robust to it. Eventually, we can fix this by using a more robust normalization procedure. For now, let's use maybe 15000 as the size threshold and just kick these out.
Here are some examples:
I'm looking at one of the sites (H10-0004) proposed by @tischi where there is supposed to be a large cell, but I don't see anything abnormal.
Yes, that looks normal. Maybe the site is wrong. Anyway, I am sure that this is caused by the segmentation errors due to imaging artifacts.
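The thread does not say which normalization the network uses, so the following only illustrates the failure mode described above. It compares min-max scaling with percentile-based scaling, one common robust alternative; the percentile defaults (1 and 99.8) are arbitrary choices for this sketch. A single saturated pixel compresses all real signal towards zero under min-max scaling, while the percentile version largely ignores it.

```python
import numpy as np


def minmax_normalize(img):
    """Scale to [0, 1] using the global min and max (sensitive to outliers)."""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min())


def percentile_normalize(img, low=1.0, high=99.8, eps=1e-6):
    """Scale using low/high percentiles; extreme pixels are clipped to [0, 1]."""
    img = img.astype(np.float32)
    lo, hi = np.percentile(img, [low, high])
    return np.clip((img - lo) / (hi - lo + eps), 0.0, 1.0)


# Toy image: background 100, one "cell" at 200, one bright artifact pixel.
img = np.full((100, 100), 100.0)
img[20:60, 20:60] = 200.0
img[90, 90] = 60000.0
print(minmax_normalize(img)[30, 30])      # ~0.0017: the cell is squashed towards 0
print(percentile_normalize(img)[30, 30])  # ~1.0: the cell keeps its contrast
```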
@constantinpape you said you included another "layer" that can be viewed in the PlateViewer. What is it? Is it "cell_size_mask"? What is it supposed to show? Nothing happens when I enable it in this image.
Yes, exactly, that's it.
It shows a mask for the cells that are larger than 25,000 pixels or smaller than 100 pixels. If you don't see anything, then there is no cell outside this size range in the image. A good way to use it is to zoom out a lot and just look for wells where you can see the mask:
E.g. on plate 311:
Zoomed out:
Zoomed in on well D08, which has artifacts in the serum channel that screw up the segmentation:
This is exactly what I did, and I found nothing on plate 519 reported by @tischi... I looked at plate 311, well D06, and realized these are all images that were flagged in quality control by Severina. The question is why they show up in the cell statistics at all: why is any computation done on these?
I see. Maybe there is indeed an issue with Tischi's histograms.
We still compute all the statistics even for the images that were marked as outliers. Then, we don't take the outliers into account when computing the scores later.
(The reason for this is that we need to combine the manual and automatically detected outliers at some point; and we need the stats for the automatic checks, so it's easier to calculate all statistics first and then filter for outliers later.)
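A minimal sketch of this "compute everything, filter later" ordering. All names (`ImageStats`, `automatic_outliers`, `images_for_scoring`) are hypothetical, and the automatic check used here (too few segmented cells) is only a placeholder assumption, not the project's actual quality-control rule. The image names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class ImageStats:
    name: str
    n_cells: int
    median_cell_size: float


def automatic_outliers(stats, min_cells=10):
    """Placeholder automatic QC: flag images with too few segmented cells."""
    return {s.name for s in stats if s.n_cells < min_cells}


def images_for_scoring(stats, manual_outliers, min_cells=10):
    """Drop manual and automatic outliers; statistics exist for all images."""
    outliers = set(manual_outliers) | automatic_outliers(stats, min_cells)
    return [s for s in stats if s.name not in outliers]


# Statistics are computed for every image first (toy values).
stats = [
    ImageStats("img_a", n_cells=150, median_cell_size=1800.0),
    ImageStats("img_b", n_cells=140, median_cell_size=2100.0),
    ImageStats("img_c", n_cells=3, median_cell_size=40000.0),
]
kept = images_for_scoring(stats, manual_outliers={"img_b"})
print([s.name for s in kept])  # ['img_a']
```

Keeping the filtering step separate means the automatic checks always have the statistics they need, and the manual list can change without recomputing anything.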
This is strange indeed. Not sure. I can check the R code again...
@metavibor Could you look at these wells? [EDIT: don't do it, see below]
[1] "plate2rep3_20200507_094942_519 E01-0000" "plate2rep3_20200507_094942_519 F12-0000"
[3] "plate2rep3_20200507_094942_519 A06-0001" "plate2rep3_20200507_094942_519 D07-0006"
[5] "plate2rep3_20200507_094942_519 C01-0008" "plate2rep3_20200507_094942_519 H05-0000"
[7] "plate2rep3_20200507_094942_519 D05-0006" "plate2rep3_20200507_094942_519 A06-0000"
[9] "plate2rep3_20200507_094942_519 A06-0001" "plate2rep3_20200507_094942_519 B01-0006"
@constantinpape @imagirom @metavibor ...you guys are storing the background as a cell with label_id = 0, right?! 🍭
That was it:
yes indeed
Those were our mysterious large cells.
OK, cool :) Shall we say a size limit of 100-15000?
Will do. I am also estimating the values for the nuclei from the data now and will post them here later as well.
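The background pitfall found above is easy to reproduce: counting pixels per label id with `np.unique(..., return_counts=True)` also counts label 0, so the background shows up as one enormous "cell". A minimal sketch, assuming a NumPy label image with background 0, that skips it and then applies the agreed 100-15000 pixel limit (`cell_sizes` and `filter_by_size` are hypothetical helpers, not project code):

```python
import numpy as np


def cell_sizes(labels, background=0):
    """Map each cell label id to its size in pixels, skipping the background."""
    ids, counts = np.unique(labels, return_counts=True)
    keep = ids != background
    return dict(zip(ids[keep].tolist(), counts[keep].tolist()))


def filter_by_size(sizes, min_size=100, max_size=15000):
    """Keep only cells whose size lies within [min_size, max_size]."""
    return {i: n for i, n in sizes.items() if min_size <= n <= max_size}


labels = np.zeros((200, 200), dtype=np.int64)
labels[:10, :10] = 1           # 100 px
labels[50:55, 50:60] = 2       # 50 px, too small
labels[100:200, 100:200] = 3   # 10000 px
sizes = cell_sizes(labels)     # background (29850 px) is not included
print(sizes)                   # {1: 100, 2: 50, 3: 10000}
print(filter_by_size(sizes))   # {1: 100, 3: 10000}
```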
Distributions of per-well median infected-cell sizes across plates
Cell size distributions for some plates