Closed shntnu closed 1 year ago
Total number of wells
well %>% filter(Metadata_Source == "source_9") %>% count() %>% knitr::kable()
n |
---|
165888 |
Total number of plates
plate %>% filter(Metadata_Source == "source_9") %>% count() %>% knitr::kable()
n |
---|
108 |
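As a quick sanity check, the two totals above are consistent with every plate contributing a full 1536-well layout. The 1536-well format is an assumption here (it is not stated in this thread), and `expected_wells` is a hypothetical helper; the sketch only checks the arithmetic.

```shell
#!/bin/sh
# Assumption: source_9 uses 1536-well plates (not stated in this thread).
WELLS_PER_PLATE=1536

# Print the number of wells expected for a given number of full plates.
expected_wells() {
  echo $(( $1 * WELLS_PER_PLATE ))
}

# Compare the result with the well count reported by the first query.
expected_wells 108
```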
Most frequent JCP IDs
well %>% filter(Metadata_Source == "source_9") %>% count(Metadata_JCP2022) %>% arrange(desc(n)) %>% head(n = 10) %>% knitr::kable()
Metadata_JCP2022 | n |
---|---|
JCP2022_033924 | 14976 |
JCP2022_UNKNOWN | 5694 |
JCP2022_025848 | 1656 |
JCP2022_046054 | 1656 |
JCP2022_050797 | 1656 |
JCP2022_012818 | 1620 |
JCP2022_037716 | 1620 |
JCP2022_064022 | 1620 |
JCP2022_085227 | 1620 |
JCP2022_033954 | 1584 |
Number of plates with JCP2022_UNKNOWN
well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% distinct(Metadata_Plate) %>% count() %>% knitr::kable()
n |
---|
59 |
Counts of JCP2022_UNKNOWN per plate (only plates with more than 10 are shown)
well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% filter(n > 10) %>% knitr::kable()
Metadata_Plate | n |
---|---|
GR00004418 | 321 |
GR00004419 | 321 |
GR00004420 | 321 |
GR00004421 | 321 |
GR00004405 | 320 |
GR00004406 | 320 |
GR00004407 | 320 |
GR00004408 | 320 |
GR00003288 | 271 |
GR00003289 | 271 |
GR00003290 | 271 |
GR00004389 | 271 |
GR00003341 | 240 |
GR00003342 | 240 |
GR00003343 | 240 |
GR00003344 | 240 |
GR00004382 | 240 |
GR00004383 | 240 |
GR00004384 | 240 |
GR00004385 | 240 |
GR00004377 | 12 |
GR00004378 | 12 |
GR00004379 | 12 |
GR00004380 | 12 |
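Only some of these plates (those with >= 240 unknown wells) had missing rows in their platemaps. The rest of the unknown wells are due to unmapped SMILES. The following lists every platemap whose line count is not 1537, as line count and plate: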
find platemaps -name "Platemap*_jcp.txt" -exec wc -l {} \;|grep -v 1537|tr -s " "|cut -d" " -f2,3|tr " " "/"|cut -d"/" -f1,5|tr "/" ","|sed s,Platemap_,,g|sed s,_jcp.txt,,g|sort -n
1217,GR00004405
1217,GR00004406
1217,GR00004407
1217,GR00004408
1217,GR00004418
1217,GR00004419
1217,GR00004420
1217,GR00004421
1266,GR00003288
1266,GR00003289
1266,GR00003290
1266,GR00004389
1297,GR00003341
1297,GR00003342
1297,GR00003343
1297,GR00003344
1297,GR00004382
1297,GR00004383
1297,GR00004384
1297,GR00004385
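The line counts above can be converted into the number of rows missing from each platemap. This is a minimal sketch, assuming a complete platemap has 1537 lines (presumably one header line plus 1536 wells, which is what the `grep -v 1537` filter suggests). `missing_rows` is a hypothetical helper, not part of any JUMP tooling; it reads the `lines,plate` pairs printed by the command above.

```shell
#!/bin/sh
# Assumption: a complete platemap file has 1537 lines (header + 1536 wells).
EXPECTED_LINES=1537

# Read "line_count,plate" pairs on stdin and print "plate,missing_rows".
missing_rows() {
  awk -F, -v expected="$EXPECTED_LINES" '{ print $2 "," (expected - $1) }'
}

# One example plate from each line-count group listed above.
printf '%s\n' "1217,GR00004405" "1266,GR00003288" "1297,GR00003341" | missing_rows
```

For these three plates the differences are 320, 271, and 240, which line up with the per-plate JCP2022_UNKNOWN counts in the earlier table. GR00004418 to GR00004421 show 321 there, one more than their missing-row count.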
Now validated using https://github.com/jump-cellpainting/data-validation/tree/e4212f0cfcf0359e41b113ff4e755b8e98e32755
Number of plates with JCP2022_UNKNOWN
well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% distinct(Metadata_Plate) %>% count() %>% knitr::kable()
n |
---|
43 |
Counts of JCP2022_UNKNOWN per plate (only plates with more than 10 are shown)
well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% filter(n > 10) %>% knitr::kable()
Metadata_Plate | n |
---|---|
GR00004377 | 12 |
GR00004378 | 12 |
GR00004379 | 12 |
GR00004380 | 12 |
Now all of the remaining JCP2022_UNKNOWN wells are due to unmapped SMILES.
The NaN wells are now marked as untreated (JCP2022_999999):
well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_999999") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% knitr::kable()
Metadata_Plate | n |
---|---|
GR00004405 | 320 |
GR00004406 | 320 |
GR00004407 | 320 |
GR00004408 | 320 |
GR00004418 | 320 |
GR00004419 | 320 |
GR00004420 | 320 |
GR00004421 | 320 |
GR00003288 | 271 |
GR00003289 | 271 |
GR00003290 | 271 |
GR00004389 | 271 |
GR00003341 | 240 |
GR00003342 | 240 |
GR00003343 | 240 |
GR00003344 | 240 |
GR00004382 | 240 |
GR00004383 | 240 |
GR00004384 | 240 |
GR00004385 | 240 |
@dlogan do you happen to know why JCP2022_033924 has so many replicates? Please see https://github.com/jump-cellpainting/datasets/issues/85#issue-2022252720. We didn't tag you there, to maintain the anonymity of sources.
source_9 validated using https://github.com/jump-cellpainting/data-validation/commit/c843364a5360dd8487ba4429bd1d6cbb16f20d12