jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Add source_9 #30

Closed by shntnu 1 year ago

shntnu commented 1 year ago

Total number of wells

well %>% filter(Metadata_Source == "source_9") %>% count() %>% knitr::kable()
n
165888

Total number of plates

plate %>% filter(Metadata_Source == "source_9") %>% count() %>% knitr::kable()
n
108
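
As a quick sanity check, the two counts above are consistent with every plate being a full 1536-well plate. The 1536 figure is an assumption here (the queries above do not state it), and `expected_wells` is a hypothetical helper; the snippet only checks the arithmetic.

```shell
#!/usr/bin/env bash
# Assumption: each source_9 plate has 1536 wells (not stated in the queries above).
wells_per_plate=1536
plates=108

# expected_wells PLATES WELLS_PER_PLATE -> well total if every plate is fully populated.
expected_wells() {
  echo $(( $1 * $2 ))
}

expected_wells "$plates" "$wells_per_plate"
```

Under that assumption the result is 165888, the same as the well count reported above.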

Most frequent JCP IDs

well %>% filter(Metadata_Source == "source_9") %>% count(Metadata_JCP2022) %>% arrange(desc(n)) %>% head(n = 10) %>% knitr::kable()
Metadata_JCP2022 n
JCP2022_033924 14976
JCP2022_UNKNOWN 5694
JCP2022_025848 1656
JCP2022_046054 1656
JCP2022_050797 1656
JCP2022_012818 1620
JCP2022_037716 1620
JCP2022_064022 1620
JCP2022_085227 1620
JCP2022_033954 1584
shntnu commented 1 year ago

Number of plates with JCP2022_UNKNOWN

 well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% count() %>% knitr::kable()
n
59

Counts of JCP2022_UNKNOWN per plate

well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% filter(n > 10) %>% knitr::kable()
Metadata_Plate n
GR00004418 321
GR00004419 321
GR00004420 321
GR00004421 321
GR00004405 320
GR00004406 320
GR00004407 320
GR00004408 320
GR00003288 271
GR00003289 271
GR00003290 271
GR00004389 271
GR00003341 240
GR00003342 240
GR00003343 240
GR00003344 240
GR00004382 240
GR00004383 240
GR00004384 240
GR00004385 240
GR00004377 12
GR00004378 12
GR00004379 12
GR00004380 12

Only some of these plates (those with >= 240 unknown wells) had missing rows in their platemaps. On the rest, the wells are unknown because of unmapped SMILES.

 find platemaps -name "Platemap*_jcp.txt" -exec wc -l {} \;|grep -v 1537|tr -s " "|cut -d" " -f2,3|tr " " "/"|cut -d"/" -f1,5|tr "/" ","|sed s,Platemap_,,g|sed s,_jcp.txt,,g|sort -n
1217,GR00004405
1217,GR00004406
1217,GR00004407
1217,GR00004408
1217,GR00004418
1217,GR00004419
1217,GR00004420
1217,GR00004421
1266,GR00003288
1266,GR00003289
1266,GR00003290
1266,GR00004389
1297,GR00003341
1297,GR00003342
1297,GR00003343
1297,GR00003344
1297,GR00004382
1297,GR00004383
1297,GR00004384
1297,GR00004385
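
The line counts above translate into missing wells as follows. This is a minimal sketch, assuming a complete platemap file has 1536 well rows plus one header line (inferred from the `grep -v 1537` filter, not stated explicitly); `missing_rows` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumption: a complete platemap file has 1537 lines (1 header + 1536 wells).
full_lines=1537

# missing_rows LINE_COUNT -> number of well rows absent from the platemap.
missing_rows() {
  echo $(( full_lines - $1 ))
}

# Line counts reported by the find/wc pipeline above.
for n in 1217 1266 1297; do
  printf '%s lines -> %s missing rows\n' "$n" "$(missing_rows "$n")"
done
```

Under that assumption the three line counts correspond to 320, 271, and 240 missing rows, which agree to within one well with the per-plate JCP2022_UNKNOWN counts of 240 and above.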
shntnu commented 1 year ago

Now validated using https://github.com/jump-cellpainting/data-validation/tree/e4212f0cfcf0359e41b113ff4e755b8e98e32755

Number of plates with JCP2022_UNKNOWN

 well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% count() %>% knitr::kable()
n
43

Counts of JCP2022_UNKNOWN per plate

well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_UNKNOWN") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% filter(n > 10) %>% knitr::kable()
Metadata_Plate n
GR00004377 12
GR00004378 12
GR00004379 12
GR00004380 12

Now all of the remaining JCP2022_UNKNOWN wells are unknown because of unmapped SMILES.

The NaN wells (rows missing from the platemaps) are now marked as untreated, using the ID JCP2022_999999.

well %>% filter(Metadata_Source == "source_9" & Metadata_JCP2022 == "JCP2022_999999") %>% count(Metadata_Plate) %>% arrange(desc(n)) %>% knitr::kable()
Metadata_Plate n
GR00004405 320
GR00004406 320
GR00004407 320
GR00004408 320
GR00004418 320
GR00004419 320
GR00004420 320
GR00004421 320
GR00003288 271
GR00003289 271
GR00003290 271
GR00004389 271
GR00003341 240
GR00003342 240
GR00003343 240
GR00003344 240
GR00004382 240
GR00004383 240
GR00004384 240
GR00004385 240
shntnu commented 9 months ago

@dlogan do you happen to know why JCP2022_033924 has so many replicates? Please see https://github.com/jump-cellpainting/datasets/issues/85#issue-2022252720. We didn't tag you there, to maintain the anonymity of sources.