Open ChenyuWang-Monica opened 7 months ago
Hi @ChenyuWang-Monica, my answers are below.
The top ten compounds have >6000 replicates. Among them are DMSO, the empty well (JCP2022_999999), and 8 positive controls. However, when I compare the InChIKey of the 8 positive controls with those given in https://github.com/jump-cellpainting/JUMP-Target/tree/master#positive-control-compounds, one of them disagrees: JCP2022_025848 (GJFCONYVAUNLKB-UHFFFAOYSA-N) has 8127 replicates but is not listed as a positive control; dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N) listed as a positive control doesn't appear in the metadata compound.csv.gz.
We have been having some issues with matching InChIKeys between what we previously released in the JUMP-Target repo and what we released in this repo. But I can confirm that JCP2022_025848 is dexamethasone. The mapping between JCP2022 IDs and compound names is below.
Metadata_JCP2022 | Metadata_InChIKey | poscon_pert_iname | JUMP_Target_InChIKey |
---|---|---|---|
JCP2022_085227 | SRVFFFJZQVENJC-UHFFFAOYSA-N | aloxistatin | SRVFFFJZQVENJC-IHRRRGAJSA-N |
JCP2022_037716 | IVUGFMLRJOCGAS-UHFFFAOYSA-N | AMG900 | IVUGFMLRJOCGAS-UHFFFAOYSA-N |
JCP2022_025848 | GJFCONYVAUNLKB-UHFFFAOYSA-N | dexamethasone | UREBDLICKHMUKA-CXSFZGCWSA-N |
JCP2022_046054 | KPBNHDGDUADAGP-UHFFFAOYSA-N | FK-866 | KPBNHDGDUADAGP-VAWYXSNFSA-N |
JCP2022_035095 | IHLVSLOZUHKNMQ-UHFFFAOYSA-N | LY2109761 | IHLVSLOZUHKNMQ-UHFFFAOYSA-N |
JCP2022_064022 | OINGHOPGNMYCAB-UHFFFAOYSA-N | NVS-PAK1-1 | OINGHOPGNMYCAB-INIZCTEOSA-N |
JCP2022_050797 | LOUPRKONTZGTKE-UHFFFAOYSA-N | quinidine | LOUPRKONTZGTKE-LHHVKLHASA-N |
JCP2022_012818 | CQKBSRPVZZLCJE-UHFFFAOYSA-N | TC-S-7004 | CQKBSRPVZZLCJE-UHFFFAOYSA-N |
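
To make the pattern in this table easier to see, here is a minimal sketch that compares the two InChIKey columns row by row. The rows are copied from the table above; the helper names `first_block` and `classify` are made up for illustration and are not part of any JUMP tooling. It relies on the general fact that the first 14 characters of an InChIKey hash the molecular connectivity, while the second block covers the remaining layers such as stereochemistry.

```python
# Rows copied from the table above:
# (Metadata_JCP2022, Metadata_InChIKey, poscon_pert_iname, JUMP_Target_InChIKey)
POSCONS = [
    ("JCP2022_085227", "SRVFFFJZQVENJC-UHFFFAOYSA-N", "aloxistatin", "SRVFFFJZQVENJC-IHRRRGAJSA-N"),
    ("JCP2022_037716", "IVUGFMLRJOCGAS-UHFFFAOYSA-N", "AMG900", "IVUGFMLRJOCGAS-UHFFFAOYSA-N"),
    ("JCP2022_025848", "GJFCONYVAUNLKB-UHFFFAOYSA-N", "dexamethasone", "UREBDLICKHMUKA-CXSFZGCWSA-N"),
    ("JCP2022_046054", "KPBNHDGDUADAGP-UHFFFAOYSA-N", "FK-866", "KPBNHDGDUADAGP-VAWYXSNFSA-N"),
    ("JCP2022_035095", "IHLVSLOZUHKNMQ-UHFFFAOYSA-N", "LY2109761", "IHLVSLOZUHKNMQ-UHFFFAOYSA-N"),
    ("JCP2022_064022", "OINGHOPGNMYCAB-UHFFFAOYSA-N", "NVS-PAK1-1", "OINGHOPGNMYCAB-INIZCTEOSA-N"),
    ("JCP2022_050797", "LOUPRKONTZGTKE-UHFFFAOYSA-N", "quinidine", "LOUPRKONTZGTKE-LHHVKLHASA-N"),
    ("JCP2022_012818", "CQKBSRPVZZLCJE-UHFFFAOYSA-N", "TC-S-7004", "CQKBSRPVZZLCJE-UHFFFAOYSA-N"),
]


def first_block(inchikey: str) -> str:
    """Return the first 14-character block of an InChIKey (connectivity hash)."""
    return inchikey.split("-")[0]


def classify(repo_key: str, target_key: str) -> str:
    """Hypothetical helper: describe how two InChIKeys for one compound relate."""
    if repo_key == target_key:
        return "identical"
    if first_block(repo_key) == first_block(target_key):
        return "same first block"
    return "different first block"


# Map each positive control name to the comparison outcome.
results = {name: classify(repo, target) for _, repo, name, target in POSCONS}

for name, status in results.items():
    print(f"{name:14s} {status}")
```

On these eight rows, three keys are identical, four share only the first block, and dexamethasone differs in the first block as well. So matching on the first block alone would recover most of the positive controls but not dexamethasone, which needs the explicit JCP2022_025848 mapping confirmed above.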
The 11th-ranked compound JCP2022_033954 has 1594 replicates. Is it also a positive control, or does it serve another purpose?
Thanks for bringing this to our attention. I believe this is a metadata issue. Most of these wells come from a single source (source_9) and all the wells are in columns 1, 24, 25 or 48. @shntnu you had noticed the number of replicates in https://github.com/jump-cellpainting/datasets/pull/30#issuecomment-1376614661, but I don't know whether we flagged this as a metadata error or not.
There are many compounds with multiple replicates (for example, more than 10 but fewer than 60). Why do they have many more replicates than the common case mentioned in the paper (i.e. about 5)?
In general, most compounds should have five replicates, but there are some exceptions and I have listed some of them below.
Thanks for bringing this to our attention. I believe this is a metadata issue. Most of these wells come from a single source (source_9) and all the wells are in columns 1, 24, 25 or 48. @shntnu you had noticed the number of replicates in #30 (comment), but I don't know whether we flagged this as a metadata error or not.
Indeed, I'm not sure why this was the case. I'll follow up in that internal issue and loop back here.
When I'm counting the replicates of each compound in the COMPOUND plates, I have a few questions:
The top ten compounds have >6000 replicates. Among them are DMSO, the empty well (JCP2022_999999), and 8 positive controls. However, when I compare the InChIKey of the 8 positive controls with those given in https://github.com/jump-cellpainting/JUMP-Target/tree/master#positive-control-compounds, one of them disagrees: JCP2022_025848 (GJFCONYVAUNLKB-UHFFFAOYSA-N) has 8127 replicates but is not listed as a positive control; dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N) listed as a positive control doesn't appear in the metadata compound.csv.gz.
The 11th-ranked compound JCP2022_033954 has 1594 replicates. Is it also a positive control, or does it serve another purpose?
There are many compounds with multiple replicates (for example, more than 10 but fewer than 60). Why do they have many more replicates than the common case mentioned in the paper (i.e. about 5)?
Thanks!
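
For anyone who wants to reproduce this kind of tally, here is a minimal sketch of counting replicates per compound and flagging those above the usual number. The well list is hypothetical toy data, not real JUMP metadata, and the IDs `cpd_A`, `cpd_B`, `cpd_C` and `control_X` are invented. In practice the list would presumably come from the per-well `Metadata_JCP2022` values in the repo's metadata files.

```python
from collections import Counter

# Hypothetical toy data: one compound ID per well (NOT real JUMP metadata).
wells = ["cpd_A"] * 5 + ["cpd_B"] * 5 + ["cpd_C"] * 12 + ["control_X"] * 40

# "About 5" replicates is the common case quoted from the paper in this thread.
EXPECTED_REPLICATES = 5

# Count how many wells each compound occupies.
counts = Counter(wells)

# Rank compounds from most to fewest replicates.
ranked = counts.most_common()

# Keep only compounds with more replicates than the common case.
unusual = {cpd: n for cpd, n in counts.items() if n > EXPECTED_REPLICATES}

print(ranked)
print(unusual)
```

Controls would be expected to dominate the top of `ranked`, so it can help to separate them out first and look at the remaining entries in `unusual`.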