jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Positive controls identifiers #77

Closed FrenkT closed 7 months ago

FrenkT commented 11 months ago

Hi all,

I spent some time locating the positive controls documented here in cpg0016, and I think I found a few issues that might be worth flagging.

First of all, out of the 8 InChIKeys documented in Target-2 as positive controls, only 3 seem to be present in compound.csv.gz: AMG900, LY2109761, TC-S-7004. The main reason for the mismatch seems to be that the compound InChIs and InChIKeys provided in compound.csv.gz do not contain stereochemical information. Matching on only the first block of the InChIKey resolves 4 more compounds: NVS-PAK1-1, FK-866, quinidine, aloxistatin.
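The first-block matching described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis in this thread: the function names are hypothetical, and it relies only on the fact that a standard InChIKey has the form `XXXXXXXXXXXXXX-YYYYYYYYYY-Z`, where the first 14-character block encodes the connectivity. The `AAAAAAAAAAAAAA` keys in the usage lines are placeholders, not real compounds.

```python
def connectivity_block(inchikey: str) -> str:
    """Return the first (14-character) block of an InChIKey."""
    return inchikey.strip().upper().split("-")[0]


def match_by_connectivity(query_keys, dataset_keys):
    """Map each query InChIKey to the dataset keys sharing its first block.

    Hypothetical helper: an empty list means no dataset key shares the
    query's connectivity block.
    """
    index = {}
    for key in dataset_keys:
        index.setdefault(connectivity_block(key), []).append(key)
    return {q: index.get(connectivity_block(q), []) for q in query_keys}


# Placeholder keys for illustration only (not real compounds),
# plus the two dexamethasone-related keys quoted in this thread.
dataset = ["AAAAAAAAAAAAAA-UHFFFAOYSA-N", "GJFCONYVAUNLKB-UHFFFAOYSA-N"]
queries = ["AAAAAAAAAAAAAA-BBBBBBBBBB-N", "UREBDLICKHMUKA-CXSFZGCWSA-N"]
matches = match_by_connectivity(queries, dataset)
```

Here the placeholder query matches on its first block even though the stereo block differs, while the documented dexamethasone key finds no match because its first block differs from the key found in the plate metadata.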

However, dexamethasone can't be resolved with simple matching. So I looked into the metadata for TARGET-2 plates at wells H24 and K02 (where dexamethasone is supposed to be, as per Target-2 platemap and metadata), and found the following InChIKey in most plates: GJFCONYVAUNLKB-UHFFFAOYSA-N. Only some plates from sources 7 and 9 seem to contain other compounds.

So, here are a few questions:

  1. Would it be worth harmonizing the Target-2 documentation to clarify the InChIKeys that are actually used in the main JUMP dataset?
  2. Could someone clarify the mismatch on dexamethasone? GJFCONYVAUNLKB-UHFFFAOYSA-N points to a compound (pubchem link) that is not really dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N, Tanimoto coefficient ~=0.9), so I am wondering if this is an error in the metadata or if this was intentionally a different compound.
  3. It seems like some TARGET-2 plates from sources 7 and 9 don't follow the expected Target-2 layout. Is this a known issue?
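For reference, the Tanimoto coefficient mentioned in question 2 is the size of the intersection of two fingerprints divided by the size of their union. The sketch below computes it on plain Python sets of "on" bit indices. The fingerprint type behind the ~0.9 figure is not stated in this thread, and the bit sets here are made up purely to show the arithmetic; real fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as having similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


# Made-up bit sets: 9 shared bits, 10 bits in the union.
fp_a = set(range(1, 10))
fp_b = set(range(1, 11))
similarity = tanimoto(fp_a, fp_b)  # 9 / 10
```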

Many thanks.

niranjchandrasekaran commented 10 months ago

Hi @FrenkT, thank you for bringing these to our attention.

Would it be worth harmonizing the Target-2 documentation to clarify the InChIKeys that are actually used in the main JUMP dataset?

This is a good suggestion. We will work on making it easier to map Target-2 compounds in the JUMP-Target repo to the ones in the JUMP dataset.

Could someone clarify the mismatch on dexamethasone? GJFCONYVAUNLKB-UHFFFAOYSA-N points to a compound (pubchem link) that is not really dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N, Tanimoto coefficient ~=0.9), so I am wondering if this is an error in the metadata or if this was intentionally a different compound.

I don't have an explanation for this mismatch. I can confirm that the compound should be dexamethasone. We will look into the source of this discrepancy.

It seems like some TARGET-2 plates from sources 7 and 9 don't follow the expected Target-2 layout. Is this a known issue?

Can you let us know how the layouts differ? Is the entire layout different, or are only some compounds in the wrong wells?

Thanks, Niranj

FrenkT commented 10 months ago

Hi @niranjchandrasekaran,

Thanks for your replies.

Regarding your questions, I looked a bit deeper and discovered that:

  1. Regarding source 9, I think the problem was on my side: I just realised that the plates have 1536 wells, so the well position IDs differ from the position IDs provided with the Target-2 plate map (which is what I was using to query the plates). There are also 4 replicates of each compound, so it seems to follow the suggestion in the Target-2 documentation: "For a 1536-well plate, the layout is similar, but in four quadrants".
  2. Regarding source 7, the differences seem to be limited to plate CP1-SC2-25, which is a 384-well plate. I looked at the plate more closely and I think the layout in this case is mirrored (e.g. A01 is actually in P23, H12 in I13, etc.), which is why I couldn't get the expected compounds using the original plate map. I am not sure whether this should be considered a problem.
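One way to query such a plate with the original plate map is to transform the well IDs first. The sketch below assumes the transform is a 180° rotation of a 384-well plate (rows A to P, columns 01 to 24); that is an assumption, not something confirmed in this thread. It reproduces the H12 to I13 example, but it sends A01 to P24 rather than P23, so the real transform for CP1-SC2-25 should be checked against the plate metadata. The function name is hypothetical.

```python
import string

ROWS_384 = string.ascii_uppercase[:16]  # rows A..P
N_COLS_384 = 24                         # columns 1..24


def rotate_180(well: str) -> str:
    """Return the well ID reached by rotating a 384-well plate by 180 degrees.

    Assumes IDs of the form <row letter><column number>, e.g. "H12".
    """
    row, col = well[0].upper(), int(well[1:])
    if row not in ROWS_384 or not 1 <= col <= N_COLS_384:
        raise ValueError(f"not a 384-well position: {well!r}")
    new_row = ROWS_384[len(ROWS_384) - 1 - ROWS_384.index(row)]
    new_col = N_COLS_384 + 1 - col
    return f"{new_row}{new_col:02d}"


rotated_h12 = rotate_180("H12")
rotated_a01 = rotate_180("A01")
```

Applying the function twice returns the original well, so the same helper converts in both directions.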

So overall I think that sources 7 and 9 are fine (sorry for the false alarm 🙂 ), it's just that some plates can't be queried directly using the Target-2 plate map.

niranjchandrasekaran commented 10 months ago

Thank you for looking into this. I am relieved to learn that the source 9 error can be explained by the 1536 well plate.

Regarding source 7, the plate rotation is unexpected, and it should be flagged as an issue. @johnarevalo do you find CP1-SC2-25 to be an outlier in your experiments? If so, we should define this rotated Target-2 as a new plate layout (Target-3 perhaps). If not, then we should fix the metadata in this repo.

shntnu commented 7 months ago

I'll close this out in favor of #80

2. For what concerns source 7, it seems like differences are limited to plate CP1-SC2-25, which is a 384 well plate. I looked a bit better into the plate and I think that the layout in this case is mirrored (e.g. A01 is actually in P23, H12 in I13, etc.), so that's why I couldn't get the expected compounds using the original plate map. I am not sure if this should be considered a problem or not.

I will create a new issue for this (thanks for all the sleuthing @FrenkT!)