jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

What are the actual controls for the CRISPR plates? #96

Open timtreis opened 5 months ago

timtreis commented 5 months ago

Hey everyone, we're trying to compare data across "perturbation-universes". For this, we're having a deeper look into the controls in the CRISPR plates.

Their layout is, for example, like this: image

Chandrasekaran et al. state that:

image

By deduction, that'd mean that no-guide is DMSO. However, DMSO would be a bad control for CRISPR, since the guide RNAs, unlike drugs, are not dissolved in it but rather in some transfection agent. Were these acquired to stay comparable to the other drug plates?

@yugeji

yugeji commented 5 months ago

Also from the paper:

image

gewirtz commented 3 months ago

So, to be very explicit: are the no-guide and non-targeting wells (EXCLUDING DMSO) both used as the negative controls for the CRISPR plates?

niranjchandrasekaran commented 3 months ago

Hi all, I apologize for the delayed response. The paragraph shared by @yugeji is correct. I should update the text shared by @timtreis (thank you for catching that!).

We use both no-guide and non-targeting guide wells as negative controls in our analyses (we don't use DMSO).

Please let me know if you have other questions.
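For readers who want to apply the rule above, here is a minimal sketch of selecting the negative-control wells. It assumes the controls are labeled "no-guide" and "non-targeting" in a Metadata_Symbol column, as the output later in this thread suggests; the toy rows and the helper name select_negcons are made up for illustration and are not part of the dataset or any library.

```python
import pandas as pd

# Toy stand-in for a CRISPR plate after joining well and CRISPR metadata.
# The column name and control labels follow this thread; the rows are invented.
wells = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "A03", "A04", "A05"],
        "Metadata_Symbol": ["no-guide", "non-targeting", "PLK1", None, "no-guide"],
    }
)

# DMSO is deliberately not listed: it is not used as a negative control here.
NEGCON_LABELS = ["no-guide", "non-targeting"]


def select_negcons(df: pd.DataFrame) -> pd.DataFrame:
    """Return only wells whose Metadata_Symbol marks a negative control."""
    return df[df["Metadata_Symbol"].isin(NEGCON_LABELS)]


negcons = select_negcons(wells)
print(negcons["Metadata_Well"].tolist())
```

Wells with a missing symbol (for example compound wells, which have no CRISPR annotation) are left out by the isin filter, so DMSO wells never end up in the negative-control set.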

yugeji commented 3 months ago

Okay, so to summarize, each CRISPR plate should contain:

Given this, would it be possible to share an updated layout, @niranjchandrasekaran? We brought this up because we absolutely could not find anything labeled "DMSO" on the CRISPR plates.

Thanks very much in advance!

niranjchandrasekaran commented 2 months ago

Hi @yugeji, if you use both compound.csv.gz and crispr.csv.gz, along with well.csv.gz (all in the metadata folder), you will find the poscon compounds and DMSO wells. For example, when I run the following

import numpy as np
import pandas as pd

well_df = pd.read_csv('metadata/well.csv.gz')
compound_df = pd.read_csv('metadata/compound.csv.gz')
crispr_df = pd.read_csv('metadata/crispr.csv.gz')

merged_df = (
    well_df.query("Metadata_Plate == 'CP-CC9-R1-03'")
    .merge(crispr_df, on="Metadata_JCP2022", how="left")
    .query(
        "Metadata_Symbol=='PLK1' or Metadata_NCBI_Gene_ID.isna()"
    )
    .merge(compound_df, on="Metadata_JCP2022", how="left")
    .assign(
        perturbation_id=lambda x: np.where(
            x.Metadata_Symbol.notna(), x.Metadata_Symbol, x.Metadata_InChIKey
        )
    )
)

merged_df.perturbation_id.value_counts()

I get

no-guide                       11
non-targeting                  10
IAZDPXIOMUYVGZ-UHFFFAOYSA-N     8
SRVFFFJZQVENJC-UHFFFAOYSA-N     4
PLK1                            4
IHLVSLOZUHKNMQ-UHFFFAOYSA-N     4
IVUGFMLRJOCGAS-UHFFFAOYSA-N     4
OINGHOPGNMYCAB-UHFFFAOYSA-N     4
GJFCONYVAUNLKB-UHFFFAOYSA-N     4
LOUPRKONTZGTKE-UHFFFAOYSA-N     4
KPBNHDGDUADAGP-UHFFFAOYSA-N     4
CQKBSRPVZZLCJE-UHFFFAOYSA-N     4
Name: perturbation_id, dtype: int64

IAZDPXIOMUYVGZ-UHFFFAOYSA-N is DMSO, while the other InChIKeys correspond to the 8 compound poscons.

I should note that the number of wells in your comment is for the “median” plate. Some plates have more wells.
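Since the well counts can differ between plates, one way to check a given plate is to tabulate control types per plate. The sketch below is only an illustration under stated assumptions: it takes a table with a perturbation_id column like the one built above, uses the DMSO InChIKey named in this thread, and runs on invented rows with hypothetical plate names (plate_A, plate_B). The classify helper is not part of any library.

```python
import pandas as pd

# InChIKey identified as DMSO in this thread.
DMSO_INCHIKEY = "IAZDPXIOMUYVGZ-UHFFFAOYSA-N"
NEGCON_LABELS = {"no-guide", "non-targeting"}


def classify(perturbation_id: str) -> str:
    """Map a perturbation_id to a coarse control type."""
    if perturbation_id in NEGCON_LABELS:
        return "negcon"
    if perturbation_id == DMSO_INCHIKEY:
        return "DMSO"
    return "other"  # poscon compounds and gene-targeting guides


# Invented rows; plate names are hypothetical.
df = pd.DataFrame(
    {
        "Metadata_Plate": ["plate_A"] * 3 + ["plate_B"] * 4,
        "perturbation_id": [
            "no-guide", "non-targeting", DMSO_INCHIKEY,
            "no-guide", "no-guide", DMSO_INCHIKEY, "PLK1",
        ],
    }
)

df["control_type"] = df["perturbation_id"].map(classify)

# One row per plate, one column per control type, zeros where a type is absent.
counts = (
    df.groupby(["Metadata_Plate", "control_type"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Comparing the rows of counts across plates shows which plates deviate from the "median" layout.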