What are the actual controls for the CRISPR plates?

timtreis commented 5 months ago

Hey everyone, we're trying to compare data across "perturbation-universes". For this, we're taking a closer look at the controls on the CRISPR plates.

Their layout is, for example, like this:

Chandrasekaran et al. state that:

By deduction, that would mean that no-guide is DMSO. However, DMSO would be a poor control for CRISPR, since guide RNAs, unlike drugs, are not dissolved in DMSO but rather in some transfection agent. Were these wells included so the plates stay comparable to the compound plates?

@yugeji

yugeji commented 5 months ago

Also from the paper:

gewirtz commented 3 months ago

So, to be very explicit: are the no-guide and non-targeting wells (EXCLUDING DMSO) both used as the negative controls for the CRISPR plates?

niranjchandrasekaran commented 3 months ago

Hi all, I apologize for the delayed response. The paragraph shared by @yugeji is correct. I should update the text shared by @timtreis (thank you for catching that!).

We use both no-guide and non-targeting guide wells as negative controls in our analyses (we don't use DMSO).

Please let me know if you have other questions.
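Under this definition, selecting negative-control wells from the merged metadata reduces to a label match on Metadata_Symbol. A minimal sketch, assuming the labels "no-guide" and "non-targeting" exactly as they appear in the value_counts output later in this thread; the `negcon_mask` helper and the toy rows are hypothetical.

```python
import pandas as pd

# Assumption: the two negative-control types appear in Metadata_Symbol
# with exactly these labels (as in the value_counts output in this thread).
NEGCON_SYMBOLS = ["no-guide", "non-targeting"]


def negcon_mask(wells: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: boolean mask selecting negative-control wells.

    DMSO wells are excluded on purpose: they have no Metadata_Symbol after
    merging with crispr.csv.gz, so isin() evaluates to False for them.
    """
    return wells["Metadata_Symbol"].isin(NEGCON_SYMBOLS)


# Hypothetical toy rows standing in for a merged well/crispr table.
toy = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "A03", "A04"],
        "Metadata_Symbol": ["no-guide", "non-targeting", None, "PLK1"],
    }
)

print(toy.loc[negcon_mask(toy), "Metadata_Well"].tolist())  # ['A01', 'A02']
```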

yugeji commented 3 months ago

Okay, so to summarize, each CRISPR plate should contain:

8 positive-control compounds that also appear in the compound plates (x4 each)
10x no-guide (aka, absolutely no treatment)
10x non-targeting guides
8x DMSO
4x PLK1
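As a quick sanity check on this summary, the per-plate control budget can be tallied. A minimal sketch: the dictionary keys are my own labels (not metadata values), and the 384-well plate size is an assumption.

```python
# Control wells on a "median" CRISPR plate, per the summary above.
# Keys are this sketch's own labels, not values from the metadata files.
EXPECTED_CONTROLS = {
    "compound poscons": 8 * 4,  # 8 compounds, 4 wells each
    "no-guide": 10,
    "non-targeting": 10,
    "DMSO": 8,
    "PLK1": 4,
}

PLATE_WELLS = 384  # assumption: standard 384-well plate format

total_controls = sum(EXPECTED_CONTROLS.values())
control_fraction = total_controls / PLATE_WELLS

print(total_controls)              # 64
print(round(control_fraction, 3))  # 0.167
```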

@niranjchandrasekaran, given this, would it be possible to share an updated layout? We brought this up because we absolutely could not find anything labeled "DMSO" on the CRISPR plates.

Thanks very much in advance!

niranjchandrasekaran commented 2 months ago

Hi @yugeji, if you combine compound.csv.gz and crispr.csv.gz with well.csv.gz (all in the metadata folder), you will find the poscon compounds and the DMSO wells. For example, when I run the following

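import numpy as np
import pandas as pd

# Per-well metadata plus the CRISPR and compound annotation tables
well_df = pd.read_csv('metadata/well.csv.gz')
compound_df = pd.read_csv('metadata/compound.csv.gz')
crispr_df = pd.read_csv('metadata/crispr.csv.gz')

merged_df = (
    # Restrict to a single CRISPR plate
    well_df.query("Metadata_Plate == 'CP-CC9-R1-03'")
    # Attach gene annotations; compound wells (incl. DMSO) get NaN here
    .merge(crispr_df, on="Metadata_JCP2022", how="left")
    # Keep the PLK1 poscon plus every well without a gene ID
    # (no-guide, non-targeting, DMSO and the compound poscons)
    .query(
        "Metadata_Symbol=='PLK1' or Metadata_NCBI_Gene_ID.isna()"
    )
    # Attach compound annotations (InChIKey) for the non-gene wells
    .merge(compound_df, on="Metadata_JCP2022", how="left")
    # Use the gene symbol when present, otherwise the InChIKey
    .assign(
        perturbation_id=lambda x: np.where(
            x.Metadata_Symbol.notna(), x.Metadata_Symbol, x.Metadata_InChIKey
        )
    )
)

# Number of wells per control perturbation on this plate
merged_df.perturbation_id.value_counts()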

I get

no-guide                       11
non-targeting                  10
IAZDPXIOMUYVGZ-UHFFFAOYSA-N     8
SRVFFFJZQVENJC-UHFFFAOYSA-N     4
PLK1                            4
IHLVSLOZUHKNMQ-UHFFFAOYSA-N     4
IVUGFMLRJOCGAS-UHFFFAOYSA-N     4
OINGHOPGNMYCAB-UHFFFAOYSA-N     4
GJFCONYVAUNLKB-UHFFFAOYSA-N     4
LOUPRKONTZGTKE-UHFFFAOYSA-N     4
KPBNHDGDUADAGP-UHFFFAOYSA-N     4
CQKBSRPVZZLCJE-UHFFFAOYSA-N     4
Name: perturbation_id, dtype: int64

IAZDPXIOMUYVGZ-UHFFFAOYSA-N is DMSO, while the other InChIKeys correspond to the 8 positive-control compounds.
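To line this output up with the categories in the summary earlier in the thread, the InChIKeys can be collapsed. A minimal sketch, assuming that every InChIKey other than DMSO's on these plates is one of the 8 poscons (true for the output above, not checked for other plates); `control_category` and the category names are hypothetical, and the counts are copied from the output above.

```python
import pandas as pd

# InChIKey of DMSO, as stated in the comment above.
DMSO_INCHIKEY = "IAZDPXIOMUYVGZ-UHFFFAOYSA-N"
GENE_LABELS = {"no-guide", "non-targeting", "PLK1"}


def control_category(perturbation_id: str) -> str:
    """Hypothetical helper: map a perturbation_id to a coarse control category."""
    if perturbation_id in GENE_LABELS:
        return perturbation_id
    if perturbation_id == DMSO_INCHIKEY:
        return "DMSO"
    # Assumption: any other InChIKey here is one of the 8 poscon compounds.
    return "compound poscon"


# Counts copied from the value_counts() output for plate CP-CC9-R1-03.
counts = pd.Series(
    {
        "no-guide": 11,
        "non-targeting": 10,
        "IAZDPXIOMUYVGZ-UHFFFAOYSA-N": 8,
        "SRVFFFJZQVENJC-UHFFFAOYSA-N": 4,
        "PLK1": 4,
        "IHLVSLOZUHKNMQ-UHFFFAOYSA-N": 4,
        "IVUGFMLRJOCGAS-UHFFFAOYSA-N": 4,
        "OINGHOPGNMYCAB-UHFFFAOYSA-N": 4,
        "GJFCONYVAUNLKB-UHFFFAOYSA-N": 4,
        "LOUPRKONTZGTKE-UHFFFAOYSA-N": 4,
        "KPBNHDGDUADAGP-UHFFFAOYSA-N": 4,
        "CQKBSRPVZZLCJE-UHFFFAOYSA-N": 4,
    }
)

# groupby(func) calls the function on each index label (the perturbation_id).
by_category = counts.groupby(control_category).sum()
print(by_category)
```

On the real merged table, the same mapping could be applied with `merged_df.perturbation_id.dropna().map(control_category).value_counts()`; dropping missing IDs first keeps them from being bucketed as poscons.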

I should note that the well counts in your comment are for the “median” plate. Some plates have more wells.
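Since plates differ, one way to find those that deviate from the median layout is to count control wells per plate and compare each count with the per-category median across plates. A rough sketch: it assumes a hypothetical long table with one row per well, a Metadata_Plate column and a control-category column (called `category` here, a name I made up); `plates_above_median` and the plates P1 to P3 are toy examples, not real JUMP data.

```python
import pandas as pd


def plates_above_median(wells: pd.DataFrame, category_col: str = "category") -> pd.DataFrame:
    """Hypothetical helper: plates with more control wells than the median plate.

    Returns, for each such plate, the surplus (count minus median) per category.
    """
    # Rows: plates; columns: control categories; values: number of wells.
    counts = pd.crosstab(wells["Metadata_Plate"], wells[category_col])
    median = counts.median()
    # Subtract the per-category median from every plate's row.
    excess = counts.sub(median, axis=1)
    # Keep only plates with at least one category above the median.
    return excess[(excess > 0).any(axis=1)]


# Toy data: three hypothetical plates, one with an extra no-guide well.
rows = []
for plate, n_noguide in [("P1", 10), ("P2", 10), ("P3", 11)]:
    rows += [(plate, "no-guide")] * n_noguide
    rows += [(plate, "DMSO")] * 8
toy = pd.DataFrame(rows, columns=["Metadata_Plate", "category"])

print(plates_above_median(toy))
```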

jump-cellpainting / datasets #96