DSuveges closed this issue 1 year ago.
Screen types:
+-------------------------------+-----+
|SCREEN_TYPE |count|
+-------------------------------+-----+
|Negative Selection |942 |
|Positive and Negative Selection|233 |
|Positive Selection |183 |
|Phenotype Screen |124 |
+-------------------------------+-----+
Experimental setup:
+-----------------------------------------+-----+
|EXPERIMENTAL_SETUP |count|
+-----------------------------------------+-----+
|Timecourse |1044 |
|Drug Exposure |271 |
|Virus Exposure |84 |
|Toxin Exposure |17 |
|Cytokine exposure |12 |
|Ligand Exposure |11 |
|Bacteria Exposure |9 |
|Other |9 |
|NK cell exposure |8 |
|Implantation to Mouse Model |5 |
|Radiation Exposure |4 |
|T cell exposure |3 |
|Oxygen Exposure |2 |
|Cytokine depletion |1 |
|Transferrin receptor (TFRC/CD71) exposure|1 |
|SARS-CoV-2 Spike-RBD exposure |1 |
+-----------------------------------------+-----+
Setup vs screen type:
+-----------------------------------------+-------------------------------+-----+
|EXPERIMENTAL_SETUP |SCREEN_TYPE |count|
+-----------------------------------------+-------------------------------+-----+
|Timecourse |Negative Selection |905 |
|Drug Exposure |Positive and Negative Selection|132 |
|Drug Exposure |Positive Selection |103 |
|Timecourse |Phenotype Screen |84 |
|Timecourse |Positive and Negative Selection|50 |
|Virus Exposure |Positive Selection |48 |
|Drug Exposure |Negative Selection |27 |
|Virus Exposure |Positive and Negative Selection|26 |
+-----------------------------------------+-------------------------------+-----+
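The three count tables above are plain frequency counts over one or two metadata columns. The sketch below shows how such counts could be reproduced from the tab-separated screen index file with the Python standard library. It assumes the index file has a header row containing `SCREEN_TYPE` and `EXPERIMENTAL_SETUP` columns. These names are taken from the tables above, and the real header may differ. The sample input is a toy stand-in, not BioGRID data.

```python
import csv
import io
from collections import Counter


def read_index(handle):
    """Parse a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_column(rows, column):
    """Count each distinct value of `column`, most frequent first."""
    return Counter(row[column] for row in rows).most_common()


def count_pairs(rows, first, second):
    """Count each (first, second) value combination, most frequent first."""
    return Counter((row[first], row[second]) for row in rows).most_common()


# Toy input standing in for the screen index file (not real BioGRID data).
# Column names are assumed from the tables above.
sample = (
    "SCREEN_TYPE\tEXPERIMENTAL_SETUP\n"
    "Negative Selection\tTimecourse\n"
    "Negative Selection\tTimecourse\n"
    "Positive Selection\tDrug Exposure\n"
)
rows = read_index(io.StringIO(sample))
print(count_column(rows, "SCREEN_TYPE"))
print(count_pairs(rows, "EXPERIMENTAL_SETUP", "SCREEN_TYPE"))
```

The tables look like Spark `show()` output. In PySpark, the equivalent counts would be `df.groupBy("SCREEN_TYPE").count()` and `df.groupBy("EXPERIMENTAL_SETUP", "SCREEN_TYPE").count()`.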
Although the screens come from 240+ publications, a large share of the ~2k studies come from a handful of papers:
+--------+-----+
|pubmedId|count|
+--------+-----+
|29083409|340 |
|30971826|325 |
|29526696|45 |
|33539788|36 |
|27260156|33 |
|32649862|31 |
|32990596|21 |
|30995489|18 |
|35559673|17 |
|34049503|16 |
+--------+-----+
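The concentration of screens in a few papers can be checked with simple arithmetic on the table above. In this sketch, the per-publication counts are copied from that table. The denominator is assumed to be the sum of the SCREEN_TYPE counts reported earlier (942 + 233 + 183 + 124), which may not match the "~2k" figure if some screens lack a screen type.

```python
def top_n_share(counts_by_pub, total, n=5):
    """Return (screens in the n largest publications, their fraction of total)."""
    top = sorted(counts_by_pub.values(), reverse=True)[:n]
    covered = sum(top)
    return covered, covered / total


# Per-publication screen counts copied from the table above (top 10 only).
pub_counts = {
    "29083409": 340, "30971826": 325, "29526696": 45, "33539788": 36,
    "27260156": 33, "32649862": 31, "32990596": 21, "30995489": 18,
    "35559673": 17, "34049503": 16,
}

# Assumed total: the sum of the SCREEN_TYPE counts reported earlier.
TOTAL_SCREENS = 942 + 233 + 183 + 124

covered, share = top_n_share(pub_counts, TOTAL_SCREENS)
print(covered, round(share, 3))  # 779 0.526
```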
When focusing on the top 5 publications (PubMed IDs 29083409, 30971826, 29526696, 33539788 and 27260156), which together account for roughly half of the screens counted above (779 of 1,482), the split of the experimental design is much simpler:
+-------------------------------+------------------+-----+
|SCREEN_TYPE |EXPERIMENTAL_SETUP|count|
+-------------------------------+------------------+-----+
|Positive and Negative Selection|Drug Exposure |36 |
|Negative Selection |Timecourse |743 |
+-------------------------------+------------------+-----+
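Restricting the cross-tabulation to the top publications is a filter followed by a pair count. The sketch below assumes each index row carries its PubMed ID in a `pubmedId` field. That name comes from the table above, but whether it is a raw column of the index file or a derived one is an assumption. The toy rows use placeholder publication IDs and are not real records.

```python
from collections import Counter

# PubMed IDs of the five largest publications, as listed above.
TOP5 = ["29083409", "30971826", "29526696", "33539788", "27260156"]


def crosstab_for_publications(rows, pubmed_ids, pub_col="pubmedId",
                              first="SCREEN_TYPE", second="EXPERIMENTAL_SETUP"):
    """Count (first, second) value pairs among rows from the given publications."""
    keep = set(pubmed_ids)
    return Counter((row[first], row[second]) for row in rows if row[pub_col] in keep)


# Toy rows with placeholder publication IDs (not real BioGRID records).
toy_rows = [
    {"pubmedId": "A", "SCREEN_TYPE": "Negative Selection", "EXPERIMENTAL_SETUP": "Timecourse"},
    {"pubmedId": "A", "SCREEN_TYPE": "Negative Selection", "EXPERIMENTAL_SETUP": "Timecourse"},
    {"pubmedId": "B", "SCREEN_TYPE": "Positive and Negative Selection", "EXPERIMENTAL_SETUP": "Drug Exposure"},
    {"pubmedId": "C", "SCREEN_TYPE": "Positive Selection", "EXPERIMENTAL_SETUP": "Virus Exposure"},
]
print(crosstab_for_publications(toy_rows, ["A", "B"]))
# On the real index rows, TOP5 would be passed instead of ["A", "B"].
```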
@buniello has mapped the cell proliferation screens to EFO terms (spreadsheet) based on the disease of the cell line used, e.g.:
Acute Myeloid Leukemia Cell Line -> EFO_0000222 (acute myeloid leukemia)
These mappings will then be collected into a single table (cell line vs. EFO) that can be joined with the study table during evidence generation.
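A minimal sketch of that join follows, assuming the mapping table is keyed by the cell line description. Only the AML entry comes from the example above. The field names `cellLine`, `efoId` and `efoLabel` and the toy study rows are hypothetical.

```python
# Hypothetical cell line -> EFO mapping table; only this entry is from the example above.
CELL_LINE_TO_EFO = {
    "Acute Myeloid Leukemia Cell Line": ("EFO_0000222", "acute myeloid leukemia"),
}


def annotate_studies(studies, mapping, cell_line_col="cellLine"):
    """Left-join study records with the cell line -> EFO mapping.

    Studies whose cell line is not in the mapping keep None in both EFO
    fields, so they remain visible for review instead of being dropped.
    """
    annotated = []
    for study in studies:
        efo_id, efo_label = mapping.get(study[cell_line_col], (None, None))
        annotated.append({**study, "efoId": efo_id, "efoLabel": efo_label})
    return annotated


# Toy study rows (hypothetical identifiers).
studies = [
    {"studyId": "S1", "cellLine": "Acute Myeloid Leukemia Cell Line"},
    {"studyId": "S2", "cellLine": "Unmapped Cell Line"},
]
for row in annotate_studies(studies, CELL_LINE_TO_EFO):
    print(row)
```

A left join is used here so that unmapped cell lines surface as gaps. An inner join would silently drop those studies from evidence generation.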
The heterogeneity of this dataset makes it very complicated to ingest, so the value gained is not proportional to the effort required.
BioGRID distributes data from ~2k CRISPR screens under the MIT license. As of 2023.04.02, the most recent release is 1.1.13, released in October 2022. Files can be downloaded from here. The compressed archive contains two file types:
Study metadata file:
BIOGRID-ORCS-SCREEN_INDEX-1.1.13.index.tab.txt
Screen data:
BIOGRID-ORCS-SCREEN_{screen_id}-1.1.13.screen.tab.txt
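The two naming patterns above make it easy to locate files programmatically. The sketch below makes four assumptions. The archive is a zip file, the text files sit at the top level of the archive, they are UTF-8 encoded, and each file has a single header row. If the header starts with `#`, that character will stay part of the first column name.

```python
import csv
import io
import zipfile

RELEASE = "1.1.13"
INDEX_FILE = f"BIOGRID-ORCS-SCREEN_INDEX-{RELEASE}.index.tab.txt"


def screen_file_name(screen_id, release=RELEASE):
    """Build the per-screen file name following the pattern listed above."""
    return f"BIOGRID-ORCS-SCREEN_{screen_id}-{release}.screen.tab.txt"


def read_tsv_from_zip(archive, member):
    """Read one tab-separated member of a zip archive into a list of dicts.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        with io.TextIOWrapper(zf.open(member), encoding="utf-8", newline="") as text:
            return list(csv.DictReader(text, delimiter="\t"))


print(INDEX_FILE)
print(screen_file_name(42))
```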
Conclusion
Once there is a solid plan and a data model for which screens can feed into which datasets, it will be easy to map study metadata onto studies. Extracting the significant hits is also trivial, given the boolean column indicating hits.
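Filtering on the hit flag could look like this minimal sketch. The source only states that a boolean column marks hits. The column names `HIT` and `OFFICIAL_SYMBOL` and the `YES`/`NO` string encoding are assumptions for illustration, and the toy rows are not real data.

```python
def significant_hits(rows, hit_col="HIT", gene_col="OFFICIAL_SYMBOL"):
    """Return the gene symbols of rows flagged as hits.

    Assumes the flag is stored as a "YES"/"NO" string; the comparison
    ignores case and surrounding whitespace.
    """
    return [row[gene_col] for row in rows if row[hit_col].strip().upper() == "YES"]


# Toy screen rows with hypothetical gene names.
rows = [
    {"OFFICIAL_SYMBOL": "GENE1", "HIT": "YES"},
    {"OFFICIAL_SYMBOL": "GENE2", "HIT": "NO"},
    {"OFFICIAL_SYMBOL": "GENE3", "HIT": "yes "},
]
print(significant_hits(rows))
```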
The most difficult part is interpreting the studies.
1. Scoping phase
The scoping phase is done in collaboration with @buniello.