Closed L1angyan closed 7 months ago
Hi Yan,
Do you mind telling me a bit more about the analysis you are trying to run? For example, are you seeking to deploy the sceptre pipeline to analyze the Gasperini 2019 data?
The data in the Dropbox repo (which are associated with the "Robust DE testing..." preprint) are in ondisc
version 1.1.0 format. You can download ondisc
version 1.1.0 via install_github("timothy-barry/ondisc", ref = "v1.1")
. This should allow you to access the Gasperini data stored in the Dropbox repo.
However, note that ondisc
version 1.1.0 is not compatible with the sceptre
Nextflow pipeline. Instead, the Nextflow pipeline requires ondisc
version 1.2.0 (i.e., the current version of ondisc
). I can supply you with the Gasperini data stored in ondisc
version 1.2.0 if that would be helpful.
Tim
Hi, Tim
Thank you for the prompt reply.
Definitely, I am intersted in some enhancers in K562 cell line and seeking to deploy the sceptre pipeline to analyze the Gasperini 2019 data. The Gasperini data stored in ondisc version 1.2.0 you said are the input file mentioned in import-gasperini-2019-v3 repos ?
└── at-scale
├── processed
│ ├── grna.odm
│ ├── response.odm
│ ├── sceptre_object.rds
│ └── sceptre_object_in_memory.rds
If yes, Gasperini data stored in ondisc version 1.2.0 are exactly what I need. You will share me a dropbox link, right ? I am so grateful for your help !!!
Yan
Hi Yan,
Got it. We were planning to do a trans analysis of the Gasperini data as part of our current project. I think I will get to this in about a week or two, probably. Perhaps I could send you the data and preliminary results at that time?
Tim
Hi Yan,
Got it. We were planning to do a trans analysis of the Gasperini data as part of our current project. I think I will get to this in about a week or two, probably. Perhaps I could send you the data and preliminary results at that time?
Tim
Thank you Tim. I am looking forward to your results.
Yan
Hi Tim, I followed the import-gasperini-2019-v3 pipeline to generate these files:
(sceptre) [liangyan@mu01 at-scale]$ tree processed/ 1.sceptre_res/
processed/
├── grna.odm
├── pairs_grouped.rds
├── response.odm
├── sceptre_object_mem.rds
└── sceptre_object.rds
1.sceptre_res/
├── 1.runSCEPTRE_pipeline.sh
├── 1.sceptre_outputs
│ ├── analysis_summary.txt
│ ├── grna_assignment_matrix.rds
│ ├── plot_assign_grnas.png
│ ├── plot_grna_count_distributions.png
│ ├── plot_run_calibration_check.png
│ ├── plot_run_discovery_analysis.png
│ ├── plot_run_power_check.png
│ ├── plot_run_qc.png
│ ├── results_run_calibration_check.rds
│ ├── results_run_discovery_analysis.rds
│ └── results_run_power_check.rds
├── 1.sceptre.R
├── 1.sgRNA_DEanalysis.csv
├── 1.sgRNA_info.csv
├── 302669.err
└── 302669.out
Interestingly, I found that only on chromosome 9, there are some siginicant enhancer-gene pairs which is whose distance is much greater than 500kb. For example, the UBAC1 gene located far away form the two enhancer. It seems strange, isn't it?
However, the GSE120861_gene_gRNAgroup_pair_table.at_scale.txt do not inclue the enhancer-gene pairs.
Hi Lyon,
Could you provide some details about the analysis that you ran? For example, did you run a cis analysis or trans analysis (I'm guessing the former)? Were you using the ondisc
-backed sceptre_object
or the standard sceptre_object
? And did you use the R console interface or the Nextflow pipeline?
I am not sure about the enhancer located at ~chr9:135879214, but my suspicion is that this enhancer regulates a nearby transcription factor, which in turn regulates UBAC1.
By the way, I plan to run a trans analysis on the Gasperini data today. Perhaps we could discuss more when I obtain those results.
I used the ondisc-backed sceptre_object just in the R console interface. Actually, I run an cis-analysis, thus I do not think for enhancer located at ~chr9:135879214, the UBAC1 gene should be tested.
By the way, the gene-sgRNA pairs table GSE120861_gene_gRNAgroup_pair_table.at_scale.txt do not include the ~chr9:135879214 enhancer-UBAC1 pairs.
This is the the code I run, following get started with sceptre.
library(sceptre)
library(dplyr)
library(conflicted)
conflict_prefer("filter", "dplyr")
gasp_offsite <- paste0("/public/home/liunangroup/liangyan/project/tf-assymmetry/20240415_CreType-diff/CRISPR_screen/import-gasperini-2019-v3/",
"at-scale/")
processed_data_dir <- paste0(gasp_offsite, "processed/")
sceptre_object <- read_ondisc_backed_sceptre_object(
sceptre_object_fp = paste0(processed_data_dir, "sceptre_object.rds"),
response_odm_file_fp = paste0(processed_data_dir, "response.odm"),
grna_odm_file_fp = paste0(processed_data_dir, "grna.odm"))
sceptre_object
write.csv(sceptre_object@grna_target_data_frame,'1.sgRNA_info.csv',row.names=F,col.names=T)
positive_control_pairs <- construct_positive_control_pairs(sceptre_object)
discovery_pairs <- construct_cis_pairs(
sceptre_object,
positive_control_pairs = positive_control_pairs)
# apply the pipeline functions to the sceptre_object in order
sceptre_object <- sceptre_object |> # |> is R's base pipe, similar to %>%
set_analysis_parameters(discovery_pairs, positive_control_pairs,resampling_mechanism="permutations") |>
run_calibration_check(parallel = TRUE) |>
run_power_check(parallel = TRUE) |>
run_discovery_analysis(parallel = TRUE)
# write the results to disk
write_outputs_to_directory(sceptre_object, "1.sceptre_outputs")
thread='32'
sbatch --job-name=job --partition=cu --nodes=1 --ntasks-per-node=$thread --output=%j.out --error=%j.err --wrap="
/public/home/liunangroup/liangyan/software/miniconda3/envs/sceptre/bin/Rscript 1.sceptre.R
"
I guess I find the reason. sceptre using coordinate of gene position in hg38 reference genome. However, the results of gasperini-2019 based on hg19 genome.
Actually, the TSS of UBAC1 in hg38 is Chromosome 9: 135,932,969-135,961,373, locating at the cis-region of ~chr9:135879214
?construct_cis_pairs
#Usage:
#
# construct_cis_pairs(
# sceptre_object,
# positive_control_pairs = data.frame(),
# distance_threshold = 500000L,
# response_position_data_frame = gene_position_data_frame_grch38)
Ah, I see. The sceptre
high-MOI example uses gene_position_data_frame_grch38
in construct_cis_pairs()
, but really it should be using gene_position_data_frame_grch19
. 😳 I will add gene_position_data_frame_grch19
to the package and fix the example. Thank you for noticing this!
By the way, I ran a trans analysis of the Gasperini data using the sceptre
pipeline. Note that the trans analysis pairs all gRNAs to all genes; thus, the issue with the construct_cis_pairs()
function is not relevant here. You can download the results here: https://www.dropbox.com/scl/fo/q2sasfxtxzprsfazyjbdm/AOjk2GDhhyjzdZcdpk2QuTs?rlkey=2drsij7e1l2wytaaqfxg0ro6h&st=5jvgi08f&dl=0. Note that these results are preliminary, and we will officially release these results as part of our upcoming manuscript. The book provides instructions for interacting with the .parquet
files.
Oh, I should add that I randomly grouped the NT gRNAs into groups of size two for this analysis. I then tested each NT gRNA group against the entire set of genes. The NT gRNA group names begin with "nt_".
Hi Tim, Thank you for your help. Looking forward to your new work.
By the way, I guess the purpose of grouping NT sgRNAs is acquiring some baseline effect of negtive control for hypothesis testing, so I do not need to pay attention to the groups begin with "nt_", right?
Yes, that is right, you can ignore the gRNA targets that begin with "nt". (However, if you want to convince yourself that sceptre is doing the right thing on these data --- i.e., controlling type-I error --- then you may want to look at the "nt" gRNA targets).
(Note that sceptre does not use the NT gRNAs to test hypotheses or compute p-values; rather, sceptre uses the NT gRNAs to verify control of type-I error/FDR.)
Hi Tim, I downloaded the processed import-gasperini-2019-v2 data from the dropbox link you shared in sceptre2-manuscript. I want to generated a sceptre object using the code in import-gasperini-2019-v2.
However, when I ran
gene_odm <- read_odm(paste0(processed_gene_dir, "matrix.odm"), paste0(processed_gene_dir, "metadata.rds"))
I got an error:
Error in read_odm(paste0(processed_gene_dir, "matrix.odm"), paste0(processed_gene_dir, : could not find function "read_odm"
Is the
read_odm
funtion has been deleted? How I can get the sceptre object from these input ?Best Yan