griffithlab / regtools

Integrate DNA-seq and RNA-seq data to identify mutations that are associated with regulatory effects on gene expression.
https://regtools.readthedocs.org
MIT License

Questions about compare_junctions_hist_v2.R #178

Open: R-Najjar opened this issue 1 year ago

R-Najjar commented 1 year ago

Thank you for making this super useful software.

  1. When running "compare_junctions_hist_v2.R", I got multiple NA values in "mean_norm_score_variant". I traced this back to the function starting on line 204: the unlist(strsplit(...)) call is needed in the first branch of the if statement too (fix added below).

  2. The example workflow recommends using --variant-grouping=exclude when running stats_wrapper.py. Is there an equivalent feature in "compare_junctions_hist_v2.R"?

  3. There are duplicate records (records with the same variant_junction_info) in the final dataset after running "compare_junctions_hist_v2.R". Do you recommend de-duplicating them? If so, how?

library(stringr)  # provides str_count()

# Split the comma-separated values in x and, if y lists more comma-separated
# entries than x, pad x with zeros so both have the same length.
a <- function(x, y){
  toAdd <- (str_count(y, ',') + 1) - (str_count(x, ',') + 1)
  if (toAdd > 0) {
    toAdd <- rep(0, toAdd)
    x <- c(unlist(strsplit(x, ",")), toAdd) # added unlist(strsplit) here too
  } else {
    x <- unlist(strsplit(x, ","))
  }
  x <- list(x)
  return(x)
}
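The NA values can plausibly be reproduced in isolation. This is a hedged sketch, not code from regtools: it assumes the unpatched first branch was x <- c(x, toAdd) (as the "added unlist(strsplit) here too" comment suggests), uses made-up scores and sample IDs, and replaces str_count() with a base R comma count (the hypothetical helper n_fields) so it runs without stringr. The names pad_original and pad_patched are likewise hypothetical. Without the split, the whole string "0.5,0.25" stays one element, as.numeric() turns it into NA, and the NA then propagates into any mean.

```r
# Hypothetical helper: number of comma-separated fields, equivalent to
# stringr::str_count(s, ",") + 1 but using base R only.
n_fields <- function(s) nchar(gsub("[^,]", "", s)) + 1

# Hypothetical reconstruction of the unpatched helper: in the padding branch,
# x is kept as one unsplit string before the zeros are appended.
pad_original <- function(x, y) {
  toAdd <- n_fields(y) - n_fields(x)
  if (toAdd > 0) {
    x <- c(x, rep(0, toAdd))
  } else {
    x <- unlist(strsplit(x, ","))
  }
  list(x)
}

# Patched helper as proposed in the issue: x is split in both branches.
pad_patched <- function(x, y) {
  toAdd <- n_fields(y) - n_fields(x)
  if (toAdd > 0) {
    x <- c(unlist(strsplit(x, ",")), rep(0, toAdd))
  } else {
    x <- unlist(strsplit(x, ","))
  }
  list(x)
}

scores  <- "0.5,0.25"   # made-up scores for 2 samples
samples <- "S1,S2,S3"   # made-up IDs: 3 samples, so one zero is padded

# as.numeric() warns when it cannot parse "0.5,0.25"; silence that here.
orig    <- suppressWarnings(as.numeric(pad_original(scores, samples)[[1]]))
patched <- as.numeric(pad_patched(scores, samples)[[1]])

print(orig)           # first element is NA
print(patched)        # 0.5 0.25 0
print(mean(orig))     # NA propagates into the mean
print(mean(patched))  # 0.25
```

Note that c() coerces the padded zeros to the string "0", so downstream code still needs as.numeric() on the result, as in this sketch.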
mguaita commented 2 months ago

Hello,

I am still encountering these same issues, as well as others already reported for the compare_junctions step of the example workflow in #153.

@R-Najjar have you been able to gain any insights on your questions?

I have also tried the Python version, compare_junctions_hist.py, and got different errors when merging the score data of samples_with_variants and samples_wout_variants.

R-Najjar commented 2 months ago

@mguaita I gave up on this tool because of these issues.

mguaita commented 2 months ago

@R-Najjar thank you for replying.

Have you been able to find any alternative approach for the statistical analysis of the Regtools comparison output?