TeresaPegan closed this issue 1 year ago
Hi, apologies that I never answered this! Term was very busy and this seems to have slipped through.
This is almost certainly a bug in STITCH. The read splitting code is very old and not as well tested as other parts of the code base that have been re-worked or developed. You are fine to turn this off; in practice, it only really matters with very long reads and very high heterozygosity.
If you are able and willing, it would be useful if you could send me the data that got fed into this, so I can re-run the read splitting to understand the problem and fix it for good (and build a test for when I properly re-write the code). If you're able to install locally (i.e. clone the GitHub repo and install), you could modify the R code so that, just after the function declaration (https://github.com/rwdavies/STITCH/blob/master/STITCH/R/heuristics.R#L1161), everything input to the function is saved:
save(gammaK_t, eHapsCurrent_t, K, L, iSample, sampleReads, tempdir, regionName, grid, verbose, method, pRgivenH1_m, pRgivenH2_m, srp, file = file.path(tempdir, paste0("temp.", iSample, ".RData")))
You could then send the resulting file to me to investigate (assuming you are allowed to do that with your data).
Glad to hear your info score is 0.9 otherwise; that sounds quite promising.
PS: you might need to initialize tempdir to something that won't get destroyed when R closes, like file.path(outputdir, "temp").
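As a rough illustration of the tempdir point, here is a minimal base-R sketch of creating a directory that survives the R session, saving objects into it, and checking that they can be loaded back. The directory name "stitchout_example" and the toy values of iSample and K are placeholders for illustration only, not real STITCH inputs.

```r
## Assumption: "stitchout_example" is a placeholder; use your real STITCH outputdir
outputdir <- file.path(getwd(), "stitchout_example")

## A directory under outputdir is kept after R exits, unlike R's session tempdir()
persistent_tempdir <- file.path(outputdir, "temp")
dir.create(persistent_tempdir, recursive = TRUE, showWarnings = FALSE)

## Toy stand-ins for the function inputs (placeholders, not real STITCH objects)
iSample <- 1L
K <- 40L

## Same file naming pattern as the save() call suggested above
dump_file <- file.path(persistent_tempdir, paste0("temp.", iSample, ".RData"))
save(iSample, K, file = dump_file)

## Load into a separate environment so nothing in the session is overwritten
e <- new.env()
loaded_names <- load(dump_file, envir = e)
print(loaded_names)
```

With a directory like this, the STITCH call would be given tempdir = persistent_tempdir rather than tempdir = tempdir().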
Hi, I can send you the data. However, I was never able to get STITCH to install using the github repo, only using conda.
I thought I might be able to get this working by re-running the code for just the "findRecombinedReadsPerSample" function in my R environment, after loading the library, but before using STITCH on my data. See the code below. However, it does not seem to be working, or at least it does not produce the .Rdata file. Do you know if I should be able to get it to save the data into a .Rdata file this way? If so, do you know how I could change the code to make it work? Thanks!
library("STITCH")
findRecombinedReadsPerSample <- function(
    gammaK_t,
    eHapsCurrent_t,
    K,
    L,
    iSample,
    sampleReads,
    tempdir,
    regionName,
    grid,
    verbose = TRUE,
    method = "diploid",
    pRgivenH1_m = NULL,
    pRgivenH2_m = NULL,
    srp = NULL
) {
    save(gammaK_t, eHapsCurrent_t, K, L, iSample, sampleReads, tempdir, regionName, grid, verbose, method, pRgivenH1_m, pRgivenH2_m, srp, file = "/Users/tmpegan/TEST.Rdata")
    K <- dim(eHapsCurrent_t)[1]
    ## needs a full run
    ## only do for some - need at least 3 SNPs to consider
    w <- get_reads_worse_than_50_50(
        sampleReads = sampleReads,
        eHapsCurrent_t = eHapsCurrent_t,
        K = K
    )
    w <- w[w != 1 & w != length(w)]
    count <- 0
    if (length(w) > 0) {
        for (w1 in w) {
            out <- split_a_read(
                sampleReads = sampleReads,
                read_to_split = w1,
                gammaK_t = gammaK_t,
                L = L,
                eHapsCurrent_t = eHapsCurrent_t,
                K = K,
                grid = grid,
                method = method,
                pRgivenH1_m = pRgivenH1_m,
                pRgivenH2_m = pRgivenH2_m,
                srp = srp
            )
            sampleReads <- out$sampleReads
            pRgivenH1_m <- out$pRgivenH1_m
            pRgivenH2_m <- out$pRgivenH2_m
            srp <- srp
            count <- count + as.integer(out$did_split)
        } # end of loop on reads
        new_order <- order(unlist(lapply(sampleReads, function(x) x[[2]])))
        sampleReads <- sampleReads[new_order]
        save(sampleReads, file = file_sampleReads(tempdir, iSample, regionName), compress = FALSE)
        if (verbose) {
            print_message(paste0(
                "sample ", iSample, " readsSplit ", count, " readsTotal ", length(sampleReads)
            ))
        }
        if (method == "pseudoHaploid") {
            ## randomize those of split reads
            srp <- srp[new_order]
            pRgivenH1_m <- pRgivenH1_m[new_order, , drop = FALSE]
            pRgivenH2_m <- pRgivenH2_m[new_order, , drop = FALSE]
            save(
                srp, pRgivenH1_m, pRgivenH2_m,
                file = file_sampleProbs(tempdir, iSample, regionName)
            )
        }
    }
    return(
        list(
            readsSplit = count,
            readsTotal = length(sampleReads)
        )
    )
}
STITCH(tempdir = tempdir(), chr = "MDLI01000001.1", bamlist = "S_coronata/S_coronata_all_RG.txt", posfile = "S_coronata/S_coronata_MDLI01000001.1.txt", outputdir = paste0(getwd(), "/", "stitchout_15k40"), K = 40, nGen = 15000, nCores = 1, switchModelIteration = 39, method="pseudoHaploid")
This approach should work if you do something like the following. With the code above, the STITCH command still runs from the library version of the package, which calls the library version of findRecombinedReadsPerSample rather than the copy you redefined in your session. The code below will instead put everything in the global environment.
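To illustrate why redefining the function in your session has no effect on its own, here is a small base-R analogy (not STITCH code): functions inside a package look up their helpers in the package namespace, not in the global environment. In this sketch, sd from the stats package, which calls var internally, stands in for STITCH calling findRecombinedReadsPerSample.

```r
## Mask var() in the session with a copy that always fails
var <- function(x, ...) stop("global copy of var() was called")

## sd() lives in the stats namespace, so it still finds stats::var,
## not the masking copy defined above
result <- sd(c(1, 2, 3))
print(result)

## Calling var() directly from the session does hit the masking copy
direct <- tryCatch(var(c(1, 2, 3)), error = function(e) conditionMessage(e))
print(direct)

## Remove the mask again
rm(var)
```

Sourcing all of the package's R files into the session avoids this, because STITCH and its helpers are then all defined in the same place.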
So the steps are:
1. Clone the repo somewhere locally.
2. Modify the file above in the way shown by the code you copied (i.e. adding the "save" bit).
3. Manually load the R files into your session after the library command, using code like the below, copied for reference from
https://github.com/rwdavies/STITCH/blob/master/STITCH/tests/testthat/test-acceptance-one.R#L3
library("testthat"); library("STITCH"); library("rrbgen")
dir <- "~/proj/STITCH/" ## i.e. change this to global path where you've downloaded the repo, which now includes the manually modified version of the findRecombinedReadsPerSample
setwd(paste0(dir, "/STITCH/R"))
a <- dir(pattern = "*R")
b <- grep("~", a)
if (length(b) > 0) {
a <- a[-b]
}
o <- sapply(a, source)
setwd(dir)
Sys.setenv(PATH = paste0(getwd(), ":", Sys.getenv("PATH")))
## then something like your
STITCH(tempdir = tempdir(), chr = "MDLI01000001.1", bamlist = "S_coronata/S_coronata_all_RG.txt", posfile = "S_coronata/S_coronata_MDLI01000001.1.txt", outputdir = paste0(getwd(), "/", "stitchout_15k40"), K = 40, nGen = 15000, nCores = 1, switchModelIteration = 39, method="pseudoHaploid")
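One way to check which copy of a function will actually run is to look at the environment it was defined in. The helper where_defined below is a hypothetical name introduced just for this sketch, and sd is again only a stand-in so the example runs without STITCH installed.

```r
## Hypothetical helper: report the name of the environment a function was defined in
where_defined <- function(f) environmentName(environment(f))

## A function that comes from a package reports that package's namespace
pkg_env <- where_defined(sd)
print(pkg_env)

## A function defined (or sourced) at the top level of the session
## would report "R_GlobalEnv" instead
my_copy <- function(x) x
print(where_defined(my_copy))

## After sourcing the repo files as above, one would expect something like
## where_defined(STITCH) to report "R_GlobalEnv" rather than "STITCH"
```

If the sourced copy is in use, the modified findRecombinedReadsPerSample with the extra save call should be the one that gets run.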
Hello, today I've run into an error that consistently occurs at iteration 26 for one of my chromosomes. It seems likely to be caused by something happening in iteration 25, which is the default split reads iteration. When I tried setting splitReadIterations = NULL, STITCH ran with no errors.
Do you have any ideas about what might trigger this error? (See details below).
Also, is setting splitReadIterations = NULL an ok way to circumvent the problem, or do you think skipping the split reads iteration could significantly detract from my results? I don't have high coverage results to compare my output to, but I did find that the average INFO score from the run where I set splitReadIterations = NULL was 0.93.
Thanks again! :)
My input code:
Detailed output: