Closed csijcs closed 3 years ago
I am also getting some memory related errors
Error in gi_list_write(gi_list, fname = output_path) :
R_Reprotect: only -7 protected items, can't reprotect index -12
I tried increasing the size of R memory size
cat ~/.Renviron
R_MAX_VSIZE=760Gb
but this does not help.
My sessionInfo() is roughly the same as the original poster.
Hi @csijcs @jaavedm , I would like to get reproducible data examples that generates the error, maybe some paths to input data files in case these are accessible, or an RDS save of one list item of the gi_list object, etc. that can process gi_list_write(gi_list,fname=output_path,rows="significant")
but not gi_list_write(gi_list, fname = output_path)
. I don't think this is a memory error.
As for the hicdcdiff()
error, the likely cause seems to be on the dispersion fit type. I should follow the DESeq2
error and go with gene wise estimates by raising a warning. I would still like to see the input data and the filter_file
that generates this error though.
Hi @mervesa,
Unfortunately, the error does not happen with just one input file, but rather after processing 2-3 files and then moving onto the next file in the for loop. I've sent you a few files to your email address. Here is my repro example code. I didn't get to the point of calling hicdcdiff() yet.
library("HiCDCPlus")
library("DESeq2")
setwd("/storage/jaavedm/")
#generate features
outdir="/home/jaavedm/";
construct_features(output_path=paste0(outdir,"/hg38_100kb_GATC"),
gen="Hsapiens",gen_ver="hg38",
sig="GATC",bin_type="Bins-uniform",
binsize=100000,
chrs=c("chr1"))
#add .hic counts
hicproRoot <- "/storage/jaavedm/myfiles/"
file.bases <- c(
"H9_H3K27ac_rep1",
"H9_H3K27ac_rep2",
"H9_H3K27ac_rep3",
"H9_H3K27ac_rep4",
"H9_H3K27ac_rep5"
)
hicfile_paths<-c();
for(i in 1:length(file.bases)) {
f <- file.bases[i];
hicfile_paths[i] <- sprintf("%s/%s.allValidPairs", hicproRoot, f);
if(!file.exists(hicfile_paths[i]))
{
stop(sprintf("File does not exist:\n%s\n", hicfile_paths[i]));
}
}
indexfile<-data.frame()
for(i in 1:length(hicfile_paths)) {
hicfile_path <- hicfile_paths[i];
cat(sprintf("\n%d of %d: Starting to process file %s\n\n", i, length(hicfile_paths), hicfile_path));
output_path<-paste0(outdir,'/', gsub("^(.*[\\/])", "",gsub('.allValidPairs','.txt.gz',hicfile_path)))
#generate gi_list instance
gi_list<-generate_bintolen_gi_list(bintolen_path=paste0(outdir,"/hg38_100kb_GATC_bintolen.txt.gz"), gen="Hsapiens",gen_ver="hg38");
gi_list<-add_hicpro_allvalidpairs_counts(gi_list, allvalidpairs_path = hicfile_path)
#expand features for modeling
gi_list<-expand_1D_features(gi_list)
#run HiC-DC+ on 2 cores
set.seed(1010) #HiC-DC downsamples rows for modeling
gi_list<-HiCDCPlus_parallel(gi_list,ssize=0.1, ncore=12)
for (i in seq(length(gi_list))){
indexfile<-unique(rbind(indexfile, as.data.frame(gi_list[[i]][gi_list[[i]]$qvalue<=0.05])[c('seqnames1', 'start1','start2')]))
}
#write results to a text file
gi_list_write(gi_list,fname=output_path) # <-- this will fail
rm(gi_list)
}
Hello @mervesa (and greetings also to my old friend @jaavedm),
My code is basically the same as @jaavedm, but I also received the error with a single sample and even a single chromosome from the gi_list
. I was able to solve the gi_list_write()
problem by increasing the buffMB
option in the data.table::fwrite
call of the source code.
And now the hicdcdiff()
function is working properly with the full table of all rows.
Hi @jaavedm , @csijcs , I have reproduced the warning when I had data.table
versions up to 1.13.6, but the warning does not appear when I run on a session with the latest version of data.table
(released December 30th, 2020 and involves a data.table::fwrite
fix). Could you please try upgrading data.table
and see if the warning still appears on your end? I will need to put a version dependency on data.table
if it doesn't appear anymore. Thanks a lot!
The gi_list_write()
function is working properly now with data.table_1.13.7
on my end.
Unfortunately, the error still occurs with 1.13.7 data.table package.
I only found 1.13.6 on CRAN, so I had to update to the latest data.table dev build using data.table::update.dev.pkg()
I tried going back down to an older version (1.13.2) and the problem still occurs.
In the end I managed to get this working by setting data.table threads to 1. By default getDTthreads
said it was using 56 cores. I set threads using setDTthreads(threads = 1);
I'm not sure why this worked, but it fixed the R_Reprotect errors. I suspect there is something about forking and memory handling not behaving properly in multithreaded settings. Also, I don't know how much of a performance impact setting threads = 1 will affect this package.
Very excited to try your tool! I'm having a little trouble with the gi_list_write command. I get the following error:
I am able to write on the significant rows by calling:
gi_list_write(gi_list,fname=output_path,rows="significant")
However, then when I try to run hicdcdiff on those files I get an error I suppose is related to the fact that there are only the significant regions and not all regions for background:
Error in DESeq2::estimateDispersionsFit(dds, fitType = fitType) : all gene-wise dispersion estimates are within 2 orders of magnitude from the minimum value, and so the standard curve fitting techniques will not work. One can instead use the gene-wise estimates as final estimates: dds <- estimateDispersionsGeneEst(dds) dispersions(dds) <- mcols(dds)$dispGeneEst ...then continue with testing using nbinomWaldTest or nbinomLRT
I have tried with multiple different samples and get the same errors. Here is my sessionInfo: `sessionInfo() R version 4.0.3 (2020-10-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.1 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] DESeq2_1.30.0 SummarizedExperiment_1.20.0 Biobase_2.50.0
[4] MatrixGenerics_1.2.0 matrixStats_0.57.0 dplyr_1.0.2
[7] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0 rtracklayer_1.50.0
[10] Biostrings_2.58.0 XVector_0.30.0 GenomicRanges_1.42.0
[13] GenomeInfoDb_1.26.2 IRanges_2.24.1 S4Vectors_0.28.1
[16] BiocGenerics_0.36.0 HiCDCPlus_0.99.5
`