Open melpetera opened 5 years ago
Very good investigation so far! Now I need an object to run
chromTIC <- chromatogram(xdata, aggregationFun = "sum")
locally.
From Sue I got the Galaxy2633-[xset.merged.groupChromPeaks.RData].rdata.xcms.group
which includes the xdata
(see below). I'll try to reproduce here.
Yours, Steffen
> xdata
MSn experiment data ("XCMSnExp")
Object size in memory: 522.83 Mb
- - - Spectra data - - -
MS level(s): 1
Number of spectra: 2342035
MSn retention times: 0:0 - 20:4 minutes
- - - Processing information - - -
Concatenated [Wed Feb 27 19:24:26 2019]
MSnbase version: 2.4.0
- - - Meta data - - -
phenoData
rowNames: ./pos_128_2018_G_PHLPRA_A045_a_1-D,3_01_13482.mzML
./pos_128_2018_F_LEUVUL_A018_b_1-C,4_01_13505.mzML ...
./pos_QC_grass_2018_2-A,1_01_13512.mzML (655 total)
varLabels: sample_name sample_group
varMetadata: labelDescription
Loaded from:
[1] pos_128_2018_G_PHLPRA_A045_a_1-D,3_01_13482.mzML... [655] pos_QC_grass_2018_2-A,1_01_13512.mzML
Use 'fileNames(.)' to see all files.
protocolData: none
featureData
featureNames: F1.S0001 F1.S0002 ... F655.S3576 (2342035 total)
fvarLabels: fileIdx spIdx ... spectrum (28 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
- - - xcms preprocessing - - -
Chromatographic peak detection:
method: centWave
10408519 peaks identified in 655 samples.
On average 15891 chromatographic peaks per sample.
Correspondence:
method: chromatographic peak density
16907 features identified.
Median mz range of features: 0.004791
Median rt range of features: 16.356
Great, can reproduce locally:
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
Calls: source ... <Anonymous> -> .local -> .extractMultipleChromatograms
Execution halted
on
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] xcms_3.3.3 MSnbase_2.6.1 ProtGenerics_1.12.0
[4] mzR_2.14.0 Rcpp_0.12.17 BiocParallel_1.14.2
[7] Biobase_2.40.0 BiocGenerics_0.26.0
... and indeed no issue when running serial:
> register(SerialParam())
> ...
> chromTIC <- chromatogram(xdata, aggregationFun = "sum")
>
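Since the serial run succeeds where the parallel one fails, a fallback can be sketched as below. This is only an illustration, not something xcms provides: `run_with_serial_fallback` is a hypothetical helper name, and it assumes the wrapped function accepts a `BPPARAM` argument, as `bplapply()` and `chromatogram()` do.

```r
library(BiocParallel)

# Hypothetical helper (not part of xcms or BiocParallel): try the call with
# the currently registered backend first, and repeat it serially if it fails.
run_with_serial_fallback <- function(fun, ...) {
    tryCatch(
        fun(..., BPPARAM = bpparam()),
        error = function(e) {
            message("Parallel run failed (", conditionMessage(e),
                    "); retrying with SerialParam()")
            fun(..., BPPARAM = SerialParam())
        }
    )
}

# Small stand-in task; with real data the call would look like
# run_with_serial_fallback(chromatogram, xdata, aggregationFun = "sum")
res <- run_with_serial_fallback(bplapply, 1:3, function(i) i * 2)
```

An error that is unrelated to parallel processing will simply occur again in the serial run and be reported as usual.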
That's the first time I saw TB as unit for memory usage (albeit a small number ...).
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19 root 20 0 12.479g 0.010t 2820 R 81.4 17.1 7:04.94 R
20 root 20 0 12.444g 0.010t 2820 S 75.1 17.1 6:49.62 R
On our system MulticoreParam(2) still works, while the above failure occurred with all cores. I guess the maximum memory usage is a function of the number of samples that have to be loaded simultaneously (or perhaps of the number of features, I'm unsure).
Final object has
> print(object.size(xdata), units="MB")
522.8 Mb
> print(object.size(chromTIC), units="MB")
359.7 Mb
Not sure whether memory usage could be reduced during parallel processing.
Yours, Steffen
This is interesting. If it were indeed due to a memory limit, I would however expect a different error message. I got these messages when I ran out of forks on macOS (I wasn't aware that there is such a limit, though). That's also a reason why I like to pre-register all cores before running anything with xcms: each parallel processing step then re-uses the same processes.
@sneumann , do you still get the same error if you do
register(bpstart(MulticoreParam()))
before the chromatogram extraction?
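A minimal sketch of this pre-registration, assuming two workers (the count is an arbitrary choice, to be adapted to the memory available per process) and a fallback to a socket backend on Windows, where forking is not available:

```r
library(BiocParallel)

# Assumption: two workers; each worker may load a full file into memory.
n_workers <- 2L

# MulticoreParam relies on forking, which does not work on Windows;
# use a socket-based SnowParam there instead.
param <- if (.Platform$OS.type == "windows") {
    SnowParam(workers = n_workers)
} else {
    MulticoreParam(workers = n_workers)
}

# Start the workers once and register them as the default backend, so that
# later parallel steps reuse the same processes.
param <- bpstart(param)
register(param)

# Check that the registered default backend is actually running.
backend_is_up <- bpisup(bpparam())

# ... run chromatogram(xdata, aggregationFun = "sum") here ...

# Shut the workers down when finished.
bpstop(param)
```

Lowering `n_workers` is also the simplest way to reduce peak memory, at the cost of a longer run time.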
Regarding the memory usage: the chromatogram function will only read those spectra matching the EIC's retention time range from the original files. In the worst case it would read the full data of a file in each parallel process.
To confirm a quick workaround, I also tested register(SerialParam()) and indeed obtained my results without any problem, as @sneumann did.
I could not test register(bpstart(MulticoreParam())) since I'm running R on Windows. I tested register(bpstart(SnowParam())) just to see, but still got an "error writing to connection".
Thanks for reporting @melpetera - and what happens if you use register(bpstart(SnowParam(2)))
- just limiting to two parallel processes?
Tested, and still got the main error
Error in serialize(data, node$con) : error writing to connection
but without the notice of
Error: failed to stop ‘SOCKcluster’ cluster: error writing to connection
This sounds a little like a problem with parallel processing on that particular Windows machine. AFAIK in snow/sock-based parallel processing the master process talks to the worker processes via sockets and might need network access. I've sometimes seen a firewall or similar prevent this.
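One way to separate the two explanations is to run a trivial job on the same socket backend. The sketch below assumes nothing about the xcms data: if even this tiny task fails, the socket setup (firewall, network access) is the likely culprit rather than the size of the dataset.

```r
library(BiocParallel)

# Two socket workers, as in the failing register(bpstart(SnowParam(2))) call.
param <- SnowParam(workers = 2L)

# Trivial task with a tiny payload; errors are captured instead of thrown.
res <- tryCatch(
    bplapply(1:4, function(i) i^2, BPPARAM = param),
    error = function(e) e
)

if (inherits(res, "error")) {
    message("Socket backend failed: ", conditionMessage(res))
} else {
    message("Socket backend works: ", paste(unlist(res), collapse = ", "))
}
```

If this succeeds while the chromatogram call still fails, the amount of data sent to or from the workers becomes the more plausible cause.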
Hi @melpetera , could you confirm that the original error was reported in W4M Ticket#2019030210000026 ? Because then it should be debugged on that infrastructure, since Windows brings in quite a bit of additional/other challenges, and solutions could be different. Yours, Steffen
Hi there, I am new to posting issues, so please be kind if I do not provide enough information here.
We encountered a problem while using the 'chromatogram' function with a huge dataset. Initially I got the following error while using the W4M Galaxy module based on XCMS dedicated to plotting TIC and BIC:
Although I regularly use this module for various datasets with no problem, this was a particularly huge dataset. We are talking about 1576 samples with 8439 peaks per sample on average (that makes 13,300,226 peaks identified).
It just so happened that other people encountered exactly the same problem on their data while I was investigating this: @sneumann and @MarrSue
Let me go over what I have tested and concluded so far, illustrating it with the dataset I got first, but I guess we would reach the same conclusion with the data used by @MarrSue
First attempt via W4M Galaxy module dedicated to plotting TIC and BIC
The R conditions:
The error obtained is already given above. For information, the same error is obtained when trying to compute TIC and BIC via the module dedicated to retention time correction. This was expected since this module also displays TIC and BIC.
I checked the validity of the data by trying to do other things with the same data. Finding peak groups is no problem (we can easily get something like 10,909 groups). Exporting data into a peak table is also OK. I concluded it was not due to corrupted data.
Second attempt: reproducing the error out of Galaxy
I installed R on a Windows server at my workplace, since that machine has considerably more resources than my laptop (which might have struggled a little). Then I ran the code corresponding to the TIC and BIC plot, taken directly from the corresponding Galaxy module script.
Here is the concerned line, xdata being the MSn experiment data:
chromTIC <- chromatogram(xdata, aggregationFun = "sum")
Additional info - xdata:
Additional info - sessionInfo():
And so, error reproduced with little additional info:
I did some quick research about it. It seems the problem could be related to parallel processing (as when you use the foreach R package), which in some circumstances can lead to memory usage problems or similar. To be honest, I am not very comfortable with this kind of R topic, and I am also not good at searching for information, so a little help here would be highly appreciated.
So, any idea what is happening here? I do not actually know how the chromatogram function works, so maybe you guys from xcms would know better whether something is suspicious here. Tagging my colleague: @lecorguille