Closed jorainer closed 2 years ago
opentimsr has its own parallel processing setup (opentims_set_threads) which clashes with BiocParallel-based parallel processing. So, we either perform the processing in parallel by file with BiocParallel (and need to disable opentimsr parallel processing) or we perform it in serial with opentimsr parallel processing enabled.
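The either/or choice above can be sketched as a small rule. This is only an illustration: `choose_tims_threads` and its `max_threads` argument are hypothetical names, not part of MsBackendTimsTof or opentimsr, and the sketch assumes the number of BiocParallel workers is already known.

```r
# Hypothetical helper: decide how many opentimsr threads to allow
# given the number of workers used for per-file processing.
choose_tims_threads <- function(n_workers, max_threads = 2L) {
    stopifnot(n_workers >= 1L, max_threads >= 1L)
    # More than one worker: files are processed in parallel, so keep
    # opentimsr single-threaded to avoid the two setups clashing.
    # One worker: serial by file, so opentimsr threading may be enabled.
    if (n_workers > 1L) 1L else as.integer(max_threads)
}

per_file <- choose_tims_threads(n_workers = 2L)  # per-file parallel
serial   <- choose_tims_threads(n_workers = 1L)  # serial by file

# In a real session the result would be passed on, for example:
# opentimsr::opentims_set_threads(
#     choose_tims_threads(BiocParallel::bpnworkers(BPPARAM)))
```

The actual call to `opentims_set_threads` is left as a comment so that the sketch runs without opentimsr installed.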
It seems that parallel processing has a small benefit when large data files are processed:
library(Spectra)
library(MsBackendTimsTof)
library(opentimsr)
library(BiocParallel)
library(peakRAM)

fls <- c("TimsTOF/Methanolpos-1-TIMS_108_1_2007.d",
         "TimsTOF/SRM1950_20min_88_01_6950.d")
be <- backendInitialize(MsBackendTimsTof(), fls)

## Calls 1-2: .get_tims_columns with 1 and 2 opentimsr threads.
## Calls 3-4: .get_tims_columns_p with SerialParam(), 1 and 2 opentimsr threads.
## Call 5: .get_tims_columns_p with per-file parallel processing (2 workers).
peakRAM(
    {
        opentims_set_threads(1)
        MsBackendTimsTof:::.get_tims_columns(be, columns = c("mz", "intensity"))
    },
    {
        opentims_set_threads(2)
        MsBackendTimsTof:::.get_tims_columns(be, columns = c("mz", "intensity"))
    },
    {
        opentims_set_threads(1)
        MsBackendTimsTof:::.get_tims_columns_p(be, columns = c("mz", "intensity"),
                                               BPPARAM = SerialParam())
    },
    {
        opentims_set_threads(2)
        MsBackendTimsTof:::.get_tims_columns_p(be, columns = c("mz", "intensity"),
                                               BPPARAM = SerialParam())
    },
    MsBackendTimsTof:::.get_tims_columns_p(be, columns = c("mz", "intensity"),
                                           BPPARAM = MulticoreParam(2))
)
  Function_Call
1 {opentims_set_threads(1)MsBackendTimsTof:::.get_tims_columns(be,columns=c("mz","intensity"))}
2 {opentims_set_threads(2)MsBackendTimsTof:::.get_tims_columns(be,columns=c("mz","intensity"))}
3 {opentims_set_threads(1)MsBackendTimsTof:::.get_tims_columns_p(be,columns=c("mz","intensity"),BPPARAM=SerialParam())}
4 {opentims_set_threads(2)MsBackendTimsTof:::.get_tims_columns_p(be,columns=c("mz","intensity"),BPPARAM=SerialParam())}
5 MsBackendTimsTof:::.get_tims_columns_p(be,columns=c("mz","intensity"),BPPARAM=MulticoreParam(2))
  Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1          108.674             1662.1            4507.6
2          105.141             1628.1            5559.2
3          109.299             1628.2            5457.3
4          106.240             1628.1            5457.2
5           80.762             1628.1            3306.4
So, the last call is the only one that uses parallel processing on a per-file basis. Per-file parallel processing has clear advantages over the built-in parallel processing of opentimsr: it finishes in about 81 s instead of 105-109 s and has the lowest peak RAM. Enabling the opentimsr threads brings very little benefit (about 3 s when going from 1 to 2 threads).
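To put the timings into relative terms, the elapsed times and peak RAM values from the table above can be compared against the first call. The numbers are copied from the peakRAM output; the vector names are just labels chosen for this sketch.

```r
# Elapsed time (s) and peak RAM (MiB) per call, copied from the table.
elapsed <- c(loop_t1 = 108.674, loop_t2 = 105.141,
             serial_t1 = 109.299, serial_t2 = 106.240,
             multicore2 = 80.762)
peak_ram <- c(loop_t1 = 4507.6, loop_t2 = 5559.2,
              serial_t1 = 5457.3, serial_t2 = 5457.2,
              multicore2 = 3306.4)

# Speed-up relative to call 1 (values > 1 mean faster than call 1).
speedup <- elapsed[["loop_t1"]] / elapsed
print(round(speedup, 2))

# Seconds saved by a second opentimsr thread in each serial setup.
thread_gain <- c(loop = elapsed[["loop_t1"]] - elapsed[["loop_t2"]],
                 serial = elapsed[["serial_t1"]] - elapsed[["serial_t2"]])
print(round(thread_gain, 1))

# Peak RAM relative to call 1 (values < 1 mean a lower peak).
print(round(peak_ram / peak_ram[["loop_t1"]], 2))
```

With these numbers the per-file parallel call is roughly 1.35 times faster than the first call, while a second opentimsr thread saves only about 3 to 3.5 s.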
I will now fix some internal things and replace the for-loop-based processing with the parallel version.
Just realized that parallel processing in the backend makes no sense: parallel processing is already taken care of by the Spectra object. Thus, the only thing we need to do is disable opentimsr parallel processing so that it does not interfere with BiocParallel.
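One way this could look is a package load hook that sets opentimsr to a single thread. This is a minimal sketch under the assumption that a load hook is an acceptable place for it; it is not necessarily how MsBackendTimsTof implements the change.

```r
# Sketch of a load hook for a package that wraps opentimsr.
# Assumption: forcing a single opentimsr thread at load time is wanted;
# parallel processing is then left to BiocParallel via the Spectra object.
.onLoad <- function(libname, pkgname) {
    opentimsr::opentims_set_threads(1L)
}
```

A user who wants opentimsr threading for serial processing could still call `opentims_set_threads` again after loading the package.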
Check if parallel processing is possible with opentimsr/MsBackendTimsTof and compare performance against serial processing.