zamboni-lab / SLAW

Scalable and self-optimizing processing workflow for untargeted LC-MS
GNU General Public License v2.0
25 stars · 3 forks

*** caught segfault *** address 0x7f06ebc63efe, cause 'invalid permissions' #7

Closed YUANMY2021 closed 2 years ago

YUANMY2021 commented 2 years ago

Excuse me, this time I used slaw_dev on Docker but encountered the following problem: none of the 3 CSV files were generated. The start and the error parts of my run's output are below. Thanks for reading this log and for your help.

2021-11-20|02:02:49|INFO: Total memory available: 2046976 and 144 cores. The workflow will use 14314 Mb by core on 143 cores.
2021-11-20|02:02:49|INFO: Guessing polarity from file:HFX3_CP7_FZTM210046837-1A.mzML
2021-11-20|02:02:49|DEBUG: Rscript /pylcmsprocessing/Rscript/get_polarity.R "/input/HFX3_CP7_FZTM210046837-1A.mzML" "/output/polarity"
2021-11-20|02:02:51|INFO: Polarity detected: positive
2021-11-20|02:02:51|DEBUG: Rscript /pylcmsprocessing/Rscript/createSQLiteexperiment.R -d '/input' -b '/output/temp_processing_db.sqlite'
2021-11-20|02:02:51|DEBUG: Rscript /pylcmsprocessing/Rscript/createSQLiteexperiment.R -d '/input' -b '/output/temp_processing_db.sqlite'
2021-11-20|02:02:52|INFO: STEP: initialisation TOTAL_TIME:2.92s LAST_STEP:2.92s
2021-11-20|02:02:53|DEBUG: Rscript /pylcmsprocessing/Rscript/reducing_mzml.R "/output/temp_optim/mzML" 500.0
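The "14314 Mb by core" figure in the first log line is consistent with dividing the total available memory by the number of worker cores (144 detected, 143 used). A minimal sketch of that arithmetic; the function name and the assumption that one core is held in reserve are a hypothetical reconstruction, not SLAW's actual code:

```python
def mb_per_core(total_mb: int, total_cores: int, reserved_cores: int = 1) -> int:
    """Split the machine's memory evenly across worker cores.

    Hypothetical reconstruction of the log line above: 144 cores are
    detected, 143 are used, and 2046976 // 143 = 14314 MB per core.
    """
    workers = total_cores - reserved_cores
    return total_mb // workers

print(mb_per_core(2046976, 144))  # 14314, matching "14314 Mb by core" in the log
```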

....... ....... ........

Starting refinement 2021-11-20 07:29:42
End refinement 2021-11-20 07:30:57
Adding aligned features to the model.
Starting adding aligned features to the model 2021-11-20 07:30:57
Found 605245 peaks belonging to the model and 52450 new features
Ending adding aligned features to the model 2021-11-20 07:31:24
No more files to process. Saving alignment.
Saving alignment.

*** caught segfault *** address 0x7f06ebc63efe, cause 'invalid permissions'

Traceback:
1: fread(path, header = TRUE, sep = ",")
2: extractSubDataMatrix.data.table(ff, subsample, subvariable)
3: buildDataMatrix.datatable(object@storage, subsample = subsample, subvariable = subvariable[all_order][sub_idx], max_sample = length(object@files), quant_var = quant_var, summary_vars = summary_vars, name_samples = names_cols, bpp = object@references@bpp)
4: exportDataMatrix(lam, path = PATH_OUT_DATAMATRIX, quant_var = VAL_INTENSITY, subvariable = which(lam@peaks$num >= 2), summary_vars = c("mz", "rt", "rt_cor", supp_args))
5: withCallingHandlers(expr, warning = function(w) if (inherits(w, classes)) tryInvokeRestart("muffleWarning"))
6: suppressWarnings(exportDataMatrix(lam, path = PATH_OUT_DATAMATRIX, quant_var = VAL_INTENSITY, subvariable = which(lam@peaks$num >= 2), summary_vars = c("mz", "rt", "rt_cor", supp_args)))
7: withCallingHandlers(expr, message = function(c) if (inherits(c, classes)) tryInvokeRestart("muffleMessage"))
8: suppressMessages(suppressWarnings(exportDataMatrix(lam, path = PATH_OUT_DATAMATRIX, quant_var = VAL_INTENSITY, subvariable = which(lam@peaks$num >= 2), summary_vars = c("mz", "rt", "rt_cor", supp_args))))
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

2021-11-20|07:32:01|INFO: Filtering
2021-11-20|07:32:01|DEBUG: Rscript /pylcmsprocessing/Rscript/filter_datamatrix.R "/output/temp_processing_db.sqlite" "/output/datamatrices/datamatrix_5dcd7b5ce452e1bbfed6423f8e4efd3f.csv" "/output/temp/temp_dm_1" 3.0 0.5
2021-11-20|07:32:02|DEBUG: stderr:Error in fread(PATH_DM, nrows = 2, sep = "\t") : File '/output/datamatrices/datamatrix_5dcd7b5ce452e1bbfed6423f8e4efd3f.csv' does not exist or is non-readable. getwd()=='/' Execution halted
2021-11-20|07:32:02|INFO: Extracting consensus MS-MS spectra
2021-11-20|07:32:02|DEBUG: Rscript /pylcmsprocessing/Rscript/fusing_msms_spectra.R "/output/temp_processing_db.sqlite" 0.1 0.1 71 "/output/fused_mgf/fused_mgf_5dcd7b5ce452e1bbfed6423f8e4efd3f.mgf" "/output/temp/temp_dm_1" "/output/temp/temp_dm_2" "/output/temp/fusing_msms.hdf5"
2021-11-20|07:32:06|DEBUG: stderr:Error in fread(PATH_DATAMATRIX, sep = "\t", header = TRUE) : File '/output/datamatrices/datamatrix_5dcd7b5ce452e1bbfed6423f8e4efd3f.csv' does not exist or is non-readable. getwd()=='/' Execution halted
2021-11-20|07:32:06|INFO: Alignment finished

adelabriere commented 2 years ago

Hi @YUANMY2021 ,

Would it be possible for you to paste the full log? It would make it easier for me to understand the cause of your problem.

YUANMY2021 commented 2 years ago

out_singularity.txt out_docker.txt

1. My attempt with Singularity on the HPC: the 3 kinds of CSV files were generated, but the fused_mgf files were not.

My attempt with Docker on the server: none of the 3 kinds of CSV files were generated, and no fused_mgf files were generated either.

Both the Docker and Singularity attempts used slaw:dev; neither generated the fused_mgf, though for different reasons.

2. On the server with Docker: the workflow used 14314 Mb per core on 143 cores. The core problem for the Docker attempt was:

Starting refinement 2021-11-20 07:29:42
End refinement 2021-11-20 07:30:57
Adding aligned features to the model.
Starting adding aligned features to the model 2021-11-20 07:30:57
Found 605245 peaks belonging to the model and 52450 new features
Ending adding aligned features to the model 2021-11-20 07:31:24
No more files to process. Saving alignment.
Saving alignment.

*** caught segfault *** address 0x7f06ebc63efe, cause 'invalid permissions'

Traceback:
1: fread(path, header = TRUE, sep = ",")
2: extractSubDataMatrix.data.table(ff, subsample, subvariable)
3: buildDataMatrix.datatable(object@storage, subsample = subsample, subvariable = subvariable[all_order][sub_idx], max_sample = length(object@files), quant_var = quant_var, summary_vars = summary_vars, name_samples = names_cols, bpp = object@references@bpp)
4: exportDataMatrix(lam, path = PATH_OUT_DATAMATRIX, quant_var = VAL_INTENSITY, subvariable = which(lam@peaks$num >= 2), summary_vars = c("mz", "rt", "rt_cor", supp_args))
5: withCallingHandlers(expr, warning = function(w) if (inherits(w, classes)) tryInvokeRestart("muffleWarning"))
6: suppressWarnings(exportDataMatrix(lam, path = PATH_OUT_DATAMATRIX, quant_var = VAL_INTENSITY, subvariable = which(lam@peaks$num >= 2), summary_vars = c("mz", "rt", "rt_cor", supp_args)))
7: withCallingHandlers(expr, message = function(c) if (inherits(c, classes)) tryInvokeRestart("muffleMessage"))
8: suppressMessages(suppressWarnings(exportDataMatrix(lam, path = PATH_OUT_DATAMATRIX, quant_var = VAL_INTENSITY, subvariable = which(lam@peaks$num >= 2), summary_vars = c("mz", "rt", "rt_cor", supp_args))))
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

3. On the HPC with Singularity: SINGULARITYENV_MEMORY=4000 SINGULARITYENV_NCORES=40 (your reply recommended that I use fewer cores, maybe 40, so I chose this).

And the core problem for the Singularity attempt was:

2021-11-21|04:49:01|INFO: Extracting consensus MS-MS spectra
2021-11-21|04:49:01|DEBUG: Rscript /pylcmsprocessing/Rscript/fusing_msms_spectra.R "/output/temp_processing_db.sqlite" 0.1 0.1 20 "/output/fused_mgf/fused_mgf_8232c1710a0ea852841a241001680a4e.mgf" "/output/temp/temp_dm_1" "/output/temp/temp_dm_2" "/output/temp/fusing_msms.hdf5"
2021-11-21|07:26:50|DEBUG: stderr:Found 2468044 features with associated MS-MS spectra
Error in mcfork(detached) : unable to fork, possible reason: Cannot allocate memory
Calls: bpmapply ... bploop.lapply -> .send_to -> .send_to -> -> mcfork
Execution halted
2021-11-21|07:26:50|INFO: Alignment finished
2021-11-21|07:26:50|INFO: STEP: alignment TOTAL_TIME:39611.57s LAST_STEP:20381.21s

YUANMY2021 commented 2 years ago

This time I retried both of the attempts above with 7 of my samples.

Both Docker and Singularity were able to produce the 3 kinds of CSV files and the fused_mgf files.

YUANMY2021 commented 2 years ago

Hi @YUANMY2021 ,

Would it be possible for you to paste the full log? It would make it easier for me to understand the cause of your problem.

My reply is here 🙋‍♂️: would it be possible for me to transfer our mgf files or raw data to you through the "scp" command?

adelabriere commented 2 years ago

Hi @YUANMY2021, thanks a lot for the feedback and info.

The fact that you were able to generate the peaktables with fewer cores indicates that with too many cores, too many connections are opened by R, leading to a permission-denied error when a file is opened while a connection to it already exists (that is my hypothesis). I know that R has a limit of about 125 connections, and I strongly suspect it is responsible in some way, even if the error message does not indicate it. This is not a big problem, as reducing the number of cores is enough to fix it.
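As an illustration of this hypothesis, a workaround is to cap the worker count below R's connection ceiling. The exact limit, the safety margin, and the function below are assumptions for illustration, not SLAW's actual logic:

```python
# Illustrative sketch only: R allows roughly 128 simultaneous connections,
# a few of which are reserved, leaving about 125 for user code.
R_USER_CONNECTIONS = 125

def safe_core_count(requested: int, limit: int = R_USER_CONNECTIONS, margin: int = 5) -> int:
    """Cap the requested core count so that one open file connection per
    worker stays safely below R's connection limit (margin is arbitrary)."""
    return min(requested, limit - margin)

print(safe_core_count(143))  # 143 cores would be capped to 120
print(safe_core_count(40))   # 40 is already below the limit
```

With a cap like this, the 143-core Docker run above would have been held to 120 workers, while the 40-core Singularity run would be unaffected.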

For the memory usage in the MS-MS step, I am trying several solutions to avoid reading all the MS2 spectra into memory at the same time. My current solution scales very well in space but very poorly in time, while the current SLAW solution is the opposite. I am coding a middle-ground solution; I'll come back to you when it is done.
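The space/time trade-off described above can be sketched generically (this is not SLAW's code, just the idea of streaming records in bounded chunks instead of loading everything at once):

```python
from typing import Iterator, List

def read_all(path: str) -> List[str]:
    # Fast, but holds the entire file in memory at once
    # (analogous to reading all MS2 spectra up front).
    with open(path) as f:
        return f.readlines()

def read_chunked(path: str, chunk_lines: int = 1000) -> Iterator[List[str]]:
    # More per-item overhead, but peak memory is bounded by chunk_lines
    # (analogous to streaming spectra through the merge step).
    with open(path) as f:
        chunk: List[str] = []
        for line in f:
            chunk.append(line)
            if len(chunk) == chunk_lines:
                yield chunk
                chunk = []
        if chunk:
            yield chunk
```

A middle ground, as described above, would pick a chunk size large enough to amortize the overhead while keeping peak memory far below the full dataset.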

I don't think that having the files will help me debug it: with the file that you sent me by Google Drive, I was able to do the whole alignment without an issue using only 16 GB of RAM on both my computer and our cluster, so I can't reproduce the bug on my side. I was, however, able to get an OOM on the MS2 step, so I can test with that.

YUANMY2021 commented 2 years ago

Thank you very much for your continuous efforts in helping us!

I am also trying to use all of the QC samples plus a small number of our experimental samples in every SLAW run, to see whether, after annotation, I can align and merge the results from different analysis batches (since runs with fewer samples did produce results).


adelabriere commented 2 years ago

Hi @YUANMY2021, thanks for your patience. I was able to come up with a middle-ground solution which drastically reduces the memory consumption of the MS-MS merging step (an 85% decrease). Could you please retry your processing with adelabriere/slaw:msms? I tested the Singularity container on my problematic dataset and it finished correctly. Could you please try it on yours?

YUANMY2021 commented 2 years ago

Really excited to hear this news!

I will try it right now!

And thanks for your persistent work!


YUANMY2021 commented 2 years ago


1. My attempt on Docker with 100+ cores: it generated the 3 kinds of CSV files and the fused_mgf file, but with only 196 features (1).

image

2. My attempt on Singularity with 64 cores: it generated the 3 kinds of CSV files and the fused_mgf file, but with only 5000+ features.

image

3. However, both of these attempts generated fewer total features than my previous results (18000+ features), although at that time only the CSV files could be generated.

image

So, since you told me before that fewer cores can generate more features, I will try 35 cores!

adelabriere commented 2 years ago

I strongly suggest that you stick to fewer cores. The 196 features probably mean that the processing crashed at some point, which is expected with too many cores. For the 5000, don't hesitate to reduce it with the 'fraq_qc' parameter or to pick another algorithm. At least it is finishing correctly now, so I am closing this issue.

YUANMY2021 commented 2 years ago

Thanks for continuously working on this problem, although I don't have enough knowledge of the coding myself.

Really, thank you for your continuous efforts! ❤❤ I will follow your suggestions!