Closed Titan100 closed 7 years ago
Hello Titan!
So the output above is the last output you saw before cancelling the calculations? How long did it take to get a result? Or, in the case of cancelling before getting a result: how long did you wait before cancelling?
In general the time bottleneck is xcms itself. In your case specifically, the call to xcmsSet probably needs some time. Your code starts an optimization of three parameters, which makes IPO run xcmsSet 17 times for a single DoE. Those 16+1 runs are the result of an efficient optimization approach. You started the calculation with 4 IPO clusters, thus the 16 xcmsSet calculations run in parallel.
Do you know how long a single xcmsSet call needs with your data on your computer? That would be interesting.
The first thing you can do is increase the number of xcms slaves by setting peakpickingParameters$nSlaves, which should make the individual xcmsSet calls faster. Of course you can also increase the number of IPO slaves. Please keep in mind that the number of cores needed on your computer is the number of IPO slaves multiplied by the number of xcms slaves.
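A minimal sketch of these settings (assuming the IPO functions getDefaultXcmsSetStartingParams and optimizeXcmsSet with an nSlaves argument, as in the IPO versions discussed here; the mzXML directory is a placeholder):

```r
library(IPO)

# Default centWave peak-picking parameters as a starting point
peakpickingParameters <- getDefaultXcmsSetStartingParams("centWave")

# Each individual xcmsSet call uses 2 xcms slaves
peakpickingParameters$nSlaves <- 2

# IPO distributes the DoE runs over 4 IPO slaves;
# worst case this needs 4 * 2 = 8 cores in total
resultPeakpicking <- optimizeXcmsSet(
  files   = list.files("mzXML", full.names = TRUE),  # placeholder path
  params  = peakpickingParameters,
  nSlaves = 4
)
```

This needs your raw data files present, so it is a sketch of the call structure rather than a runnable example.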
hth Thomas
Hey Rietho, I had no idea how long I should wait, so I just terminated it after a few hours. I tried on my laptop the other day, but it did not show a result even after running overnight. However, I did get a result for other data (4 mzXML files) a few weeks back; that took a long time too. I tried setting nSlaves according to your suggestion. It is still taking long; it has already been about 4 hours now. I have a couple of questions: 1) Can I use a few mzXML files out of the 50 files and use IPO for parameter optimization? I was wondering if the parameters optimized on a few files should be representative for the other data files too (the data files being acquired at the same time, on the same machine, with the same sample). 2) Is this normal, or am I missing something to get the work done faster?
Hi,
I was just following this thread and I'm just wondering if it's interesting to launch
If my comment is completely trivial, accept my apology. I'm a really new "user" of IPO.
@Titan100
To find out more about the running time of xcmsSet, you can try setting nSlaves within optimizeXcmsSet as well as peakpickingParameters$nSlaves to 1. These settings will let xcms print information about each run. You should see output like the following
Detecting mass traces at 20 ppm ...
% finished: 0 10 20 30 40 50 60 70 80 90 100
7068 m/z ROI's.
Detecting chromatographic peaks ...
% finished: 0 10 20 30 40 50 60 70 80 90 100
3644 Peaks.
for each peak detection for each file. The numbers after '% finished' show up as the calculation runs. When running IPO there will also be lines with single numbers. These numbers count the xcmsSet calls within a DoE, and thus go up to 16 in your case.
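To get that verbose output, both slave counts can be set to 1 (a sketch assuming the same IPO functions; file paths are placeholders):

```r
library(IPO)

peakpickingParameters <- getDefaultXcmsSetStartingParams("centWave")
peakpickingParameters$nSlaves <- 1  # 1 xcms slave: xcms prints per-file progress

resultPeakpicking <- optimizeXcmsSet(
  files   = list.files("mzXML", full.names = TRUE),  # placeholder path
  params  = peakpickingParameters,
  nSlaves = 1                       # 1 IPO slave: DoE runs sequentially
)
```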
@lecorguille Thank you for your input. Inputs and new ideas are always welcome :smiley:
I'm not sure whether I understood your comment correctly. Nevertheless I'll try to respond:
As pointed out, IPO is intended to be run on a training set. The published paper by my colleagues (see http://www.biomedcentral.com/1471-2105/16/118) studied how well the IPO results for the training data carried over to the whole data set.
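A sketch of running the optimization on a small training subset only (the file selection and paths are illustrative, not from the thread):

```r
library(IPO)

# Use a handful of representative files (e.g. pooled QCs) as a training set
allFiles    <- list.files("mzXML", pattern = "\\.mzXML$", full.names = TRUE)
trainingSet <- allFiles[1:5]  # 5 of the 50 files, chosen for illustration

resultPeakpicking <- optimizeXcmsSet(
  files  = trainingSet,
  params = getDefaultXcmsSetStartingParams("centWave")
)
```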
The choice of parameters like min_peakwidth and max_peakwidth is set by the default values to a standard range. The problem with a too large range is that IPO would not be capable of giving a reasonable estimate over the whole range. IPO uses DoE (design of experiments) methods to explore the range; specifically, a central-composite design is used, which tests, for each parameter, the outer limits as well as the middle point. Thus a too large range would be misleading.
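For illustration, narrowing the default ranges could look like this (the values are examples only, not recommendations):

```r
library(IPO)
peakpickingParameters <- getDefaultXcmsSetStartingParams("centWave")

# A c(lower, upper) pair defines the range the central-composite design
# samples: both limits plus the centre point are tested
peakpickingParameters$min_peakwidth <- c(10, 20)
peakpickingParameters$max_peakwidth <- c(35, 65)

# A single value excludes the parameter from the optimization
peakpickingParameters$ppm <- 5
```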
Thank you all.
Hello @lecorguille , @rietho , @sneumann , @glibiseller
Detecting mass traces at 1 ppm ... % finished: 0 10 20 30 40 50 60 70 80 90 100 Warning: There were 1065223 peak data insertion problems. Please try lowering the "ppm" parameter.
333603 m/z ROI's.
Detecting chromatographic peaks ... % finished: 0 10 20 30 40 50 60 70 80 90 100 97687 Peaks.
Detecting mass traces at 1 ppm ... % finished: 0 10 20 30 40 50 60 70 80 90 100 Warning: There were 1065223 peak data insertion problems. Please try lowering the "ppm" parameter.
My questions are: 1) Are these warnings normal? 2) As suggested, I decreased ppm from 5 down to 1 and there are still warnings. 3) Does that mean I need to set my ppm below 1? 4) In some output the ROIs are mentioned (for example: 333603 m/z ROI's) but not in others, where only peaks appear (for example: 97687 Peaks) (see the example in the box above). Is that a serious issue I should consider?
Thank you for your answer.
This large number of "peak insertion problems" usually indicates that you have profile mode data, and in that case modifying the ppm parameter won't help. The centWave algorithm depends on the MS raw data being centroided. You can often achieve that at the conversion step to e.g. mzML with ProteoWizard msconvert. This will also reduce file sizes and runtimes. Yours, Steffen
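For reference, a command-line equivalent of that conversion step (assumes ProteoWizard's msconvert is installed and on the PATH; sample.raw and the output directory are placeholders):

```shell
# Convert a Thermo .raw file to centroided mzML.
# The peakPicking filter should come first so it is applied
# to the vendor profile data before any other filter.
msconvert sample.raw --mzML \
  --filter "peakPicking vendor msLevel=1-" \
  -o centroided/
```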
Hey Steffen, I am running IPO on centroided data. As suggested elsewhere, I converted profile (.raw) to centroid (.mzML) mode using msconvert. While converting to centroid mode, I chose "peak picking" parameter and converted to .mzML format.
Here is the screenshot of the parameters I used for file conversion using msconvert.
Click on "Add" below the filters to actually apply the peak picking filter.
Thanks. I just corrected it. Hopefully this works now.
Detecting mass traces at 4 ppm ... % finished: 0 10 20 30 40 50 60 70 80 90 100 Warning: There were 3548 peak data insertion problems. Please try lowering the "ppm" parameter.
40953 m/z ROI's.
Detecting chromatographic peaks ... % finished: 0 10 20 30 40 50 60 70 80 90 100 13082 Peaks.
Detecting mass traces at 4 ppm ... % finished: 0 10 20 30 40 50 60 70 80 90 100 Warning: There were 4354 peak data insertion problems. Please try lowering the "ppm" parameter.
39225 m/z ROI's.
Detecting chromatographic peaks ... % finished: 0 10 20 30 40 50 60 70 80 90 100
So which MS instrument are you using ? Could you share one mzML file from that setup ? Doesn't have to be a real Lipidomics one, standards or even rinse would be fine. Bonus points if it is small (<100MB) Yours, Steffen
Hey @sneumann I have shared two files (profile and centroid) through Dropbox. I used a Thermo HF Orbitrap for acquiring the data. Please see your email.
Can I ask for the mzML instead of the *.raw please? Thanks Steffen
@sneumann Shared.
Hi,
if you
library(xcms)
xr <- xcmsRaw("centroid.mzML")
plotRaw(xr, log=TRUE)
plotRaw(xr, mzrange=c(805, 810), rtrange=c(300,320), log=TRUE)
you see that /some/ of the mass traces have very close-by "satellites".
These peak pairs cause the insertion problems. They are also visible in the profile mode data. Can you find out whether that is indeed a different lipid with a very similar m/z, and not just some artefact?
xrprofile <- xcmsRaw("/home/sneumann/Downloads/profile.mzML")
plotRaw(xrprofile, mzrange=c(805, 810), rtrange=c(300,320), log=TRUE)
plotScan(xrprofile, 550, mzrange=c(807,808))
You need to check yourself how badly this affects the peak picking. For this you could overlay the picked peaks over the raw image (disclaimer: parameters not optimised!):
p <- findPeaks(xr, method="centWave", ppm=5)
plotRaw(xr, mzrange=c(800, 850), rtrange=c(300,350), log=TRUE)
points(p@.Data[,c("rt","mz")])
It would be great if you could report back your findings.
Yours, Steffen
@Titan100 any updates?
I'll close this issue, as there have been no updates for several months. If there is any news, you're welcome to reopen the issue.
Hello @sneumann @rietho @glibiseller , I am trying to optimize xcms parameters using IPO on 18.5 GB of .mzXML files. It is taking forever to get a result. I have an i7-4790 CPU @ 3.60GHz, 32 GB RAM and a 64-bit operating system.
Is there any way to get it done faster? Below is the command that I used to run the program.
starting new DoE with: min_peakwidth: c(10, 20) max_peakwidth: c(35, 65) ppm: 5 mzdiff: c(-0.001, 0.01) snthresh: 10 noise: 0 prefilter: 3 value_of_prefilter: 100 mzCenterFun: wMean integrate: 1 fitgauss: FALSE verbose.columns: FALSE nSlaves: 1
Using PSOCK type cluster, this increases memory requirements. Reduce number of slaves if you have out of memory errors.
Exporting variables to cluster...
Thank you for your help.
Titan