rietho / IPO

A Tool for automated Optimization of XCMS Parameters
http://bioconductor.org/packages/IPO/

Too much time for optimization!! #72

Open arpita-007 opened 1 year ago

arpita-007 commented 1 year ago

Hello,

I am having a problem with IPO: the optimization is taking a very long time to complete. The files were acquired on a Thermo Orbitrap Fusion coupled to a Dionex Ultimate 3000 RS. Six QC files, each approximately 110,603 KB, were loaded for parameter optimization. Optimizing the peak-picking parameters usually takes about 24 hours, but this run has been going for around 3 days and is still not finished.

We are not able to figure out what the problem is. Any help is very much appreciated.

Thank you

rietho commented 1 year ago

Hi @arpita-007! I want to acknowledge your issue, even though I can't help. The package itself is currently unsupported. I am only providing emergency updates.

I suggest engaging with the community to see if you can get help there.

song-sbio commented 1 year ago

Hi @arpita-007, did you manage to solve it? I'm experiencing the same issue. Thanks!

arpita-007 commented 1 year ago

Hi @song-sbio. No, we could not solve it. I think the issue is the number of files: the more files, the longer it takes. We recently ran IPO again with fewer files and it worked.
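The observation that fewer files finish sooner can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, assuming that every experiment in a design runs peak picking on every file and that the time per file is roughly constant. The function `estimate_hours` and all the numbers passed to it are hypothetical placeholders, not measurements from IPO.

```r
# Rough cost model: total time grows linearly with the number of files.
# All inputs are hypothetical; substitute timings from your own data.
estimate_hours <- function(n_files, runs_per_design, n_iterations, minutes_per_file) {
  n_files * runs_per_design * n_iterations * minutes_per_file / 60
}

six_files   <- estimate_hours(n_files = 6, runs_per_design = 50,
                              n_iterations = 4, minutes_per_file = 2)
three_files <- estimate_hours(n_files = 3, runs_per_design = 50,
                              n_iterations = 4, minutes_per_file = 2)

cat("6 files:", six_files, "hours\n")
cat("3 files:", three_files, "hours\n")
```

Under these assumptions, halving the number of files halves the estimated run time, which is consistent with the experience reported above.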

song-sbio commented 1 year ago

Hi @arpita-007. Thank you for the explanation.

linlennypinawa commented 1 year ago

It took me about 2 days to complete the iteration for eight files.

Here are the ranges I set for each variable:

min_peakwidth <- c(5, 15)
max_peakwidth <- c(50,90)
ppm <- c(3, 12)
mzdiff <- c(-0.002, 0.002)
snthresh <- c(1, 10)
noise <- c(100, 1000)
prefilter <- c(1,4)
value_of_prefilter <- c(100, 1000)

The outcome is this:

ppm = 12
peakwidth = c(12, 82)
snthresh = 10
integrate = 1
noise = 361
prefilter = c(3,1090)
mzdiff = 0
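Comparing the reported optimum with the starting ranges shows which values ended up at the edge of, or beyond, the range that was set. The sketch below only does that comparison, using the numbers from this comment; the names `ranges`, `optimum`, and `position` are illustrative. One possible reading of a value outside its starting range is that the range was shifted or widened between iterations, but that interpretation is an assumption, not something the output above confirms.

```r
# Starting ranges and reported optimum, copied from the comment above.
ranges <- list(
  min_peakwidth      = c(5, 15),
  max_peakwidth      = c(50, 90),
  ppm                = c(3, 12),
  mzdiff             = c(-0.002, 0.002),
  snthresh           = c(1, 10),
  noise              = c(100, 1000),
  prefilter          = c(1, 4),
  value_of_prefilter = c(100, 1000)
)
optimum <- c(
  min_peakwidth = 12, max_peakwidth = 82, ppm = 12, mzdiff = 0,
  snthresh = 10, noise = 361, prefilter = 3, value_of_prefilter = 1090
)

# Classify each optimum relative to its starting range.
position <- vapply(names(ranges), function(name) {
  lo <- ranges[[name]][1]
  hi <- ranges[[name]][2]
  x  <- optimum[[name]]
  if (x < lo || x > hi) {
    "outside"
  } else if (x == lo || x == hi) {
    "on boundary"
  } else {
    "inside"
  }
}, character(1))

# Show only the parameters that did not land strictly inside their range.
print(position[position != "inside"])
```

Here `ppm` and `snthresh` sit on their upper bounds and the prefilter intensity (1090) lies above its starting upper bound of 1000, so those three ranges may be worth revisiting before a rerun.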

I consulted my friend who is specialized in DOE. Here is his response:

From reading the paper, it seems that it uses a Box-Behnken design, which is a strange choice. These designs are not very modern, and they are quite big when you have as many as 8 factors. It also seems that a new design is started in each iteration of the optimisation cycle, and it sounds like the data from the previous iteration is discarded. This is also strange and seems very wasteful. I might be wrong, of course. Maybe I read it wrong. I like the overall idea of the approach, but I think the DOE execution aspect could be improved.

Then he suggested this:

My approach would be to use a definitive screening design to find out the factors that have an important effect and then work sequentially from there.
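Short of a full screening design, a lower-effort way to shrink the design is to optimize fewer factors. According to the IPO vignette, a parameter given as a range is optimized while a parameter given as a single value is held fixed. The sketch below only counts factors in a plain R list that mirrors the ranges posted above; it does not call IPO. Which factors to fix, and at what values, is a hypothetical choice made for illustration, not a recommendation.

```r
# Starting ranges copied from the comment above.
# A length-2 vector means "optimize"; a single value means "hold fixed".
params <- list(
  min_peakwidth      = c(5, 15),
  max_peakwidth      = c(50, 90),
  ppm                = c(3, 12),
  mzdiff             = c(-0.002, 0.002),
  snthresh           = c(1, 10),
  noise              = c(100, 1000),
  prefilter          = c(1, 4),
  value_of_prefilter = c(100, 1000)
)

# Hypothetical choice: fix four factors at single values.
fixed <- list(mzdiff = 0, noise = 361, prefilter = 3, value_of_prefilter = 1000)
reduced <- modifyList(params, fixed)

# Count how many entries are still given as a range.
n_optimized <- function(p) sum(vapply(p, function(x) length(x) == 2, logical(1)))

cat("factors optimized before:", n_optimized(params), "\n")
cat("factors optimized after: ", n_optimized(reduced), "\n")
```

In an actual IPO run, the starting list would normally come from `getDefaultXcmsSetStartingParams("centWave")` and be passed to `optimizeXcmsSet()`; the same single-value assignments would be applied to that list.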

rietho commented 1 year ago

Hi @linlennypinawa.

I do agree that the DoE approach implemented in IPO has the potential to be improved. Personally, I am only maintaining the package to the degree needed to keep it functional. But I do hope that my former employer, Joanneum Research, will find the time and resources to take on active maintenance along with further development.