First off, great tool for helping a newcomer get started with XCMS, thanks!
Now, I understand that IPO is not actively maintained, but I wanted to leave this here for others, and if/when someone takes up development again.
The underlying issue is that the grouping optimization seeks to maximize `norm(GS) + norm(RCS)`. In my case, this leads the algorithm to seek out an optimum dominated by RCS, while losing out on >95% of the "good groups" available (see below). I know that optimizing for two quantities (GS and RCS) will probably always run into such issues in certain edge cases, which my data set appears to be one of. However, a re-definition of the overall score may still help to reduce these edge cases. For my scenario, the following adaptation helped:
`score = GS/max(GS) + (1 - ARTS/max(ARTS))`
This weighs tiny improvements in ARTS less heavily than the hyperbolic RCS does. A definition along these lines also avoids calculating the norm over the current iteration only, which is part of what lets the algorithm run itself into a corner.
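A minimal numeric sketch of the difference between the two scorings (Python purely for illustration; IPO itself is an R package). Two assumptions are baked in and are mine, not IPO's: `norm()` is modelled as division by the current iteration's maximum, and RCS is modelled as `1/ARTS` to mimic its hyperbolic dependence on the retention time shift. The GS/ARTS numbers are invented.

```python
# Sketch only: NOT IPO's actual code. Assumptions (mine, not IPO's):
# norm(x) = x / max(x) over the current iteration, and RCS = 1/ARTS
# to model its hyperbolic behaviour. All numbers below are invented.

def score_original(gs, arts):
    """norm(GS) + norm(RCS), with norm(x) = x / max(x) (assumed)."""
    rcs = [1.0 / a for a in arts]  # hyperbolic in ARTS (assumed model)
    g_max, r_max = max(gs), max(rcs)
    return [g / g_max + r / r_max for g, r in zip(gs, rcs)]

def score_proposed(gs, arts):
    """GS/max(GS) + (1 - ARTS/max(ARTS)): linear in ARTS."""
    g_max, a_max = max(gs), max(arts)
    return [g / g_max + (1.0 - a / a_max) for g, a in zip(gs, arts)]

# Three invented candidate parameter sets: candidate 0 has a tiny
# retention time shift but keeps almost no good groups; candidate 2
# keeps the most good groups at a modest shift; candidate 1 is
# mediocre on both counts.
gs   = [0.43, 9.0, 9.8]    # good-groups score per candidate
arts = [0.004, 2.0, 0.12]  # average retention time shift per candidate

best_original = max(range(3), key=lambda i: score_original(gs, arts)[i])
best_proposed = max(range(3), key=lambda i: score_proposed(gs, arts)[i])
# Under the hyperbolic RCS, candidate 0 wins despite GS = 0.43;
# the linear ARTS term instead selects the high-GS candidate 2.
```

The point of the sketch is only the mechanism: because RCS explodes as ARTS approaches zero, a candidate with a near-zero shift can dominate the normalized sum even with a negligible GS, whereas the linear ARTS term caps how much a tiny shift improvement can be worth.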
Below are the edited results for a bandwidth optimization (second iteration). Note that IPO selects the first parameter set, with GS 0.434635 (!). I've seen this happen with two of my datasets, largely independently of the initialization parameters.
Hi @tew42, and sorry for the very late response. As you correctly pointed out, IPO is not actively maintained at the moment. Still, I highly appreciate you sharing your insights.