sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
185 stars 80 forks source link

the bandwidth of grouping is limited by n in density #161

Closed stanstrup closed 6 years ago

stanstrup commented 7 years ago

When the density is calculated it is using the default number of points in the returned density. That means (I think) that the effective bw can never be more than the rtrange divided by the number of points that is by default 512. If you have a high peak capacity this can make perfectly distinct peaks inseparable.

Here is an example. I have a 34 minutes run time GCxMS dataset. The effective bw is something like (34*60)/512 = 4 sec between datapoints I guess.

Grouping the data I cannot separate these two peaks no matter the bw:

image

There is ~5 sec between them but as you see they are not separated (no matter the chosen bw). I am not sure what n should be but perhaps: 2^(ceiling(log2(diff(rtrange)/bw)))

EDIT: perhaps max(512, 2^(ceiling(log2(diff(rtrange)/bw)))) would be better.

That worked for me at least: image

I made a pull request on the biocdevel branch (I couldn't get devel to play nice. Used a lot of memory and complained that minFraction was not between 0 and 1 in retcor).

https://github.com/sneumann/xcms/pull/160

Any idea if n should be some "multiple of the bw" to ensure the selected bw is possible?

jorainer commented 7 years ago

I made a pull request on the biocdevel branch (I couldn't get devel to play nice. Used a lot of memory and complained that minFraction was not between 0 and 1 in retcor).

That is very strange. biocdevel and devel branches should be/are the same.

jorainer commented 7 years ago

For the rest, it sounds reasonable to me, especially the max(...). @sneumann what do you think?

jorainer commented 7 years ago

Actually, I checked on some artificial data and I think @stanstrup is right. We could even increase the n further (e.g. by multiplying with 2), since, as far as I see it, this does not affect the overall shape of the density curve, but makes it more detailed (e.g. being able to separate neighboring peaks if bw is low).

The changes should however be made in the do_chromPeaks_density function.

In addition I would be not like to include it in the upcoming Bioc release as we would need some more testing on it.

jorainer commented 7 years ago

@stanstrup: I've merged your pull request, but removed the definition of the n in the plotrt, and the plotting functionality of the retcor.peakgroups and retcor.peakgroups_orig. These were unrelated to the grouping functions and did also throw an error.

You can now also use the plotChromPeakDensity function (currently in xcms3 branch) to simulate a peak grouping in a manually selected mz range using different settings - might be interesting to see if you can now discriminate your peaks.

stanstrup commented 6 years ago

I guess this can be closed.