Closed: kfritzsc closed this issue 8 years ago
Hi, a simple sum is a routine method of doing "integration". For example, NMRPipe measures peak volumes in this way, and Sparky also provides "Sum over box" and "Sum over ellipse" modes, although they are not the default options. I think nmrglue just does the same thing.
It seems to me that your integration method normalizes peak intensities to 1.0, which will help quantitative comparison between spectra collected/processed differently. Is my understanding of this correct? Will you correct/scale your integrated areas/volumes by the peak heights (which are not affected significantly by zero-filling or binning)?
Thanks.
I agree that the simple sum method is common and works within a single spectrum.
I propose no re-normalization; I just propose using the definition of an integral.
If integration is done properly, spectra with different bin widths (as a result of different zero-filling, dwell times, etc.) can be compared quantitatively. I do not propose to do anything with the maximum intensity in the range (i.e. the peak height).
I would get the same, correct, result by printing the spectrum on paper, cutting out the peaks with scissors and then weighing the peaks with an analytical balance.
I think it is a good point. Personally, I would prefer the unit_conv simply to be the ppm scale for the data. I realize it is a bit hacky to produce this scale at the moment, especially for Bruker data, but that is a separate problem.
Thank you for the support! It would be pretty simple to write a function that takes either the unit_converter or the ppm scale as an array, one try / except AttributeError block would do.
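A sketch of that dispatch, assuming a unit_converter-like callable on one side and a plain ppm-scale ndarray on the other (the helper name is mine, not an existing nmrglue function; note that calling an ndarray raises TypeError, so that is caught alongside AttributeError):

```python
import numpy as np

def ppm_limits_to_slice(scale, low, high):
    """Map two ppm limits onto an index slice.

    ``scale`` may be a unit_converter-like callable (e.g. the object
    returned by ng.pipe.make_uc) or a plain ppm-scale ndarray.
    Hypothetical helper; the name is illustrative only.
    """
    try:
        # unit_converter-like object: call it with "value unit" strings
        i0, i1 = scale("%f ppm" % low), scale("%f ppm" % high)
    except (TypeError, AttributeError):
        # plain ndarray of ppm values: find the nearest indices
        i0 = int(np.abs(np.asarray(scale) - low).argmin())
        i1 = int(np.abs(np.asarray(scale) - high).argmin())
    return slice(min(i0, i1), max(i0, i1) + 1)

# usage with a plain ppm-scale array (descending, as NMR scales usually are)
ppm = np.linspace(10.0, 0.0, 1024)
region = ppm_limits_to_slice(ppm, 4.0, 6.0)
```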
I hope PR #43 fixes some of the ppm scale problems for Bruker data.
@kfritzsc Very interesting discussion, I agree that some additional functions on this topic could be added to nmrglue.
What is and isn't "integration" very quickly descends into semantics, which I do not wish to spend too much time arguing one way or the other. More practically, finding the intensity of a peak by summing over a box or rectangular region is often used in NMR to analyse a set of spectra collected with identical parameters, for example in the fitting T1 data example. In such cases, the underlying spacing need not be considered. From the email I get about nmrglue and the descriptions of how the package is used in journal articles, these "integration" examples seem to be exactly what many scientists are looking for, and they are quite often used with only minor modifications.
As for the more specific questions:
1) What is 'process.proc_base.integ' being used for? Currently it is just an alias for np.cumsum(data, axis=-1).
This performs "integration" in the same manner as NMRPipe's INTEG function, by cumulative sum. The units of the integral are points (where dx is 1), which are used in nearly all the functions in proc_base. This could be more explicitly explained in the docstring. Using this for analysis is probably not best, but this type of function is sometimes used to process data.
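For reference, a minimal demonstration of what the cumulative-sum style of "integration" produces (plain NumPy, not nmrglue itself):

```python
import numpy as np

# cumulative-sum "integration" in the style of proc_base.integ /
# NMRPipe's INTEG: dx is implicitly one point, so the result carries
# units of intensity * points
data = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
running = np.cumsum(data, axis=-1)
print(running)  # -> [0. 1. 3. 4. 4.]
```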
2) I think there should be a function to do integration in nmrglue. It is something that most users will need. At minimum, I think the function needs x (or some proxy) and y data, and two x limits. Something like this would work...
Agreed, having this function somewhere in the analysis folder is probably best. You may want to think about including support for units other than PPM such as hertz and seconds as well as finding a way to support multi-dimensional integration. The unit_conversion class might be helpful here as it is designed to convert various NMR units to an index which can be used to slice the data:
>>> uc = ng.pipe.make_uc(dic, data)
>>> print(uc("100 ppm"))
411
>>> uc("650 Hz")
768
>>> print(uc("50%"))
750
>>> print(uc("5000 us"))  # time units are not too helpful in the frequency domain but can be used...
250
>>> print(uc("5 ms"))
250
Closed by #45
The two examples showing how to do integration in nmrglue don't actually do integration. They simply sum the data around the peak. The dx part is important...
In a 1D spectrum (with equally spaced bins), if you sum sections of the data and compare those values, the result is very 'integration like' and correct. The idea, however, is still incorrect. Practically speaking, this approach is problematic if you start comparing data between multiple data sets (think zero-filling problems). It also makes error analysis more difficult. To fix the problem you simply have to actually integrate: multiply the sum by the bin width:
Example
Output:
('Sum: ', 5.0000000000000178, 10.000000000000036)
('Integral: ', 1.0000000000000036, 1.0000000000000036)
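The example code itself did not survive in this copy of the thread, but the quoted output can be reproduced with a sketch along these lines (a unit-area Gaussian peak sampled at two different bin widths; the values and variable names are illustrative, not the original post's code):

```python
import numpy as np

def gauss(x, mu=0.0, sigma=0.5):
    # unit-area Gaussian lineshape
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

# the same peak sampled with two different bin widths (think zero-filling)
dx1, dx2 = 0.2, 0.1
y1 = gauss(np.arange(-10.0, 10.0, dx1))
y2 = gauss(np.arange(-10.0, 10.0, dx2))

print('Sum: ', y1.sum(), y2.sum())                   # ~5 vs ~10: depends on dx
print('Integral: ', y1.sum() * dx1, y2.sum() * dx2)  # both ~1.0: dx-independent
```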
I would have just fixed the examples and submitted a pull request but wanted to consider two things:
1) What is 'process.proc_base.integ' being used for? Currently it is just an alias for np.cumsum(data, axis=-1). 2) I think there should be a function to do integration in nmrglue. It is something that most users will need. At minimum, I think the function needs x (or some proxy) and y data, and two x limits. Something like this would work...
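A minimal version of such a function might look like the following; this is a sketch only (the name `integrate` and the evenly-spaced-x assumption are mine, not an existing nmrglue API):

```python
import numpy as np

def integrate(x, y, low, high):
    """Integrate y over the x-range [low, high] by the rectangle rule.

    Sketch only: assumes evenly spaced x values, as produced by
    typical NMR processing.
    """
    x, y = np.asarray(x), np.asarray(y)
    mask = (x >= min(low, high)) & (x <= max(low, high))
    dx = abs(float(np.mean(np.diff(x))))  # bin width from the x spacing
    return y[mask].sum() * dx             # sum * dx, not the bare sum

# example: area of exp(-x**2) over [-5, 5] is ~sqrt(pi), independent of dx
x = np.linspace(-10.0, 10.0, 201)
area = integrate(x, np.exp(-x ** 2), -5.0, 5.0)
```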
Ideas?