jjhelmus / nmrglue

A module for working with NMR data in Python
BSD 3-Clause "New" or "Revised" License
211 stars 86 forks source link

Bruker ppm scale slightly shifted #54

Open kangaroocry opened 7 years ago

kangaroocry commented 7 years ago

I noticed that when I process the same data the same way with Bruker TopSpin and with nmrglue the frequency scale is slightly shifted (by 1.5 points). I understand the reason for that and could easily fix this in the code. But there is the issue whether to fix only add_axis_to_udic in bruker or to fix also the unit_conversion class since this is connected to the issue of how the carrier frequency car is defined. Let's consider even sized data first. The way nmrglue treats car is to define it as the position of the zero-frequency component (the first point) of the FFT output. The way Bruker parameters are used to create a unit_conversion object implies that the difference SFO1-SF is the same as car. But in contrast to nmrglue, Bruker TopSpin defines it not as the position of the first FFT point but as the lower edge of the spectral window which is shifted by half a frequency step lower. On top of that, the way TopSpin and numpy shift FFT arrays is different since TopSpin shifts the first point of the FFT output to the index size/2-1 while np.fft.fftshift shifts it to the index size/2. This leads to the fact that the frequency scale by itsself is shifted by 0.5 frequency steps and (because the shifted FFT data are displaced by 1 index) the frequency value for same FFT data point (e.g. a peak) is shifted by 1.5 frequency steps between TopSpin and nmrglue. Can anyone else confirm this issue and the explanation? And can anyone answer whether data from other software packages (e.g. NMRPipe) have the same issue? It would help to decide how is the best way to fix it.

kfritzsc commented 7 years ago

I have also seen this. In typical liquid state NMR datasets, the problem is small. In solid-state NMR datasets, it is a bigger problem. My temporary solution has been to zero fill extensively (i.e. make 1.5 delta very small).

To test the problem, I would change TD and see how the peak shifts. If there is a problem, the peak will get closer to the correct position monotonically.

Is there a Varian/Agilent data set we can test this on? It would be helpful if the data set was referenced well. The problem will be most apparent in data sampled with a large dwell time and a short total acquisition time.

kangaroocry commented 7 years ago

I tried to process the same data with MestReNova and intrestingly, it shows a similar problem. Here the scale is off only half a frequency step but in the other direction. And MestReNova follows the TopSpin convention of shifting the FFT data (at least for the Bruker data I imported).

kangaroocry commented 7 years ago

Indeed, in TopSpin a peak position slightly shifts while varying TD, which is logic when you consider how it processes the data (see my first comment). Choosing TD very big the peak position converges to the position that nmrglue shows. Since nmrglue's reference is the first point of the FFT output (which is the zero-frequency component that is insensitive to the data size) it doesn't show this problem. Now it becomes very tricky to fix this issue in general since it depends on whether you referenced your data in topspin with a high or low TD. At least if you do it by hand with the help of a reference peak. Any suggestions?

jjhelmus commented 7 years ago

I don't work with Bruker data so I can't really offer much help here. Would it be possible that the way in which the digital filter is removed could effect the PPM scale?

kangaroocry commented 7 years ago

I also let TopSpin remove the digital filter and it was the same there. As I see it, the peak positions being sensitive to the array size is a TopSpin problem only due to its data processing as I descibed it above. But it could still be that imported data show a principal offset of one point in nmrglue due to the different fftshift algorithms between numpy and TopSpin. I will investigate this later and give report about it.

jjhelmus commented 7 years ago

Does NMRPipe give a different PPM scale with this type of data. When I was writing much of the unit conversion and PPM scale code I used NMRPipe to check the results very often.

kangaroocry commented 7 years ago

Thanks for referencing #53 into this. I try to summarize a bit. We have three (or four) issues concerning the ppm scale now:

  1. nmrglue's unit conversion gives slightly shifted ppm scales for odd array sizes
  2. Bruker TopSpin ppm scale is slightly dependent on the data array size
  3. Bruker TopSpin's ppm scale is generally slightly shiftet towards nmrglue's ppm scale
  4. (behaviour of other software is not so clear up to now)

First of all, these problems are rather small especially for large array sizes.

Comments on these issues:

  1. I tested this a bit further and it seems to be as I said. For odd array sizes the scale is 0.5 point distances off. You can easily fix it with little changes in the code, I will create a PR soon and show example plots.
  2. This is really an internal problem of TopSpin and has nothing directly to do with nmrglue but with a non-consistent scaling algorithm in TopSpin. It is no problem as long as you plot the data within TopSpin with the same array size as you used to perform the calibration. It affects nmrglue only through the fact, that a scale calibration that was performed in TopSpin has an offset towards a 'correct' (consistent) scale processing, as so in nmrglue. The higher the array size is, the smaller gets the problem, because the offset is always 0.5 point distances. Unfortunately, there is not really a save way to fix this. You can back-correct the offset if you know at which array size the calibration in TopSpin was performed, but that you can never know for sure, I guess. You could read out the SI paramter (which is the data size for processing) but you never know wether the calibration was maybe performed with another SI value before. As this value is usally very high, I would suggest to ignore this problem instead of introducing sophisticated back-correct algorithms.
  3. This has partly to do with (2.) but additionally with the fact, that the shift algorithm to center the zero frequency component of the FFT is slightly off (1 point) between nmrglue (i.e. numpy) and TopSpin. There is no bug in neither of both but they use simply different definition of this shifting. (Namely whether the Nyquist frequency is definded as positive or negative.) Also here I will create a PR and try to reason it with examples. As soon as I find time for it.
jjhelmus commented 7 years ago

I can not comment on how Topspin calculates its PPM scale or if the method used is correct as I do not use that software.

I tried comparing the PPM limits that nmrglue and NMRPipe find for a odd sized arrays with the following script:

from __future__ import print_function
import os

import numpy as np
import nmrglue as ng

# create an odd sized time domain NMR spectrum
data = np.arange(201, dtype='complex64')
udic = {
    'ndim': 1,
    0: {
        'complex': True,
        'freq': False,
        'time': True,
        'encoding': 'direct',
        'label': '1H',
        'obs': 100.0,
        'car': 2000.000,
        'size': 201,
        'sw': 50000.0,
        },
}
dic = ng.pipe.create_dic(udic)
ng.pipe.write('odd.fid', dic, data, overwrite=True)

# Use NMRPipe to perform a FFT
nmrpipe_command = (
    'csh -c '
    '"nmrPipe -in ./odd.fid  '
    '| nmrPipe -fn FT '
    '-verb -ov -out ./odd.ft"'
)
os.system(nmrpipe_command)

# read in the FFT spectra and find the PPM scale limits
ft_dic, ft_data = ng.pipe.read('./odd.ft')
uc = ng.pipe.make_uc(ft_dic, ft_data, 0)
ppm_min, ppm_max = uc.ppm_limits()
print('ft_data.shape', ft_data.shape)
print("nmrglue PPM limits %.3f %.3f" % (ppm_min, ppm_max))

This gives limits of 268.756 -228.756

The limits given by the showhdr command in NMRPipe are identical:

% showhdr -verb odd.ft
FILE: odd.ft DIM: 1 QUAD: Complex 2DMODE: None
BYTES: 3656 PRED: 3656 MIN: -100.501 MAX: 20100 VALID: 1
ORDER: 2 PIPE: 0 PLANES: 1 201x1x2 Not Transposed
CONVERTED ON: 10/5/2016 10:22 AM

               X-Axis   

DATA SIZE:           201
APOD SIZE:           201
SW Hz:      50000.000000
SW PPM:          500.000
Hz/POINT:        248.756
AQTIME SEC:     0.004020
OBS MHz:      100.000000
ORIG Hz:    -22875.621094
DOMAIN:             Freq
...
PPM FIRST:       268.756
PPM LAST:       -228.756
PPM OFFSET:        0.000
PPM CAR:          20.000
CENTER:              101

So at least in this case I am not seeing an issue with nmrglue's unit conversion.

@kangaroocry can you provide some additional details or an example file where you see a difference?

kangaroocry commented 7 years ago

Sorry for the delay. Regarding the shift for odd array sizes, I see this problem only in nmrglue itself by using fixed spectral parameters but different array sizes. I cannot tell something about TopSpin because it doesn't allow odd sizes. As it appears from your test, NMRPipe uses the same processing as nmrglue. You can notice the effect by creating several unit conversion objects with the same FID and spectral parameters but with different FID sizes. (Adding zeros or cutting points.) Here is an example:

import nmrglue as ng
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0,1.0,16)
N = t.size
dt = (t[-1]-t[0])/(N-1)

ph0 = +0.0
ph1 = +0.0

POSN = 0.0 #peak position in bins relative to the center
POS = POSN/N #peak position in units of spectral width [-0.5...0.5] relative to the center
FWHMN = 2 #FWHM in bins
FWHM = FWHMN/N #FWHM in units of spectral width
MAX = 1. #peak maximum value for the single peak
g = 0.0 #gaussian character 0...1

#constants for later calculations
c1 = 1.0/(4*np.log(2.0))
c2 = 1.0 - 1.0/(np.sqrt(np.pi*np.log(2.0)))

w = POS/dt*2.0*np.pi #real angular frequency
l = FWHM/dt*np.pi #exponential decay rate (lorentzian line with)
a = MAX*dt #peak intensity
s = l**2 * c1 #gaussian width = 1/sqrt(2*s) calculated to have the same FWHM as the lorentzian line shape (for g=0)
w_ = 1.0j*w - (1.0-g)*l - g*s*(t+ph1*dt) #full complex frequency (oscillation + exponential and gaussian decay)
a_re = (1.0 - c2*g)*l #renormalization to keep the isolated peak maximum constant while varying l and g

fid = a*a_re*np.exp(w_*(t+ph1*dt)+ph0*2j*np.pi)

fid1 = np.copy(fid)
fid1[0] = 0.5*fid1[0]+0.5*fid1[-1] #baseline correction
spc1 = ng.process.proc_base.fft(fid1)

fid2 = np.copy(fid[:-1])
fid2[0] = 0.5*fid2[0]+0.5*fid2[-1] #baseline correction
spc2 = ng.process.proc_base.fft(fid2)

fid3 = np.copy(fid[:-2])
fid3[0] = 0.5*fid3[0]+0.5*fid3[-1] #baseline correction
spc3 = ng.process.proc_base.fft(fid3)

fid4 = np.copy(fid[:-3])
fid4[0] = 0.5*fid4[0]+0.5*fid4[-1] #baseline correction
spc4 = ng.process.proc_base.fft(fid4)

sw = 1.
obs = 1.
car = 0.

uc1 = ng.fileiobase.unit_conversion(fid1.size,True,sw,obs,car)
uc2 = ng.fileiobase.unit_conversion(fid2.size,True,sw,obs,car)
uc3 = ng.fileiobase.unit_conversion(fid3.size,True,sw,obs,car)
uc4 = ng.fileiobase.unit_conversion(fid4.size,True,sw,obs,car)

frq1 = uc1.ppm_scale()
frq2 = uc2.ppm_scale()
frq3 = uc3.ppm_scale()
frq4 = uc4.ppm_scale()

fig = plt.figure(figsize=(20,10))
ax1 = fig.add_subplot(111)
ax1.plot(frq1,np.zeros(t.size),"k--")
ax1.set_xlabel('Frequency')

ax1.plot(frq1,spc1.real,"c-",label='{}'.format(spc1.size))
ax1.plot(frq2,spc2.real,"m-",label='{}'.format(spc2.size))
ax1.plot(frq3,spc3.real,"b-",label='{}'.format(spc3.size))
ax1.plot(frq4,spc4.real,"r-",label='{}'.format(spc4.size))

ax1.set_xlim(frq1[0],frq1[-1])

plt.legend()
plt.show()

Which will give the following output: nmrglue with offset

I choosed a very low array size to make the effect visible. The offset between odd and even sized disappears by modifying the folowing line in the unit_conversion class self._first = self._car / self._obs - self._delta * self._size / 2. to self._first = self._car / self._obs - self._delta * (self._size // 2)

resulting in the following output: nmrglue with offset

Again: This offset issue is caused by the algorithm of np.fft.fftshift that is not fully taken into account by the original calculation of self._first. There is only a float division by 2.0 of integer array sizes which, in turn, cause an offset of 0.5 point distances for odd sizes.

But in the end, if NMRPipe is processing it in the same way and your aim is to be consistent with NMRPipe, than you should not change anything.

eisoab commented 7 years ago

Not sure if this is already resolved, but I've been bitten by this too in topspin .

Old versions of topspin would set the carrier ppm freq halfway between the two middle points. IMHO this is plain incorrect since with this definition peaks will shift thier ppm value by half a channel number (point) if you double the zerofilling. Not a problem if you have 16k points points but with with 128 points in a 3D it is significant

If non oscillation frequencies at the carrier end up at point n/2+1, peaks stay at the same position as they should and the first (left) point of the spectrum is at exactly the left edge of the spectrum. On the right side it comes one point short, but after FT this should be the same point as point 0.

I guess you knew all this , however, as far as I know, Bruker has corrected this behavior in recent versions of topspin (last 2-3years or so), but must confess I haven't checked carefully whether it is ok now.