jjhelmus / nmrglue

A module for working with NMR data in Python
BSD 3-Clause "New" or "Revised" License
211 stars 86 forks source link

Discrepancies between make_uc functions in pipe.py and pipe_proc.py #33

Closed ahmohamed closed 9 years ago

ahmohamed commented 9 years ago

Hello, Currently ng.pipe.make_uc() always centers the carrier, while ng.pipe_proc.make_uc() never does. Is there a specific reason for this?

Also, since carrier centering is needed only when nmrPipe EXT function is performed, centering should be performed only in that case. Maybe by adding:

if dic[fn + "X1"]  != 0. or dic[fn + "XN"]  != 0.:
    car = car + sw / size * (dic[fn + "CENTER"] - 1. - size / 2.)

In pipe.py also, I noticed that guess_udic function swaps the sizes of the dimensions in 2D spectra if the FDDIMORDER is not in ascending order. Looking at the code, it seems that changing udic[i]["size"] = data.shape[i] to udic[i]["size"] = data.shape[(data.ndim - 1) - i] solves it. But I don't fully understand this part of the code.

Thanks as always. Ahmed.

jjhelmus commented 9 years ago

@ahmohamed Thank you for the feedback.

The different between ng.pipe.make_uc() and ng.pipe_proc.make_uc() is unintentional, the version in the pipe.py module is correct with the centering of the carrier which was added in commit a72d38995c376d. The make_uc function in the _pipeproc.py module should be updated with this change or removed as it is duplicated code which is not used in the module or elsewhere.

You are correct the the carrier centering is only needed when the NMRPipe EXT function is performed but the operation is still correct even when the function has not been applied. Rather than attempting to check if an NMRPipe extraction has been applied which adds another branch and hence another path to test, I think always applying the centering (the current behavior) is simpler and as far as I have seen does not give incorrect carrier frequencies. If you have an example where the carrier is incorrect please let me know.

The guess_udic function returns the dictionary dimensions are keys to match the dimensions in the data array not the original file. As such the correct method for determining the size is to use udic[i]["size"].shape[i] Can you provide an example where you believe this is incorrect?

ahmohamed commented 9 years ago

@jjhelmus Thanks for getting back to me. I encountered both issues when I was trying to replicate NMRPipe processing of a 2D spectrum (HSQC, http://bmrb.wisc.edu/metabolomics/mol_summary/show_data.php?molName=amygdalin&id=bmse000139&whichTab=1) If you download the data, you will find the .fid time-domain and .ft2 frequency domain NMRPipe files. Testing both issues can be done as follows:

import nmrglue as ng # 0.6-dev
dic, data = ng.pipe.read('bmse000139_HSQC/bmse000139_hsqc.fid')

print 'with carrier centering:'
print ng.pipe.make_uc(dic, data, 1).ppm_limits()  #(11.279814017369414, -0.7228082594735792)

print 'without carrier centering:'
print ng.pipe_proc.make_uc(dic, data, 1).ppm_limits() #(10.824498525307929, -1.1781237515350647)

'''
nmrPipe -in bmse000139_hsqc.fid \
| nmrPipe  -fn EXT -x1 0% -xn 100% -sw -verb \
    -ov -out dummy_EXT.fid
'''

dic2, data2 = ng.pipe.read('bmse000139_HSQC/dummy_EXT.fid')

print 'Dummy EXT:'
print 'with carrier centering:'
print ng.pipe.make_uc(dic2, data2, 1).ppm_limits()  #(10.811850872750666, -1.1907714040923274)

print 'without carrier centering:'
print ng.pipe_proc.make_uc(dic2, data2, 1).ppm_limits() #(10.824498525307929, -1.1781237515350647)

You can see that just by performing a dummy EXT operation, the ppm_limits are off by about 0.5 ppm.

Of course, in this case even if dic[fn + "X1"] != 0. or dic[fn + "XN"] != 0.: won't resolve the issue since X1 and XN are still zeros. But this is a special case when EXTing 0-100%

The second issue happens when the spectrum is transposed. This can be tested by:

'''
nmrPipe -in bmse000139_hsqc.fid \
| nmrPipe  -fn TP -auto \
    -ov -out dummy_TP.fid
'''

dic3, data3 = ng.pipe.read('bmse000139_HSQC/dummy_TP.fid')

print ng.pipe.guess_udic(dic, data)[0]
#{'encoding': 'states', 'car': 10049.767108587257, 'sw': 9407.666015625, 'label': u'13C', 'complex': True, 'time': True, 'freq': False, 'obs': 100.6225357055664, 'size': 512}

print ng.pipe.guess_udic(dic3, data3)[0]
#{'encoding': 'states', 'car': 10049.767108587257, 'sw': 9407.666015625, 'label': u'13C', 'complex': True, 'time': True, 'freq': False, 'obs': 100.6225357055664, 'size': 1900}
jjhelmus commented 9 years ago

@ahmohamed Thanks for providing examples, these help identify the issues more clearly. I agree both the issues are bugs in nmrglue.

The second issue is a bug in how nmrglue determines the NMRPipe dimension name used to popular the axis dimension dictionaries in the universal dictionary. Since the data has been transposed the size is correct but the other spectral parameters are swapped. Commit 055ad8c03150cb6390b2c7072780e6eca8f507d3 changes this to use the ordering provided by the file. With this I change I obtain the following:

In [4]: ng.pipe.guess_udic(dic, data)[0]
Out[4]: 
{'car': 10049.767108587257,
 'complex': True,
 'encoding': 'states',
 'freq': False,
 'label': u'13C',
 'obs': 100.6225357055664,
 'size': 512,
 'sw': 9407.666015625,
 'time': True}

In [5]: ng.pipe.guess_udic(dic3, data3)[0]
Out[5]: 
{'car': 1922.3198781414394,
 'complex': True,
 'encoding': 'states',
 'freq': False,
 'label': u'1H',
 'obs': 400.1318664550781,
 'size': 1900,
 'sw': 4807.6923828125,
 'time': True}

In [6]: data.shape
Out[6]: (512, 950)

In [7]: data3.shape
Out[7]: (1900, 256)

I'm still trying to work out how NMRPipe stores unit information which is the root of the first issue.

ahmohamed commented 9 years ago

@jjhelmus Wow, Thanks for this quick fix. I agree that the first issue is harder, since it is really unclear how NMRPipe keeps track of the unit information. For now, i got around the issue as I mentioned above, which works 99% of the time.

I tried to look at the header changes between the raw and EXTed spectrum. I can't see anything useful, but maybe you can.


from datadiff import diff
print diff(dic, dic2)
'''
--- a
+++ b
{
-'FDDISPMAX': -9.999999843067494e+16,
+'FDDISPMAX': 69134.96875,
-'FDDISPMIN': 9.999999843067494e+16,
+'FDDISPMIN': -64277.66015625,
-'FDF2CENTER': 513.0,
+'FDF2CENTER': 476.0,
-'FDF2ORIG': -476.831298828125,
+'FDF2ORIG': -476.4658203125,
-'FDMAX': -9.999999843067494e+16,
+'FDMAX': 69134.96875,
-'FDMIN': 9.999999843067494e+16,
+'FDMIN': -64277.66015625,
@@  @@
}
'''
jjhelmus commented 9 years ago

Looks like the carrier frequency is not reliably set but the spectrum origin is, luckily we can calculate one from the other. commit 0c822cc45 makes this change, with which I find:

In [5]: ng.pipe.make_uc(dic, data, 1).ppm_limits()
Out[5]: (10.81093688809214, -1.1916853887508534)

In [6]: ng.pipe.make_uc(dic2, data2, 1).ppm_limits()
Out[6]: (10.811850740878555, -1.1907715359644389)

Which is in close enough agreement with the limits reported using the showhdr -verb command

jjhelmus commented 9 years ago

Still not sure if this is the absolute correct way to find parameter for the unit conversion object but it seem to be close for everything I've thrown at it.

jjhelmus commented 9 years ago

Closed with the merge of PR #34. Feel free to reopen if I missed anything.