jjhelmus / nmrglue

A module for working with NMR data in Python
BSD 3-Clause "New" or "Revised" License
210 stars 86 forks source link

Using peak picking with PPM scale #57

Closed Jmegan042 closed 2 years ago

Jmegan042 commented 7 years ago

Please allow me to start off with an apology. As there isn't a "subreddit" like place that I can post general questions, which I am mostly used to. I seem to be having an issue with getting the peak picking function to behave in the PPM Scale.

My code currently reads as:

dic, data = ng.bruker.read_pdata('..../1/pdata/1/')

sw =6395.862
obs = 400.1324
car= 2400.78

uc = ng.fileiobase.unit_conversion(data.size,True,sw,obs,car)
frq = uc.ppm_scale()

# Peak Picking
peaks = ng.peakpick.pick(data,28000)# Currently, works when everything is in Hz. 
npeaks = len(peaks) 

#Attempts at converting the peak list? (i have no idea if this is the correct way to go about it.)
#uc2 = ng.fileiobase.unit_conversion(peaks.size,True,sw,obs,car)
#pkfrq = uc2.ppm_scale()

# plot
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(111)

ax.plot(frq,data, 'k-')
ax.invert_xaxis() #This makes it read like a standard spectra
ax.plot(peaks['X_AXIS'],data[peaks['X_AXIS'].astype('int')], 'ro') #This plots the picked peaks on the spectra
ax.axhline(y=28000, color='k', ls='--')#This shows the cutoff for the peak picking
ax.set_xlabel('PPM')

plt.show()

When I do this, I get:

figure_1

However, when I actually attempt the section of code for `uc2 = ng.fileiobase.unit_conversion(data.size,True,sw,obs,car)

pkfrq = uc2.ppm_scale()`

And change: ax.plot(pkfrq,peaks['X_AXIS'],pkfrq,data[peaks['X_AXIS'].astype('int')], 'ro')

I get figure_2

Any tips as to what I seem to be missing here? <cmd+s>My initial thoughts are that it is incorrectly scaling the peak list that it is generating. I thought it might be because the overall size is only set to scale the length of the farthest peak, which is less than the collected Hz. This would explain why it seems to be "skewed" in relation to the actual positioning of the peaks. So, as a remedy I attempted to change the "pkfrq" to the "frq" conversion factor that seemed to work to plot the spectrum. However, I keep getting errors saying x and y must have same first dimension. How is it that this conversion works with with plotting the spectra, but not the picked peaks?

Any advice would be greatly appreciated. I seem to be uncovering a thousand questions as I go along with this, but I am getting closer to being able to automate. Perhaps I am overcomplicating things.

As a help, perhaps someone could supplement the Bruker processing codes that are used in examples? I've been trying to put together small elements that are scattered through all the examples, and have conquered may of my initial issues.

Edit

After struggling through this all day, I've discovered a few things that I think have pointed me in the correct direction, but I still have not resolved it. For one, I now know that I cannot scale the picked peaks the same way that I scaled the data. Because the data is originally plotted by "collection points" or "size" of the spectrum, and not by Hz or ppm, this is why the uc function is needed for the data. Before, I wasn't sure what it was using as an X axis before using the uc.

However, I have not been able to get the peaks plotted onto the ppm spectrum, or (in my case) more importantly, convert the resulting peak list into ppm. Although I have successfully converted it to a pandas dataframe for easy reading.

Jmegan042 commented 7 years ago

I have solved this problem by doing the following:

dic, data = ng.bruker.read_pdata('JE-9-27-02/1/pdata/1/') #Import Data

SW = 6393.862                                   #Sweep Width in Hz
SF01M = 400.1324008                             #Transmitter Frequency offset in MHz
O1 = 2400.78                                    #Frequency Offset in Hz

uc = ng.fileiobase.unit_conversion(data.size,True,SW,SF01M,O1) #Convert spectra to PPM
frq = uc.ppm_scale()

peaks = ng.peakpick.pick(data,28000)            #Pick Peaks
npeaks = len(peaks)

SF01 = 400132400.8                              #Transmitter Frequency offset in Hz
SF = 400130003.6                                #Spectrometer Frequency (INCLUDES SR VALUE)
SFMhz = 400.1300036                             #Converted SF to Mhz
SW = 6393.862                                   #Sweep Width in Hz
FIDRES = 0.195125                               #Fid Resolution
SR = 3.56                                       #SR value in Hz
SI = 32768                                      #Size of FID. Number of points collected. 
startpoint = ((SF01+(0.5*6393.862)))            #The maximum value of Hz collected. Upper limmit. 

DeltaStart = (startpoint - SF)                  # Maximum change in Hz that may be observed at collection point 1. 
PeaksHZ = (DeltaStart -((peaks.X_AXIS)*FIDRES)) #List of observed delta in Hz procession Frequency 
PeaksPPM = ((PeaksHZ+SR)/SFMhz)                    # Converts the deltaHz to PPM based on Frequency of the spectrometer
print(PeaksPPM)

fig = plt.figure(figsize=(20,10))               #Plot figure
ax = fig.add_subplot(111)

ax.plot(frq,data, 'k-')
ax.invert_xaxis()
ax.plot(PeaksPPM,data[peaks['X_AXIS'].astype('int')], 'ro')#This displays picked peaks
ax.axhline(y=28000, color='k', ls='--')#This shows the cutoff for the peak picking
ax.set_xlabel('PPM')
plt.show()

However, I am now having an issue with integration, or rather, how to do it from a peak list that is generated from this peak picking. This makes sense based on the way that I had to get around this problem of converting it to Hz. I do not have a good way to extract the peak start and stop autonomously.

Any suggestions?

jjhelmus commented 7 years ago

Please allow me to start off with an apology. As there isn't a "subreddit" like place that I can post general questions, which I am mostly used to.

nmrglue is not big enough to have a forum based question site. Posting questions here is fine. The mailing list is also an option although there has not been much traffic there recently.

Peak picking and all other function in nmrglue operate by default in units of points. To convert to PPM or other units you should use a unit conversion object which deals with all of the organization of the carrier frequencies, offsets and spectral widths for you. For Bruker data these can be created from a universal dictionary which can be created from the file metadata (dic) and data (data). For example to pick all the peaks in the 1D capistruin spectrum and plot them on a PPM scale one could use the following script:

import matplotlib.pyplot as plt
import nmrglue as ng

# read in the data
dic, data = ng.bruker.read_pdata('./Capistruin/10/pdata/1')

# peak pick the data
peak_table = ng.peakpick.pick(data, pthres=1e6, algorithm='downward')

# make a universal parameter dictionary from the data
udic = ng.bruker.guess_udic(dic, data)

# create a unit conversion object
uc = ng.fileiobase.uc_from_udic(udic)

# convert the peak locations from unit of points to PPM
# using the unit conversion object
peak_locations_ppm = [uc.ppm(i) for i in peak_table['X_AXIS']]

# Find the peak amplitudes
peak_amplitudes = data[peak_table['X_AXIS'].astype('int')]

# plot the spectrum and peak locations on a PPM scale
fig = plt.figure()
ax = fig.add_subplot(111)

ppm_scale = uc.ppm_scale()
ax.plot(ppm_scale, data, 'k-')

ax.plot(peak_locations_ppm, peak_amplitudes, 'ro')

ax.invert_xaxis()
ax.set_xlim(7.8, 6.8)
ax.set_xlabel('PPM')
plt.show()

Which produces the following plot:

figure_1-1

The peak table from the above script also records peak line width estimates which could be used as limits for integration as follows:

import numpy as np
for peak_number in range(len(peak_table)):
    loc_ppm = peak_locations_ppm[peak_number]
    loc_pts = int(peak_table['X_AXIS'][peak_number])
    fwhm = peak_table['X_LW'][peak_number]
    hwhm_int = int(np.floor(fwhm / 2.))
    peak_area = data[loc_pts - hwhm_int: loc_pts+hwhm_int+1].sum()
    print(loc_ppm, peak_area)

More details on the outputs of the pick function can be found in the documentation

Jmegan042 commented 7 years ago

Your timing could not possibly be better. I was just mulling over how to use the peak line widths to define the parameters for integration.

Thanks for the followup. I can use a lot of things here to clean up my code. I will update some if I can on the questions I'm coming to form.

Jmegan042 commented 7 years ago

How does the integration protocol above actually work? Does it integrate the regions above the peak picking cutoff, or is it integrating down to the baseline?

I noticed that the integration values that I am getting using this is very far off from the integration that I get when using something like MNOVA. I double checked by processing a spectrum I knew to be have a methyl group as well as an internal standard. In MNOVA, the ratios of the integration areas came out to be a value around 0.22xxx, but when I used the values from the above code, I get 0.03xxxx. Although the phasing could be better, I wouldn't think this would cause this large a difference, would it?

Any thoughts on this? Is something else going on?

Also, I was curious about the fitting examples. Does the fitting function allow for integration of only real peaks by smoothing out any fluctuations that it might see as peaks? Or is there another function of this?

Cheers, Joe

jjhelmus commented 7 years ago

@Jmegan042 The above integration does a sum of all spectral points within one half width half max, there is not baseline identification done. These results may be very different than those found by more sophisticated methods.

The fitting function in nmrglue performs a constrained least squares fit of the spectra to the peak model specified by the function. False peaks should either be included in the model or removed by other means (i.e. pre-processing).