Converting nmrML spectra to Bruker format

jjhelmus / nmrglue

A module for working with NMR data in Python

BSD 3-Clause "New" or "Revised" License

Converting nmrML spectra to Bruker format #160

Closed EKukstas closed 2 years ago

EKukstas commented 2 years ago

Hi, I'll start by saying that I'm a relative NMR novice, so apologies if I missed some trivial workaround. I need spectra in the Bruker format for use with another bit of software. The best source I have is HMDB, which offers nmrML files for the spectra that I want to use. So the task is to convert them from nmrML to Bruker. I can find lots of information on how to convert Bruker spectra to nmrML but nothing on going the other way. Nmrglue looks like the best candidate for this functionality. A quick search brings up the nmrml.read() and bruker.write() functions - great, however, there is no from_nmrml() converter object or anything relating to nmrml as far as conversions go.

Is this conversion possible? Perhaps there is an intermediate format or tool that I could use? For example, I have a way to convert JCAMP-DX and FID files using TopSpin. Failing that, I am prepared to add the functionality to nmrglue. I just need some pointers on where to start and what the best course of action would be.

kaustubhmote commented 2 years ago

As you rightly point out, nmrml.read, bruker.write and bruker.write_pdata are the functions to use here. Unfortunately, generating Bruker acquNs and procNs files from a completely different format is non-trivial, as many of the parameters in these files are specific to Bruker. In fact, I think any previous conversion to a format like nmrML is bound to irretrievably throw away most of the parameters, so intermediate formats such as JCAMP-DX or the universal_dict from nmrglue will not be sufficient.

In this case, my suggestion would be to first read in an unrelated Bruker dataset with the same dimensions as the data you actually want to convert, and then edit the dictionary so that the parameters required for any further processing (spectral widths, number of points, etc.) are correct. After that, you can write out this edited dictionary together with your own data, read using nmrml.read. Most of the parameters that you write out will not correspond to your dataset, but as long as the important ones are consistent, you should be able to read the result in any other software.
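
The template approach described above might look roughly like the sketch below. The helper `patch_template` is hypothetical (it is not part of nmrglue), and the numbers are placeholders. `SW_h`, `SW`, `SFO1` and `TD` (acqus) and `SW_p`, `SF` and `SI` (procs) are standard Bruker parameter names, but which of them a given downstream program actually reads is an assumption that needs checking.

```python
import copy


def patch_template(template_dic, sw_hz, obs_mhz, td, si):
    """Return a copy of a Bruker parameter dictionary with key values replaced.

    Hypothetical helper, not part of nmrglue. Only sections that already
    exist in the template are touched; everything else is carried over.
    """
    dic = copy.deepcopy(template_dic)  # leave the template itself untouched
    if "acqus" in dic:
        dic["acqus"]["SW_h"] = sw_hz          # spectral width in Hz
        dic["acqus"]["SW"] = sw_hz / obs_mhz  # spectral width in ppm
        dic["acqus"]["SFO1"] = obs_mhz        # observe frequency in MHz
        dic["acqus"]["TD"] = td               # points in the FID (real + imaginary)
    if "procs" in dic:
        dic["procs"]["SW_p"] = sw_hz          # processed spectral width in Hz
        dic["procs"]["SF"] = obs_mhz          # spectrometer frequency in MHz
        dic["procs"]["SI"] = si               # points in the processed spectrum
    return dic


# Stand-in for the dictionary of an unrelated Bruker dataset (placeholder values)
template = {
    "acqus": {"SW_h": 8000.0, "SW": 20.0, "SFO1": 400.0, "TD": 65536},
    "procs": {"SW_p": 8000.0, "SF": 400.0, "SI": 32768},
}
patched = patch_template(template, sw_hz=6000.0, obs_mhz=500.0, td=32768, si=16384)
```

With nmrglue installed, the template would come from `ng.bruker.read` on the unrelated dataset, and the patched dictionary would then be passed to `ng.bruker.write` together with the data read by `ng.nmrml.read`.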

EKukstas commented 2 years ago

Thanks for responding! It's good to have my understanding confirmed. I need the spectra in Bruker format for use with MetAssimulo - a simulation software that combines individual template spectra, taking into account inter-metabolite correlations and some other effects. I don't think it needs very much data for it, just that it was written with Bruker spectra in mind. I was able to download JCAMP-DX spectra from HMDB, open them in TopSpin and save in Bruker format. Whatever data loss there was didn't seem to matter for MetAssimulo, so an approach like that would work.

Is there a format that nmrglue can write to which acts similarly to JCAMP-DX? Anything that TopSpin can read, basically.

kaustubhmote commented 2 years ago

NMRglue can convert jcampdx to bruker format via a universal dictionary:

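import nmrglue as ng

# jcampdx_data: path to the JCAMP-DX file; path: output directory (both placeholders)
jdic, jdata = ng.jcampdx.read(jcampdx_data)
udic = ng.jcampdx.guess_udic(jdic, jdata)

# Go through the universal dictionary to reach the Bruker format
C = ng.convert.converter()
C.from_universal(udic, jdata)
bdic, bdata = C.to_bruker()

ng.bruker.write(path, bdic, bdata)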

However, as it currently stands, the code in the to_bruker() method is minimal, and the parameter files it writes out contain very little information. I would be surprised if MetAssimulo is able to handle that. This should not be a very difficult fix if you want to attempt it, but I think there might be quite a lot of edge cases that need careful thought. The simplest thing would be to manually add only the required items from jdic/udic to bdic (from the code above) and then write it out. At the very least, things like SW, spectrometer frequency, and reference will be required.

EKukstas commented 2 years ago

Can nmrglue convert nmrML files to jcampdx?

kaustubhmote commented 2 years ago

This cannot be done (automatically) currently. It will require two functions to be implemented: ng.nmrml.guess_udic and a to_jcampdx method for the converter class. I believe you might require only the first function so that you can go nmrml -> universal -> bruker.

However, without editing nmrglue itself, you can also make a universal dictionary with ng.fileiobase.create_blank_udic and add the required values from your nmrml file, and then do the conversion to the bruker format.
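
A rough sketch of filling a blank universal dictionary by hand is shown below. The helper `fill_udic_1d` is hypothetical, the numeric values are placeholders rather than values taken from any nmrML file, and the plain dict used here only stands in for what `ng.fileiobase.create_blank_udic(1)` is expected to return (an `ndim` entry plus one sub-dictionary per dimension).

```python
def fill_udic_1d(udic, size, sw_hz, obs_mhz, car_hz, label="1H", is_complex=False):
    """Fill dimension 0 of a 1D universal dictionary for a processed spectrum.

    Hypothetical helper, not part of nmrglue. The key names follow the
    universal dictionary layout used by nmrglue.
    """
    dim = udic[0]
    dim["size"] = size          # number of points
    dim["sw"] = sw_hz           # spectral width in Hz
    dim["obs"] = obs_mhz        # observe frequency in MHz
    dim["car"] = car_hz         # carrier position in Hz
    dim["label"] = label        # nucleus label
    dim["complex"] = is_complex
    dim["time"] = False         # data are not in the time domain ...
    dim["freq"] = True          # ... but in the frequency domain
    dim["encoding"] = "direct"
    return udic


# Stand-in for ng.fileiobase.create_blank_udic(1); values below are placeholders
udic = {"ndim": 1, 0: {}}
fill_udic_1d(udic, size=32768, sw_hz=6000.0, obs_mhz=500.0, car_hz=2350.0)
```

With nmrglue installed, the filled dictionary would be handed to `C.from_universal(udic, ndata)` and converted with `C.to_bruker()`, as in the JCAMP-DX example earlier in the thread. The spectral width, observe frequency and carrier have to come from the nmrML metadata or from some other known source.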

EKukstas commented 2 years ago

I followed your advice and got it to work. Well, almost! I have both Bruker and nmrML formats for certain metabolites, so I figured I'd use this to construct a conversion strategy. The code looks like this:

bdic, bdata = ng.bruker.read(bruker_data)
udic = ng.bruker.guess_udic(bdic, bdata)
ndic, ndata = ng.nmrml.read(nmrml_data)

C = ng.convert.converter()
C.from_universal(udic, ndata)

bndic, bndata = C.to_bruker()

bruker.write_pdata() complained until I added the following information to bndic:

procs = {'PPARMOD': bdic['procs']['PPARMOD']}
bndic['procs'] = procs

With that, I can then write Bruker files:

ng.bruker.write_pdata(file_name, bndic, bndata, pdata_folder='1', overwrite=True)

I believe all of that works correctly. I can read the saved file and the data looks exactly like the nmrML data. However, for MetAssimulo (which only needs the processed data, by the way) I need the 1i and procs files. bruker.write_pdata can generate the procs file, and it's in clear text, so I'm confident I can piece it together from the bdic. For 1i, on the other hand, I don't even know where to start. It's a binary file just like 1r, but I don't see how I can modify bruker.write_pdata to produce it. Can you point me in the right direction?

kaustubhmote commented 2 years ago

I am assuming that you have the imaginary part of the data somewhere in ndata. Most likely, ndata is going to be of type complex, so you should be able to get the imaginary part as ndata.imag. Then the 1i file can be written as:

ng.bruker.write_pdata(
    dir=f"{file_name}/pdata/1", 
    dic=bndic, 
    data=ndata.imag,
    bin_file="1i",  
    pdata_folder=False, 
    write_procs=False
)

EKukstas commented 2 years ago

I do indeed, thanks for clarifying! I was able to write the 1r, 1i, and procs files which is all that's needed for MetAssimulo. By doing a bit of hacking and corner-cutting I can get it to read the spectra and run without errors. Now, the spectrum it reads does not look right and I think I know why. The 'procs' file must contain the following values in it:

'_coreheader': header string; can be anything
'_comments'  : comment string; can be anything
'BYTORDP'    : endianness of the binary file; equal to 0 for "little" in all the Bruker files I've come across
'DTYPP'      : data type; equal to 0 for "int32",
'OFFSET'     : spectral offset (in Hz) for the window; unique for every spectrum, it seems
'PPARMOD'    : unsure what this parameter means; equal to zero in all cases I've encountered
'SF'         : observation frequency in MHz;
'SI'         : size of the binary file; from experimentation, I think it's equal to int(len(bndata.real.astype('>f8').tobytes())/4)
'SW_p'       : spectral window (?) in Hz;
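
For the BYTORDP = 0 and DTYPP = 0 case listed above, the 1r and 1i files would be flat arrays of little-endian 32-bit integers. The standard-library sketch below illustrates only that byte layout. The helper names are hypothetical, the intensities are made up, and any intensity scaling that real files apply (for example via NC_proc) is not handled here.

```python
import struct


def pack_int32_le(values):
    """Pack a sequence of integers as little-endian 32-bit signed ints."""
    return struct.pack(f"<{len(values)}i", *values)


def unpack_int32_le(raw):
    """Inverse of pack_int32_le: four bytes per data point."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}i", raw))


# Made-up intensities standing in for the points of a 1r or 1i file
points = [0, 1500, -42, 2147483647]
raw = pack_int32_le(points)
restored = unpack_int32_le(raw)
```

Under this layout each data point occupies four bytes, so the number of points in a file is its size in bytes divided by four.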

Here is the problem: while I can reuse or work out some of these parameters, OFFSET and SW_p do not exist in the nmrML files from HMDB, and I don't think there is a way to deduce them from the information I have. They also seem to be unique to each spectrum, so I can't just copy them from a Bruker spectrum I already have. I fear I may have set myself up for failure from the beginning by assuming that the spectrum shown on HMDB (e.g. for Creatine) is a plot from the nmrML file that's included for download. That's what I was after: that spectrum in a format I could use with MetAssimulo. It doesn't look like there is enough information in the nmrML file. The plotted spectrum is likely a functional fit to the peak list rather than a simulated spectrum itself. Am I missing something here? Why would HMDB make these nmrML files available if they are of very little use?

kaustubhmote commented 2 years ago

Unfortunately, I think you are right. This particular dataset does not seem to have any parameters stored. As the web page seems to suggest, it is just a "prediction", so only the positions of the peaks, and their (relative) amplitudes seem to be stored.

EKukstas commented 2 years ago

Ah, that's a shame! Even the 'experimental' spectra on HMDB have nmrML files without this information, so it really does seem to be lost. I will close the issue now as I don't see the way forward for my particular use. The information here will (hopefully) help someone trying to do a similar conversion in the future. I am somewhat confident that it could be done for nmrML files that store the necessary parameters. Thank you so much, @kaustubhmote, for your help!