wkumler / mzsql

A repository of data and code showing the efficiency of databases relative to existing mass-spectrometry database formats
0 stars 0 forks source link

MZA things #9

Closed wkumler closed 6 days ago

wkumler commented 3 weeks ago

MZA tasks:

It looks like they've got code (or at least made a figure) for chromatogram indexing in the manuscript, would be great to get that code from somewhere. Most of the others can be accessed through their Python package: https://mzapy.readthedocs.io/en/latest/.

Update: looks like the MZA write code has diverged from the MZA read code because I'm getting an error on trying to load their demo: Unable to synchronously open object (object 'Full_mz_array' doesn't exist). Wonder if they updated the names of the hdf5 arrays?

wkumler commented 2 weeks ago

Spectrum extraction code (from demo here):

import h5py
#import hdf5plugin
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

mzafile = "demo_data/180205_Poo_TruePoo_Full1.mza"

mza = h5py.File(mzafile, 'r')
intensities = mza["Arrays_intensity/1001"][:]
mz = mza["Arrays_mz/1001"][:]

mza.close()
plt.vlines(mz, 0, intensities)
wkumler commented 2 weeks ago

Chromatogram extraction code (broken, MZA can't read the current demo file):

%%inactive

from mzapy import MZA

mza = MZA('demo_data/180205_Poo_TruePoo_Full1.mza')

mz_min, mz_max = pmppm(118.0865, 10)
xic_rt, xic_i = mza.collect_xic_arrays_by_mz(mz_min, mz_max)

rt_min, rt_max = 6, 8
ms1_mz, ms1_i = mza.collect_ms1_arrays_by_rt(rt_min, rt_max)
wkumler commented 2 weeks ago

Actual chromatogram extraction code:

import matplotlib.pyplot as plt
import h5py
import numpy as np
import pandas as pd

def get_chrom_mza(file, mz, ppm):
    mza = h5py.File(file, 'r')

    scan_dfs = []
    file_keys = sorted(mza["Arrays_intensity"].keys(), key=lambda x: int(x))
    for index, scan_num in enumerate(file_keys):
        scan_df=pd.DataFrame({
            "rt": mza["Metadata"][index][6],
            "mz": mza["Arrays_mz/"+scan_num][...],
            "int": mza["Arrays_intensity/"+scan_num][...]
        })
        scan_dfs.append(scan_df)
    mza.close()

    file_df = pd.concat(scan_dfs, ignore_index=True)

    mzmin, mzmax = pmppm(mz, ppm)
    chrom_data = file_df[(file_df["mz"]>mzmin) & (file_df["mz"]<mzmax)]
    return(chrom_data)

chrom_data = get_chrom_mza('demo_data/180205_Poo_TruePoo_Full1.mza', 118.0865, 10)

plt.plot(chrom_data["rt"], chrom_data["int"])
plt.show()