nj-saquer / IR-Spectra-Prediction-Graph-Models

MIT License
3 stars 0 forks source link

result is empty #1

Open AFelixLiu opened 4 months ago

AFelixLiu commented 4 months ago

when I run the code below, the result is always empty

with ThreadPool(8) as pool:
    list(tqdm.tqdm(pool.imap(retrieve_data_from_formula, formulas)))
    print("Done with formulas!")

here is the screenshot 微信图片_20240616033910

AFelixLiu commented 4 months ago

sorry, I have deal with the problem

AFelixLiu commented 4 months ago

but it seems the function ”process_file“ in file 2 is not complete,

nj-saquer commented 4 months ago

My apologies, I'll take a look at the issue later today.

AFelixLiu commented 4 months ago
process_file(filename):
    """Convert spectrum units to wavenumbers (x) and transmittance (y), and interpolate any missing coordinates, then return a list of transmitances"""

in above, the comment means convert y units to transmittance, but I have saw the jcamp_calc_xsec api in pycharm, is below

def jcamp_calc_xsec(jcamp_dict, wavemin=None, wavemax=None, skip_nonquant=True, debug=False):
    '''
    Taking as input a JDX file, extract the spectrum information and transform the absorption spectrum from existing
    units to absorption cross-section.

    This function also corrects for unphysical data (such as negative transmittance values, or transmission above
    1.0), and calculates absorbance if transmittance given. Instead of a return value, the function inserts the
    information into the input dictionary.

    Note that the conversion assumes that the measurements were collected for gas at a temperature of 296K (23 degC).

    Parameters
    ----------
    jcamp_dict : dict
        A JCAMP spectrum dictionary.
    wavemin : float, optional
        The shortest wavelength in the spectrum to limit the calculation to.
    wavemax : float, optional
        The longest wavelength in the spectrum to limit the calculation to.
    skip_nonquant: bool
        If True then return "None" if the spectrum is missing quantitative data. If False, then try to fill in \
        missing quantitative values with defaults.
    '''

    x = jcamp_dict['x']
    y = jcamp_dict['y']

    T = 296.0            ## the temperature (23 degC) used by NIST when collecting spectra
    R = 1.0355E-25       ## the constant for converting data (includes the gas constant)

    ## Note: normally when we convert from wavenumber to wavelength units, the ordinate must be nonuniformly
    ## rescaled in order to compensate. But this is only true if we resample the abscissa to a uniform sampling
    ## grid. In this case here, we keep the sampling grid nonuniform in wavelength space, such that each digital
    ## bin retains its proportionality to energy, which is what we want.
    if (jcamp_dict['xunits'].lower() in ('1/cm', 'cm-1', 'cm^-1')):
        jcamp_dict['wavenumbers'] = array(x)            ## note that array() always performs a copy
        x = 10000.0 / x
        jcamp_dict['wavelengths'] = x
    elif (jcamp_dict['xunits'].lower() in ('micrometers', 'um', 'wavelength (um)')):
        jcamp_dict['wavelengths'] = x
        jcamp_dict['wavenumbers'] = 10000.0 / x
    elif (jcamp_dict['xunits'].lower() in ('nanometers', 'nm', 'wavelength (nm)')):
        x = x / 1000.0
        jcamp_dict['wavelengths'] = x
        jcamp_dict['wavenumbers'] = 10000.0 / x
    else:
        raise ValueError('Don\'t know how to convert the spectrum\'s x units ("' + jcamp_dict['xunits'] + '") to micrometers.')

    ## Correct for any unphysical negative values.
    y[y < 0.0] = 0.0

    ## Make sure "y" refers to absorbance.
    if (jcamp_dict['yunits'].lower() == 'transmittance'):
        ## If in transmittance, then any y > 1.0 are unphysical.
        y[y > 1.0] = 1.0

        ## Convert to absorbance.
        okay = (y > 0.0)
        y[okay] = log10(1.0 / y[okay])
        y[logical_not(okay)] = nan

        jcamp_dict['absorbance'] = y
    elif (jcamp_dict['yunits'].lower() == 'absorbance'):
        pass
    elif (jcamp_dict['yunits'].lower() == '(micromol/mol)-1m-1 (base 10)'):
        jcamp_dict['yunits'] = 'xsec (m^2))'
        jcamp_dict['xsec'] = y / 2.687e19
        return
    else:
        raise ValueError('Don\'t know how to convert the spectrum\'s y units ("' + jcamp_dict['yunits'] + '") to absorbance.')

    ## Determine the effective path length "ell" of the measurement chamber, in meters.
    if ('path length' in jcamp_dict):
        (val,unit) = jcamp_dict['path length'].lower().split()[0:2]
        if (unit == 'cm'):
            ell = float(val) / 100.0
        elif (unit == 'm'):
            ell = float(val)
        elif (unit == 'mm'):
            ell = float(val) / 1000.0
        else:
            ell = 0.1
    else:
        if skip_nonquant: return({'info':None, 'x':None, 'xsec':None, 'y':None})
        ell = 0.1
        if debug: print('Path length variable not found. Using 0.1m as a default ...')

    assert(len(x) == len(y))

    if ('npoints' in jcamp_dict):
        if (len(x) != jcamp_dict['npoints']):
            npts_retrieved = str(len(x))
            msg = '"' + jcamp_dict['title'] + '": Number of data points retrieved (' + npts_retrieved + \
                  ') does not equal the expected length (npoints = ' + str(jcamp_dict['npoints']) + ')!'
            raise ValueError(msg)

    ## For each gas, manually define the pressure "p" at which the measurement was taken (in units of mmHg).
    ## These values are obtained from the NIST Infrared spectrum database, which for some reason did not
    ## put the partial pressure information into the header.
    if ('partial_pressure' in jcamp_dict):
        p = float(jcamp_dict['partial_pressure'].split()[0])
        p_units = jcamp_dict['partial_pressure'].split()[1]
        if (p_units.lower() == 'mmhg'):
            pass
        elif (p_units.lower() == 'ppm'):
            p = p * 759.8 * 1.0E-6       ## scale PPM units at atmospheric pressure to partial pressure in mmHg
    else:
        if debug: print('Partial pressure variable value for ' + jcamp_dict['title'] + ' is missing. Using the default p = 150.0 mmHg ...')
        if skip_nonquant: return({'info':None, 'x':None, 'xsec':None, 'y':None})
        p = 150.0

    ## Convert the absorbance units to cross-section in meters squared per molecule.
    xsec = y * T * R / (p * ell)

    ## Add the "xsec" values to the data dictionary.
    jcamp_dict['xsec'] = xsec

    return

I was confused that in above source code, it transfer the original y unit transmittance to absorance, but in the code comment it needs transmittance(I use the pprint to see the spectrum convert beform and after)

my python version is 3.7, and I can't found the JCAMP_reader, I use JCAMP_readfile to instead,

And I don't know where is the file 'NIST Dataset.pickle' ,

...........................................................

SOS, I have been perplexed by this code for a long time. I would appreciate it if the developer could provide a correct and detailed code.

AFelixLiu commented 4 months ago

In the process_file function, it don't return something useful, it can only return a bool value or None value, so how can make this code works correctly, try: spectrum = process_file(PATH + "/" + spectra_file) except: bad += 1 continue if type(spectrum) != np.ndarray: <<<<------- reason_bad.append(why_bad(PATH + "/" + spectra_file)) continue


 it just output false, that is I think why I run
```java
print(bad, "files couldn't be processed at all.")
from collections import Counter
print(Counter(reason_bad))

and the output is

2 files couldn't be processed at all.
Counter({"Couldn't read file": 12455, None: 2201, 'SOLID (KBr PELLET)': 101, 'SOLID (KBr DISC)': 98, 'LIQUID': 77, 'SOLID (1 mg / 650 mg KBr DISC)': 74, 'SOLID (SPLIT MULL)': 65, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1330 CM^-^1, NUJOL FOR 1330-400 CM^-^1': 63, 'SOLID (NUJOL MULL)': 58, 'SOLID (OIL MULL)': 52, 'SOLID (MINERAL OIL MULL)': 46, 'SOLUTION (10% CCl4 FOR 5000-1330, 10% CS2 FOR 1330-625 CM^-^1)': 44, 'SOLID (0.8 mg / 650 mg KBr DISC)': 32, 'SOLID (1.0% IN KBr PELLET)': 32, 'LIQUID (NEAT)': 27, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1330 CM^-^1, NUJOL FOR 1330-460 CM^-^1': 25, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1330 CM^-^1, NUJOL FOR 1330-450 CM^-^1': 21, 'SOLUTION (10% CCl4 FOR 2.7-7.5, 10% CS2 FOR 7.5-26 MICRON) VS SOLVENT': 19, 'SOLUTION (10% IN CCl4 FOR 5000-1350 CM^-^1, 10% IN CS2 FOR 1350-625 CM^-^1)': 16, 'SOLID (0.9 mg / 650 mg KBr DISC)': 15, 'SOLUTION (10% CCl4 FOR 3800-1330, 10% CS2 FOR 1330-400 CM^-^1)': 15, 'SOLUTION (7% IN CHCl3)': 14, 'SOLID (KBr PRESSING)': 14, 'SOLID (MULL)': 13, 'SOLID (1.0 mg / 650 mg KBr DISC)': 12, 'SOLID (BETWEEN SALTS)': 11, 'SOLUTION (10% CCl4 FOR 2.5-7.5, 10% CS2 FOR 7.5-16 MICRON)': 9, 'SOLID (PELLET)': 9, 'SOLID (0.6 mg / 650 mg KBr DISC)': 9, 'SOLUTION 7% (CHCl3 FOR 2.5-12.5 microns)': 9, 'SOLID (0.7 mg / 500 mg KBr DISC)': 8, 'SOLID (KBr DISC, 0.8% WT.)': 8, 'SOLID (0.7 mg / 650 mg KBr DISC)': 8, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1335 .........

the number of the type of couldn't read file is 12455, it is to big for the total compound 16153

nj-saquer commented 4 months ago

my python version is 3.7, and I can't found the JCAMP_reader, I use JCAMP_readfile to instead,

https://pypi.org/project/jcamp/

And I don't know where is the file 'NIST Dataset.pickle'

I wasn't able to upload this file to github, this should be the same file: https://drive.google.com/file/d/1xeafmGi12KJPUWY2EgDf0LmWL-0M3Jl1/view?usp=sharing

nj-saquer commented 4 months ago

the number of the type of couldn't read file is 12455, it is to big for the total compound 16153

When I reran the file now I got"Couldn't read file": 2354 for a list of 16100 however this was from files downloaded 3 years ago, I'm not sure if the website changed how/where these files are stored.

In the process_file function, it don't return something useful, it can only return a bool value or None value, so how can make this code works correctly

I agree this was written poorly and the comment doesn't match what it does. It seems this function only serves as a counter for how many files can be easily read with the JCAMP_reader(filename) function

AFelixLiu commented 4 months ago

the number of the type of couldn't read file is 12455, it is to big for the total compound 16153

When I reran the file now I got"Couldn't read file": 2354 for a list of 16100 however this was from files downloaded 3 years ago, I'm not sure if the website changed how/where these files are stored.

In the process_file function, it don't return something useful, it can only return a bool value or None value, so how can make this code works correctly

I agree this was written poorly and the comment doesn't match what it does. It seems this function only serves as a counter for how many files can be easily read with the JCAMP_reader(filename) function

thanks for your answer