Open AFelixLiu opened 5 months ago
sorry, I have deal with the problem
but it seems the function ”process_file“ in file 2 is not complete,
My apologies, I'll take a look at the issue later today.
process_file(filename):
"""Convert spectrum units to wavenumbers (x) and transmittance (y), and interpolate any missing coordinates, then return a list of transmitances"""
in above, the comment means convert y units to transmittance, but I have saw the jcamp_calc_xsec api in pycharm, is below
def jcamp_calc_xsec(jcamp_dict, wavemin=None, wavemax=None, skip_nonquant=True, debug=False):
'''
Taking as input a JDX file, extract the spectrum information and transform the absorption spectrum from existing
units to absorption cross-section.
This function also corrects for unphysical data (such as negative transmittance values, or transmission above
1.0), and calculates absorbance if transmittance given. Instead of a return value, the function inserts the
information into the input dictionary.
Note that the conversion assumes that the measurements were collected for gas at a temperature of 296K (23 degC).
Parameters
----------
jcamp_dict : dict
A JCAMP spectrum dictionary.
wavemin : float, optional
The shortest wavelength in the spectrum to limit the calculation to.
wavemax : float, optional
The longest wavelength in the spectrum to limit the calculation to.
skip_nonquant: bool
If True then return "None" if the spectrum is missing quantitative data. If False, then try to fill in \
missing quantitative values with defaults.
'''
x = jcamp_dict['x']
y = jcamp_dict['y']
T = 296.0 ## the temperature (23 degC) used by NIST when collecting spectra
R = 1.0355E-25 ## the constant for converting data (includes the gas constant)
## Note: normally when we convert from wavenumber to wavelength units, the ordinate must be nonuniformly
## rescaled in order to compensate. But this is only true if we resample the abscissa to a uniform sampling
## grid. In this case here, we keep the sampling grid nonuniform in wavelength space, such that each digital
## bin retains its proportionality to energy, which is what we want.
if (jcamp_dict['xunits'].lower() in ('1/cm', 'cm-1', 'cm^-1')):
jcamp_dict['wavenumbers'] = array(x) ## note that array() always performs a copy
x = 10000.0 / x
jcamp_dict['wavelengths'] = x
elif (jcamp_dict['xunits'].lower() in ('micrometers', 'um', 'wavelength (um)')):
jcamp_dict['wavelengths'] = x
jcamp_dict['wavenumbers'] = 10000.0 / x
elif (jcamp_dict['xunits'].lower() in ('nanometers', 'nm', 'wavelength (nm)')):
x = x / 1000.0
jcamp_dict['wavelengths'] = x
jcamp_dict['wavenumbers'] = 10000.0 / x
else:
raise ValueError('Don\'t know how to convert the spectrum\'s x units ("' + jcamp_dict['xunits'] + '") to micrometers.')
## Correct for any unphysical negative values.
y[y < 0.0] = 0.0
## Make sure "y" refers to absorbance.
if (jcamp_dict['yunits'].lower() == 'transmittance'):
## If in transmittance, then any y > 1.0 are unphysical.
y[y > 1.0] = 1.0
## Convert to absorbance.
okay = (y > 0.0)
y[okay] = log10(1.0 / y[okay])
y[logical_not(okay)] = nan
jcamp_dict['absorbance'] = y
elif (jcamp_dict['yunits'].lower() == 'absorbance'):
pass
elif (jcamp_dict['yunits'].lower() == '(micromol/mol)-1m-1 (base 10)'):
jcamp_dict['yunits'] = 'xsec (m^2))'
jcamp_dict['xsec'] = y / 2.687e19
return
else:
raise ValueError('Don\'t know how to convert the spectrum\'s y units ("' + jcamp_dict['yunits'] + '") to absorbance.')
## Determine the effective path length "ell" of the measurement chamber, in meters.
if ('path length' in jcamp_dict):
(val,unit) = jcamp_dict['path length'].lower().split()[0:2]
if (unit == 'cm'):
ell = float(val) / 100.0
elif (unit == 'm'):
ell = float(val)
elif (unit == 'mm'):
ell = float(val) / 1000.0
else:
ell = 0.1
else:
if skip_nonquant: return({'info':None, 'x':None, 'xsec':None, 'y':None})
ell = 0.1
if debug: print('Path length variable not found. Using 0.1m as a default ...')
assert(len(x) == len(y))
if ('npoints' in jcamp_dict):
if (len(x) != jcamp_dict['npoints']):
npts_retrieved = str(len(x))
msg = '"' + jcamp_dict['title'] + '": Number of data points retrieved (' + npts_retrieved + \
') does not equal the expected length (npoints = ' + str(jcamp_dict['npoints']) + ')!'
raise ValueError(msg)
## For each gas, manually define the pressure "p" at which the measurement was taken (in units of mmHg).
## These values are obtained from the NIST Infrared spectrum database, which for some reason did not
## put the partial pressure information into the header.
if ('partial_pressure' in jcamp_dict):
p = float(jcamp_dict['partial_pressure'].split()[0])
p_units = jcamp_dict['partial_pressure'].split()[1]
if (p_units.lower() == 'mmhg'):
pass
elif (p_units.lower() == 'ppm'):
p = p * 759.8 * 1.0E-6 ## scale PPM units at atmospheric pressure to partial pressure in mmHg
else:
if debug: print('Partial pressure variable value for ' + jcamp_dict['title'] + ' is missing. Using the default p = 150.0 mmHg ...')
if skip_nonquant: return({'info':None, 'x':None, 'xsec':None, 'y':None})
p = 150.0
## Convert the absorbance units to cross-section in meters squared per molecule.
xsec = y * T * R / (p * ell)
## Add the "xsec" values to the data dictionary.
jcamp_dict['xsec'] = xsec
return
I was confused that in above source code, it transfer the original y unit transmittance to absorance, but in the code comment it needs transmittance(I use the pprint to see the spectrum convert beform and after)
my python version is 3.7, and I can't found the JCAMP_reader, I use JCAMP_readfile to instead,
And I don't know where is the file 'NIST Dataset.pickle' ,
...........................................................
SOS, I have been perplexed by this code for a long time. I would appreciate it if the developer could provide a correct and detailed code.
In the process_file function, it don't return something useful, it can only return a bool value or None value, so how can make this code works correctly, try: spectrum = process_file(PATH + "/" + spectra_file) except: bad += 1 continue if type(spectrum) != np.ndarray: <<<<------- reason_bad.append(why_bad(PATH + "/" + spectra_file)) continue
it just output false, that is I think why I run
```java
print(bad, "files couldn't be processed at all.")
from collections import Counter
print(Counter(reason_bad))
and the output is
2 files couldn't be processed at all.
Counter({"Couldn't read file": 12455, None: 2201, 'SOLID (KBr PELLET)': 101, 'SOLID (KBr DISC)': 98, 'LIQUID': 77, 'SOLID (1 mg / 650 mg KBr DISC)': 74, 'SOLID (SPLIT MULL)': 65, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1330 CM^-^1, NUJOL FOR 1330-400 CM^-^1': 63, 'SOLID (NUJOL MULL)': 58, 'SOLID (OIL MULL)': 52, 'SOLID (MINERAL OIL MULL)': 46, 'SOLUTION (10% CCl4 FOR 5000-1330, 10% CS2 FOR 1330-625 CM^-^1)': 44, 'SOLID (0.8 mg / 650 mg KBr DISC)': 32, 'SOLID (1.0% IN KBr PELLET)': 32, 'LIQUID (NEAT)': 27, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1330 CM^-^1, NUJOL FOR 1330-460 CM^-^1': 25, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1330 CM^-^1, NUJOL FOR 1330-450 CM^-^1': 21, 'SOLUTION (10% CCl4 FOR 2.7-7.5, 10% CS2 FOR 7.5-26 MICRON) VS SOLVENT': 19, 'SOLUTION (10% IN CCl4 FOR 5000-1350 CM^-^1, 10% IN CS2 FOR 1350-625 CM^-^1)': 16, 'SOLID (0.9 mg / 650 mg KBr DISC)': 15, 'SOLUTION (10% CCl4 FOR 3800-1330, 10% CS2 FOR 1330-400 CM^-^1)': 15, 'SOLUTION (7% IN CHCl3)': 14, 'SOLID (KBr PRESSING)': 14, 'SOLID (MULL)': 13, 'SOLID (1.0 mg / 650 mg KBr DISC)': 12, 'SOLID (BETWEEN SALTS)': 11, 'SOLUTION (10% CCl4 FOR 2.5-7.5, 10% CS2 FOR 7.5-16 MICRON)': 9, 'SOLID (PELLET)': 9, 'SOLID (0.6 mg / 650 mg KBr DISC)': 9, 'SOLUTION 7% (CHCl3 FOR 2.5-12.5 microns)': 9, 'SOLID (0.7 mg / 500 mg KBr DISC)': 8, 'SOLID (KBr DISC, 0.8% WT.)': 8, 'SOLID (0.7 mg / 650 mg KBr DISC)': 8, 'SOLID (SPLIT MULL), FLUOROLUBE FOR 3800-1335 .........
the number of the type of couldn't read file is 12455, it is to big for the total compound 16153
my python version is 3.7, and I can't found the JCAMP_reader, I use JCAMP_readfile to instead,
https://pypi.org/project/jcamp/
And I don't know where is the file 'NIST Dataset.pickle'
I wasn't able to upload this file to github, this should be the same file: https://drive.google.com/file/d/1xeafmGi12KJPUWY2EgDf0LmWL-0M3Jl1/view?usp=sharing
the number of the type of couldn't read file is 12455, it is to big for the total compound 16153
When I reran the file now I got"Couldn't read file": 2354
for a list of 16100 however this was from files downloaded 3 years ago, I'm not sure if the website changed how/where these files are stored.
In the process_file function, it don't return something useful, it can only return a bool value or None value, so how can make this code works correctly
I agree this was written poorly and the comment doesn't match what it does. It seems this function only serves as a counter for how many files can be easily read with the JCAMP_reader(filename)
function
the number of the type of couldn't read file is 12455, it is to big for the total compound 16153
When I reran the file now I got
"Couldn't read file": 2354
for a list of 16100 however this was from files downloaded 3 years ago, I'm not sure if the website changed how/where these files are stored.In the process_file function, it don't return something useful, it can only return a bool value or None value, so how can make this code works correctly
I agree this was written poorly and the comment doesn't match what it does. It seems this function only serves as a counter for how many files can be easily read with the
JCAMP_reader(filename)
function
thanks for your answer
when I run the code below, the result is always empty
here is the screenshot