hitranonline / hapi

HITRAN Application Programming Interface (HAPI)

Basic functionality broken for Python 3 on Windows (LF end-of-line character issue?) #1

Closed riwoodward closed 6 years ago

riwoodward commented 6 years ago

Hi,

Firstly, thanks for the useful API to the Hitran database.

Unfortunately, it seems not to work for Python 3 on Windows machines - e.g. running a basic fetch command fails with the following error:

Command:

import hapi as h
h.fetch('CO2',2,1,2000,2100)

Result:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-12a3201e4aae> in <module>()
----> 1 h.fetch('CO2',2,1,2000,2100)

C:\Dropbox\Work\Code\forked\hapi\hapi\hapi.py in fetch(TableName, M, I, numin, numax, ParameterGroups, Parameters)
   5496     """
   5497     queryHITRAN(TableName,[ISO[(M,I)][ISO_INDEX['id']]],numin,numax,
-> 5498                 pargroups=ParameterGroups,params=Parameters)
   5499     iso_name = ISO[(M,I)][ISO_INDEX['iso_name']]
   5500     Comment = 'Contains lines for '+iso_name

C:\Dropbox\Work\Code\forked\hapi\hapi\hapi.py in queryHITRAN(TableName, iso_id_list, numin, numax, pargroups, params, dotpar, head)
   3344     # Set comment
   3345     # Get this table to LOCAL_TABLE_CACHE
-> 3346     storage2cache(TableName)
   3347     print('PROCESSED')
   3348 

C:\Dropbox\Work\Code\forked\hapi\hapi\hapi.py in storage2cache(TableName, cast, ext)
   1768             converters.append(cfunc)
   1769             #start = end
-> 1770         data_matrix = [[cvt(line) for cvt in converters] for line in InfileData]
   1771         data_columns = zip(*data_matrix)
   1772         for qnt, col in zip(quantities, data_columns):

C:\Dropbox\Work\Code\forked\hapi\hapi\hapi.py in <listcomp>(.0)
   1768             converters.append(cfunc)
   1769             #start = end
-> 1770         data_matrix = [[cvt(line) for cvt in converters] for line in InfileData]
   1771         data_columns = zip(*data_matrix)
   1772         for qnt, col in zip(quantities, data_columns):

C:\Dropbox\Work\Code\forked\hapi\hapi\hapi.py in <listcomp>(.0)
   1768             converters.append(cfunc)
   1769             #start = end
-> 1770         data_matrix = [[cvt(line) for cvt in converters] for line in InfileData]
   1771         data_columns = zip(*data_matrix)
   1772         for qnt, col in zip(quantities, data_columns):

C:\Dropbox\Work\Code\forked\hapi\hapi\hapi.py in cfunc(line, dtype, start, end)
   1764                                 raise Exception('PARSE ERROR: unknown format of the par value (%s)'%line[start:end])
   1765                 else:
-> 1766                     return dtype(line[start:end])
   1767             #cfunc.__doc__ = 'converter {} {}'.format(qnt, fmt) # doesn't work in earlier versions of Python
   1768             converters.append(cfunc)

ValueError: invalid literal for int() with base 10: '\n'

I did get this same code working on Python 2, and on an Ubuntu machine with Python 3, however.

Looking at the error message, I think this may arise from the different end-of-line characters on Windows and Unix. Python 3's open() handles text mode and newline translation differently from Python 2's, which may explain why the error only happens with Python 3 on Windows.

I found a quick fix (following https://stackoverflow.com/questions/2536545/how-to-write-unix-end-of-line-characters-in-windows-using-python/23434608#23434608): adding the argument newline='\n' to each open() call forces Unix end-of-line characters on all operating systems. I have a fork of this repo including the fixes.
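To illustrate the effect of the newline argument, here is a minimal Python 3 sketch. The helper name written_bytes and the placeholder record are made up for the example and are not part of HAPI. Passing newline='\r\n' emulates what the default setting does on Windows, so the difference can be seen on any platform.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write `text` in text mode with the given `newline` setting and
    return the raw bytes that ended up on disk (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".par")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


record = "placeholder record\n"  # placeholder text, not a real HITRAN line

# newline='\n': '\n' is written unchanged on every platform.
print(written_bytes(record, "\n"))

# newline='\r\n': every '\n' is written as CR LF. The default
# (newline=None) does the same on Windows, where os.linesep is '\r\n'.
print(written_bytes(record, "\r\n"))
```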

Basic features (data extraction and plotting, as per the hapi manual) now work for me, although I haven't fully tested the extent of these changes.

If you want, I can submit these as a PR?

Cheers

hitranonline commented 6 years ago

Hi Robert,

Thanks for the bug report! The problem was that the downloaded *.par files contained carriage return (CR) characters, which Python 3 processed as additional line endings. The queryHITRAN function added an extra, system-specific line ending (on Windows, CR LF). This combination broke the storage2cache function. Unfortunately, the default open function in Python 2.x doesn't have the newline parameter, so I have made a quick fix to the master branch using the similar function from the standard "io" module, which is supported by both Python 2 and 3. Please let me know if it works on your machine so I can close the issue.
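The mechanism described above can be reproduced with a small sketch. It assumes that a downloaded record ends in a stray CR and that the writer appends '\n'. The helper name roundtrip and the placeholder record are hypothetical, and this is not the actual queryHITRAN or storage2cache code. Writing with newline='\r\n' stands in for the Windows default.

```python
import io
import os
import tempfile


def roundtrip(text, write_newline):
    """Write `text` with io.open using `write_newline`, then read it back
    line by line in the default universal-newlines mode."""
    fd, path = tempfile.mkstemp(suffix=".par")
    os.close(fd)
    try:
        with io.open(path, "w", newline=write_newline) as f:
            f.write(text)
        with io.open(path, "r") as f:
            return f.readlines()
    finally:
        os.remove(path)


# Placeholder record: two leading digits, a stray CR from the download,
# then the '\n' appended by the writer. Not a real HITRAN line.
record = u"21 placeholder\r" + u"\n"

broken = roundtrip(record, "\r\n")  # disk holds CR CR LF: read as two lines
fixed = roundtrip(record, "\n")     # disk holds CR LF: read as one line

print(broken)
print(fixed)

# Parsing the first two characters of every line, as a fixed-width
# converter would, fails on the spurious line that contains only '\n'.
for line in broken:
    try:
        int(line[0:2])
    except ValueError as err:
        print("ValueError: %s" % err)
```

With the line ending forced to LF on write, the CR and LF are read back as a single line ending, so no empty line reaches the converters.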

Cheers, Roman

riwoodward commented 6 years ago

Hi Roman,

Yes, this fixes it, thanks.

I have found another similar issue though (sorry!): when downloading CO2 .par files, there's a problem with 18O13C17O and 13C17O2.

For example, the following code:

from hapi import fetch_by_ids

# Global isotopologue IDs to download
iso_id_list = [7, 8, 9, 10, 11, 12, 13, 14, 15, 120, 121]
fetch_by_ids('CO2', iso_id_list, 2150, 2220)

gives an error on the same line: ValueError: invalid literal for int() with base 10: 'A'

I think this is related to the following note on the hitran website (http://hitran.org/lbl/2?2=on):

Warnings: 
   18O13C17O has "A" instead of isotopologue number in .par line 
   13C17O2 has "B" instead of isotopologue number in .par line

Fortunately, this is easy to work around for most use cases, but just thought I'd mention it in case there is an easy fix to the API.

Cheers Rob

hitranonline commented 6 years ago

Hi Rob,

The current version of HAPI has hard-coded mappings for the isotopologues. I've just added correct handling of non-digit characters in the "local_iso_id" field. Thanks once again for the bug report.
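As an illustration only, one way to handle a non-digit isotopologue character is sketched below. The function name parse_local_iso_id is hypothetical, and the mapping is an assumption rather than something taken from HAPI's source: '1' to '9' stand for themselves, '0' for 10, and letters continue from 'A' as 11 and 'B' as 12.

```python
def parse_local_iso_id(field):
    """Convert the one-character isotopologue field of a .par line to an int.

    Assumed mapping (not taken from HAPI's source): '1'-'9' map to
    themselves, '0' means 10, 'A' means 11, 'B' means 12, and so on.
    """
    if len(field) != 1:
        raise ValueError("expected a single character, got %r" % field)
    if field == "0":
        return 10
    if field in "123456789":
        return int(field)
    if "A" <= field <= "Z":
        # Letters continue the numbering after 10.
        return 11 + ord(field) - ord("A")
    raise ValueError("unknown isotopologue character %r" % field)


# A plain int() call is what raised "invalid literal for int()" on 'A'.
for char in ["1", "9", "0", "A", "B"]:
    print(char, parse_local_iso_id(char))
```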

Cheers, Roman

riwoodward commented 6 years ago

Thanks Roman - that's all working well for me now.

Cheers, Rob