python3 byte-string errors when reading posterior hdf files

gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.

http://pycbc.org

GNU General Public License v3.0

317 stars 354 forks source link

python3 byte-string errors when reading posterior hdf files #2944

Open dbkeitel opened 5 years ago

dbkeitel commented 5 years ago

I have a pycbc posterior file created under 1.13.8. Using 1.14.1 with python2, it still loads fine. However, with python3:

from pycbc.inference.io import loadfile
fp = loadfile('test.hdf','r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/cvmfs/ligo-containers.opensciencegrid.org/lscsoft/conda/latest/envs/ligo-py36/lib/python3.6/site-packages/pycbc/inference/io/__init__.py", line 105, in loadfile
    fileclass = get_file_type(path)
  File "/cvmfs/ligo-containers.opensciencegrid.org/lscsoft/conda/latest/envs/ligo-py36/lib/python3.6/site-packages/pycbc/inference/io/__init__.py", line 70, in get_file_type
    return filetypes[filetype]
KeyError: b'posterior_file'

Looks like one of those pesky byte string conversion issues to me: it seems to want to use a read-from-file byte string as a key to a dict which expects regular strings.

dbkeitel commented 5 years ago

Generalised the title of this issue, as after trying to fix the first KeyError, at least two more byte-string-related errors crop up later in loadfile() and read_samples().

spxiwh commented 5 years ago

@dbkeitel Have I said how much I hate string handling in python3??? I hate string handling in python3!!

Because of a large stack of issues like this, pycbc_inference is not python3-compatible at the moment. These issues largely arise from the fact that HDF needs to store these things as a byte type, but a lot of our old python2 code will just expect this to be a string and fail. It took a long time to sort these kind of issues out in the search code. @cdcapano (or someone else that Collin can "convince" to do this) would have to do the same for inference before this is going to work.

In short: Stick with python2 for now if using inference. Other things, however, are pretty close to python3-ready!

dbkeitel commented 5 years ago

@spxiwh I'm absolutely with you on the hating! But thanks for the details, and I now also found a "support for python 3 in almost all core modules (inference and tmpltbank not included yet)" warning in the 1.14.0 notes which unfortunately wasn't repeated explicitly for 1.14.1... ;)

cdcapano commented 5 years ago

I just tried testing this. If I create the file using python3, it works. The issue is when you create the file using python2.7, then try to read it in a python3 environment. I guess it's due to strings not being encoded properly when written to the hdf attributes. However, I don't really understand the problem/solution. @spxiwh Could you explain what your fix is for this byte encoding stuff? Is it simply adding .encode to every place that a string is written to an hdf attribute?