jimbraun / XCDF

XCDF: eXplicitly Compacted Data Format. See documentation at Read the Docs:
https://xcdf.readthedocs.io/en/latest/
Other
14 stars 8 forks source link

Python3 bindings from a conda environment #92

Closed HealthyPear closed 1 year ago

HealthyPear commented 2 years ago

Hello,

I would like to open xcdf files from a Python3 installation within a conda environment (I don't care about writing - see #79 - at least for the moment). My setup is the following,

This is probably a long shot, given that #87 and #89 are still open and my tests failed, but I hope we can reach a solution as these tools are widely used now. I saw that #91, so it might be "just" a problem of interfacing the compilation with a conda env.

Using Python2 works fine, but of course, it should be updated to work with Python3 (at least >=3.7).

I tried using 3.6 onwards without success (<3.5 is not possible unless one installs a very old conda and would make sense anyway).

In all cases, the Python installation is always found during the cmake configuration step, e.g. for Python3.10,

-- Python version:
--   * binary:   /Users/michele/Applications/mambaforge/envs/py3xcdf/bin/python
--   * includes: /Users/michele/Applications/mambaforge/envs/py3xcdf/include/python3.10
--   * libs:     /Users/michele/Applications/mambaforge/envs/py3xcdf/lib/libpython3.10.dylib

The compilers, make and cmake are installed in the conda environment together with python, e.g.

name: py3xcdf
channels:
  - conda-forge
dependencies:
  - python
  - compilers
  - make
  - cmake

The build is done outside of the cloned repository and the installation directory is also created at the same level,

.
├── build
├── install_dir
├── repository
└── xcdf_conda_env_python3.yaml 

I made a simple installation bash script (I had to change the extension to .txt due to GitHub file upload limitations) install.txt

When I try to import pyxcdf from the build directory I obtain the following results (a couple of examples),

Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:45:10) [Clang 12.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyxcdf
Segmentation fault: 11
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:47)
[Clang 12.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyxcdf
Fatal Python error: _PyInterpreterState_Get(): no current thread state
Python runtime state: unknown

Abort trap: 6

If you are not familiar with conda, given a pyxcdf environment:

Another problem I don't uderstand is that I can only launch an xcdf executable from the build directory (where all of them are stored together with header files and libraries), but not from the installation directory (where these 3 components are separated in subfolders) even if I symlink the files to the bin, include and lib (I didn't try by copying them though). Is there some other environment variable to set?

jimbraun commented 2 years ago

Hi Michele,

I'm personally not at all familiar with Conda and can't support issues about integration of XCDF and Conda. I can possibly update XCDF for compatibility with Python3. I'm supporting XCDF on my own time now, so progress will be slow. You've probably noticed that there has been no maintenance for the past 5 years.

On your specific comment, the XCDF libraries are the same as any other C++ library and need to be in your LD_LIBRARY_PATH. Typically this environment variable is set by the parent installation tool (e.g. by Ape if working with HAWC data), but it needs to be specified manually if using a bare XCDF installation that isn't in the standard library search path.

HealthyPear commented 2 years ago

Thanks for the reply!

I am less convinced that it is a conda problem.

As I said, the only difference is that the conda env lives in $CONDA_PREFIX and this folder has a bin, include and lib subfolders where normally the conda package manager distributes the files from an installed package.

In this case I am testing this from a local build directory where all the required files are stored. In fact, if I am using a Python2-based environment it works fine (or at least I can do import pyxcdf and nothing crashes).

What is the last "success story" using Python3? Specifically, what version of Python3 should I expect will work?

HealthyPear commented 2 years ago

I can possibly update XCDF for compatibility with Python3.

Just to be clearer: I saw #91, so as a non-expert I assumed the python bindings are already Python3-compatible - is this correct?

jimbraun commented 2 years ago

The XCDF pybindings were written for Python2 and haven't been updated for or tested against Python3 as far as I know. At the very least, I suspect the string/unicode change in Python3 will be a problem, and I wouldn't currently expect XCDF to work with Python3 at all.

HealthyPear commented 2 years ago

Ah, I see, this explains a lot!

First, let me explain the context, which I should have done first perhaps...

I am contributing to the development of the SWGO observatory. In particular, from the software point of view, SWGO is in its R&D phase and for this people have recycled software from a previous generation observatory called HAWC.

Such software writes XCDF files (that's why I stumbled on your repository). I am not sure if XCDF will be the final format used, but I want to make sure we at least decouple from the old HAWC software during the prototyping of the final software framework and the simulation code (which writes XCDF files) is too complex for now to substitute.

Today I worked with my friend @maxnoe (who helped me greatly because I've never developed in CMake and Python bindings and I am learning a lot in the process). I have forked your repo (this is the new branch) with the aim of testing/building the Python3 bindings using modern software dependencies (it will be possible to install the package in python directly at least from PyPI - aka with pip).

If you are willing to accept a Pull Request (I will have to make sure that everything needed works because I am new in the collaboration and I am just exploring the files) I think this would improve the package and since Python2 is very well dead, we could make a new release (v4.0.0?).

HealthyPear commented 2 years ago

e.g we can already do,

import xcdf

f = xcdf.File("some_file.xcd", "r")
f.comments

which results in

['! units HAWCSim.Evt.Energy             : GeV',
 '! units HAWCSim.Evt.X0                 : g/cm^2',
 '! units HAWCSim.Evt.XMax               : g/cm^2',
 '! units HAWCSim.Evt.A                  : g/cm^2',
 '! units HAWCSim.Evt.C                  : cm^2/g',
 '! units HAWCSim.Evt.FirstIntZ          : cm',
 '! units HAWCSim.Evt.Theta              : deg',
 '! units HAWCSim.Evt.Phi                : deg',
.....]

and

names = f.field_names
for e in f:
   print(dict(zip(names, e)))
   break

which results in a dictionary,

{'HAWCSim.Evt.Num': 1279, 'HAWCSim.Evt.inNum': 1279, 'HAWCSim.Evt.nEMParts': 31,....}
maxnoe commented 2 years ago

See #93

jimbraun commented 2 years ago

I'm happy to merge improvements that modernize the pybindings. If the Python interface changes, I would also seek approval from HAWC (probably from Andy Smith).

Thanks for your contributions!

HealthyPear commented 2 years ago

Awesome!

Andy is one of my working group coordinators, so I can talk to him. Though the "SWGO" software is using the same release that HAWC uses which is the last one you made many years ago (3.00.03), so I doubt they need more than that given that HAWC has now many years of continuous operation...

I think that (if needed) the python2 bindings could be added back in parallel and enable them in place of the Python3 ones with a CMake option @maxnoe, right?

HealthyPear commented 2 years ago

@jimbraun if it's easy for you could you summarize here what were the python2 bindings capabilities (API) so we can compare and perhaps write some documentation to explain the differences?

jimbraun commented 2 years ago

Looking at the history, I had forgotten that Hugo added support for Python3 up to some version. I guess you're running a later version of Python3 and we stumbled onto an incompatibility.

The needed functionality of the XCDFFile object can be read from the old pybindings:

static PyGetSetDef XCDFFile_getseters[] =
{
  { const_cast<char*>("filename"), (getter)XCDFFile_getfilename, NULL,
    const_cast<char*>("XCDF file name"),
    NULL },

  { const_cast<char*>("count"), (getter)XCDFFile_getcount, NULL,
    const_cast<char*>("XCDF record count"),
    NULL },

  { const_cast<char*>("nfields"), (getter)XCDFFile_getnfields, NULL,
    const_cast<char*>("Number of fields per record"),
    NULL },

  { NULL }
};

static PyMethodDef XCDFFile_methods[] =
{
  { const_cast<char*>("header"), (PyCFunction)XCDFFile_header,
    METH_NOARGS,
    const_cast<char*>("Print XCDF file field data") },

  { const_cast<char*>("getRecord"), (PyCFunction)XCDFFile_getRecord,
    METH_VARARGS,
    const_cast<char*>("Get a record by number from the file") },

  { const_cast<char*>("records"), (PyCFunction)XCDFRecord_iterator,
    METH_KEYWORDS,
    const_cast<char*>("Iterator over XCDF records") },

  { const_cast<char*>("fields"), (PyCFunction)XCDFField_iterator,
    METH_O,
    const_cast<char*>("Iterator over one or more XCDF fields (comma-separated "
                      "by name)") },

  { const_cast<char*>("addField"), (PyCFunction)XCDFFile_addField,
    METH_VARARGS,
    const_cast<char*>("Add a field with a given name, XCDF type, and optional "
                      "resolution") },

  { NULL }
};

static PyMemberDef XCDFFile_members[] =
{
  { const_cast<char*>("filename"), T_OBJECT_EX,
    offsetof(pyxcdf_XCDFFile, filename_), 0,
    const_cast<char*>("XCDF file name") },

  { NULL }
};

The new functions in #93 look to be missing some of this.

maxnoe commented 2 years ago

The new functions in https://github.com/jimbraun/XCDF/pull/93 look to be missing some of this.

Essentially only the random access (which I'd implement using the pythonic __getitem__, not as a getRecord method). The rest should be there.

Also, the AddField method is connected to #79, right? You need also the other methods to support writing.

jimbraun commented 2 years ago

Thanks, I agree after a more careful look at #93. I also feel the new interface is more Python-like and an improvement.

HealthyPear commented 1 year ago

@jimbraun I think we can close this issue now that #93 has been merged

maxnoe commented 1 year ago

You could also easily add a conda-forge package now

HealthyPear commented 1 year ago

You could also easily add a conda-forge package now

yes I plan to do this soon (and PyPI package) - we have also already #89 that tracks this