noirello / pyorc

Python module for Apache ORC file format
Apache License 2.0

pyorc.errors.ParseError: Footer is corrupt: types(1701470799) not exists #36

Closed fehtemam closed 3 years ago

fehtemam commented 3 years ago

When using pyorc together with TensorFlow (i.e., importing both in the same script), I get the "Footer is corrupt" error. My investigation led me to a conflict between protobuf versions: TensorFlow and some other packages rely on newer versions of protobuf, and this causes the crash. Has anyone else had this problem? Is there a workaround for this type of issue?

fehtemam commented 3 years ago

Unfortunately, I didn't pay attention to the details at first. My teammate suggested we compile pyorc from source against a different version of protobuf. When I went back to see how that might work, I saw that you documented this right at the top of the installation page! It seems our problem stems from the libprotobuf library you mention there. I'm not a software engineer and have never compiled anything (I just code in Python), but I'll look into whether we can make it work.

noirello commented 3 years ago

I've never tried to compile the C++ core with a version of protobuf (or of any other ORC dependency) other than the default, so I have no experience with it. My guess is that you only need to change the line that sets the protobuf version to your preferred version. So you should:

  1. Get the source code of PyORC
  2. Call python3 setup.py build_orc --download-only=True (you might need to install pybind11 first)
  3. Change the protobuf version line in deps/orc-1.6.6/cmake_modules/ThirdpartyToolchain.cmake
  4. Follow the instructions to build the module from source
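
Step 3 above can be scripted. The following is a minimal sketch, not a tested recipe. It assumes the toolchain file pins protobuf with a line of the form set(PROTOBUF_VERSION "...") (check your downloaded copy, as the thread does not show the line). The helper names pin_protobuf_version and patch_file are hypothetical, and the version "3.9.2" is only a placeholder, not a known-good value.

```python
import re
from pathlib import Path

# ASSUMPTION: the ORC toolchain file pins protobuf with a line shaped like
#   set(PROTOBUF_VERSION "3.5.1")
# Verify this against your downloaded copy before relying on the pattern.
PROTOBUF_LINE = re.compile(
    r'(set\s*\(\s*PROTOBUF_VERSION\s+")[^"]*("\s*\))', re.IGNORECASE
)


def pin_protobuf_version(cmake_text: str, version: str) -> str:
    """Return cmake_text with the pinned protobuf version replaced by `version`."""
    new_text, count = PROTOBUF_LINE.subn(
        lambda m: m.group(1) + version + m.group(2), cmake_text
    )
    # Refuse to guess if the file does not look the way we assumed.
    if count != 1:
        raise ValueError(f"expected exactly one PROTOBUF_VERSION line, found {count}")
    return new_text


def patch_file(path: Path, version: str) -> None:
    """Rewrite the toolchain file in place (hypothetical helper)."""
    path.write_text(pin_protobuf_version(path.read_text(), version))


# Example use after step 2 has downloaded the ORC sources
# ("3.9.2" is a placeholder version, not a recommendation):
# patch_file(Path("deps/orc-1.6.6/cmake_modules/ThirdpartyToolchain.cmake"), "3.9.2")
```

The function raises instead of silently doing nothing when the expected line is missing, so a layout change in a different ORC release shows up before the build in step 4 rather than after it.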