I tried a few different Python ORC packages and found pyorc to be the best. I was testing pyorc against both the positive and the negative test-case files from the Apache ORC project. For the negative-case files I expected an appropriate exception message, but pyorc causes a segmentation fault instead. Could you please look into this?
import pyorc

with open("./missing_blob_stream_in_string_dict.orc", "rb") as data:
    reader = pyorc.Reader(data)
    for row in reader:
        print(row)
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
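Until this is fixed, one way to keep a test run over the negative-case files alive is to read each file in a separate child process and inspect how that process exited. This is a minimal sketch, assuming pyorc is installed in the same interpreter; the helper names `read_orc_isolated` and `classify` are hypothetical and not part of pyorc. On POSIX, `subprocess` reports a child killed by a signal as a negative return code (-11 for SIGSEGV), which corresponds to the exit code 139 (128 + 11) shown above.

```python
import subprocess
import sys

# Child script: reads every row of the ORC file passed as argv[1].
# Assumption: pyorc is importable from the same interpreter (sys.executable).
_CHILD = """
import sys
import pyorc

with open(sys.argv[1], "rb") as data:
    for _ in pyorc.Reader(data):
        pass
"""


def classify(returncode: int) -> str:
    """Map a child process's return code to an outcome label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX: subprocess reports death-by-signal as -signum (e.g. -11 for SIGSEGV).
        return f"crashed (signal {-returncode})"
    # A Python exception in the child exits with a positive status (normally 1).
    return "raised exception"


def read_orc_isolated(path: str, timeout: float = 60.0) -> str:
    """Read an ORC file in a child process so a segfault cannot kill the caller."""
    proc = subprocess.run(
        [sys.executable, "-c", _CHILD, path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return classify(proc.returncode)
```

For example, looping `for name in ["missing_blob_stream_in_string_dict.orc", "negative_dict_entry_lengths.orc"]: print(name, read_orc_isolated(name))` would report each file as `ok`, `raised exception`, or `crashed (signal 11)`, so the expected behaviour for these files is `raised exception`.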
The following files cause a segmentation fault when read by pyorc. Some details from the Apache ORC project that may help:
missing_blob_stream_in_string_dict.orc => ORC-591 => [C++] Check missing blob stream for StringDictionaryColumnRe…
missing_length_stream_in_string_dict.orc => ORC-590 => [C++] added check for missing LENGTH stream in StringDiction…
negative_dict_entry_lengths.orc => ORC-589 => [C++] add checks about negative dictionary entry lengths
stripe_footer_bad_column_encodings.orc => ORC-580 => [C++] Verify ColumnEncodings in StripeFooter (#463)
It looks like all of these fixes will be in the upcoming 1.7.0 release of the ORC C++ Core.
(Even ORC-589, which the ticket says was fixed in 1.6.3, is not patched in the latest 1.6.5.)