vnmabus / rdata

Reader of R datasets in .rda format, in Python
https://rdata.readthedocs.io
MIT License

ValueError: 249 is not a valid RObjectType #41

Open pulpdood opened 2 weeks ago

pulpdood commented 2 weeks ago

Bug description summary

Hi, this project is really cool!

I was just working on migrating R code written by a member of my team to Python, and there are some RDS files containing models which were generated by R. I'm trying to use rdata to parse the RDS file, but I get the following error:

ValueError: 249 is not a valid RObjectType

Please forgive me: I don't know exactly what the RDS file contains, and I don't think I can disclose it. Thank you so much!

Code to reproduce the bug

No response

Data file(s)

No response

Expected result

A valid Python object I can use

Actual result

Got an error

Traceback (if an exception is raised)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 1087, in parse_file
    return parse_data(
           ^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 1222, in parse_data
    return parse_function(
           ^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 1222, in parse_data
    return parse_function(
           ^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 1258, in parse_rdata_binary
    r_data = parser.parse_all()
             ^^^^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 610, in parse_all
    obj = self.parse_R_object()
          ^^^^^^^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 862, in parse_R_object
    value[i] = self.parse_R_object(
               ^^^^^^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 862, in parse_R_object
    value[i] = self.parse_R_object(
               ^^^^^^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 775, in parse_R_object
    tag = self.parse_R_object(reference_list, bytecode_rep_list)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 725, in parse_R_object
    info = parse_r_object_info(info_int)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/**redacted**/lib/python3.12/site-packages/rdata/parser/_parser.py", line 1282, in parse_r_object_info
    type_exp = RObjectType(bits(info_int, 0, 8))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/enum.py", line 757, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/enum.py", line 1171, in __new__
    raise ve_exc
ValueError: 249 is not a valid RObjectType

Software versions

rdata version: 0.11.2
OS: macOS Sonoma 14.2.1

Additional context

No response

vnmabus commented 2 weeks ago

249 is the code for an R namespace, which is not currently supported. The namespace reference probably ended up in the file unintentionally.
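
To illustrate where the error comes from, here is a minimal, self-contained sketch. It does not use rdata itself: the `RObjectType` class below is a tiny stand-in with only a few members, and `bits` and `type_code` are illustrative helpers written to mirror the `bits(info_int, 0, 8)` call in the traceback. The point is only that the low 8 bits of the object header hold the type code, and that looking up a code the enum does not define raises the `ValueError` reported above.

```python
from enum import IntEnum


class RObjectType(IntEnum):
    """Stand-in enum (NOT rdata's): a few R type codes, without 249."""

    NIL = 0
    SYM = 1
    LIST = 2
    CHAR = 9


def bits(data: int, start: int, stop: int) -> int:
    """Return bits [start, stop) of data, counted from the least significant bit."""
    count = stop - start
    mask = (1 << count) - 1
    return (data >> start) & mask


def type_code(info_int: int) -> int:
    """Extract the type code stored in the low 8 bits of an object header."""
    return bits(info_int, 0, 8)


# A header whose low byte is 0xF9 (249), the namespace code.
info_int = 0x000000F9

try:
    RObjectType(type_code(info_int))
except ValueError as exc:
    # The enum has no member with value 249, so the lookup fails.
    message = str(exc)
```

Any header whose low byte is a value missing from the enum fails the same way, which is why the error surfaces deep inside the recursive `parse_R_object` calls rather than at the top of the file.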

Implementing enough support to load the dataset should not be difficult. In principle, we need to add a new enum member for the namespace type, plus parser and converter code that reads the namespace name (I think that is all the information that is written) and returns a special class containing it. Of course, the namespace itself would not be available in Python.
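
A rough sketch of that plan, under stated assumptions, could look like the following. None of this is rdata's actual code: `RNamespaceRef`, `read_int`, `read_char`, `read_namespace` and `pack_char` are hypothetical names. The byte layout is also an assumption that should be checked against R's serialization code: a zero integer, a count, then that many strings, each written as a header integer, a length and the raw bytes, all as big-endian 32-bit integers as in the XDR format.

```python
import io
import struct
from dataclasses import dataclass
from enum import IntEnum


class RObjectType(IntEnum):
    """Stand-in enum (NOT rdata's) with the missing namespace code added."""

    CHAR = 9
    NAMESPACE = 249


@dataclass(frozen=True)
class RNamespaceRef:
    """Hypothetical placeholder returned instead of a live R namespace."""

    spec: tuple[str, ...]  # assumed to hold the name first, then e.g. a version

    @property
    def name(self) -> str:
        return self.spec[0] if self.spec else ""


def read_int(stream: io.BufferedIOBase) -> int:
    """Read one big-endian 32-bit signed integer."""
    return struct.unpack(">i", stream.read(4))[0]


def read_char(stream: io.BufferedIOBase) -> str:
    """Read one string item: header int, length int, then the bytes (assumed layout)."""
    info = read_int(stream)
    if info & 0xFF != RObjectType.CHAR:
        raise ValueError("expected a character string item")
    length = read_int(stream)
    if length < 0:
        raise ValueError("NA string not expected in a namespace spec")
    return stream.read(length).decode("utf-8")


def read_namespace(stream: io.BufferedIOBase) -> RNamespaceRef:
    """Read the payload that follows a NAMESPACE header (assumed layout)."""
    if read_int(stream) != 0:
        raise ValueError("unexpected leading value in namespace spec")
    count = read_int(stream)
    return RNamespaceRef(tuple(read_char(stream) for _ in range(count)))


def pack_char(text: str) -> bytes:
    """Build a string item by hand, for the demonstration below only."""
    raw = text.encode("utf-8")
    return struct.pack(">ii", RObjectType.CHAR, len(raw)) + raw


# Hand-built payload: zero, count of 2, then a made-up name and version.
payload = struct.pack(">ii", 0, 2) + pack_char("stats") + pack_char("4.3.1")
ns = read_namespace(io.BytesIO(payload))
```

Returning a small immutable placeholder keeps the rest of the object loadable while making it explicit that only a reference to the namespace, not the namespace itself, is available on the Python side.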

I am currently a bit busy, but if you want you can propose a PR. Otherwise I will try to do it when I have more time.