vnmabus / rdata

Reader of R datasets in .rda format, in Python
https://rdata.readthedocs.io
MIT License

ValueError: 238 is not a valid RObjectType #11

Closed schlegelp closed 3 years ago

schlegelp commented 3 years ago

Hi! First off: very cool package - definitely a big help for people (such as myself) who are working with both R and Python.

I'm trying to parse an R object that is effectively a collection of named lists, data.frames and other simple attributes, and I'm running into this error:

>>> import rdata
>>> parsed = rdata.parser.parse_file('/Users/philipps/Downloads/DA1_test.rda')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
ValueError: 238 is not a valid RObjectType

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-193-c9fd7e0869a5> in <module>
----> 1 parsed = rdata.parser.parse_file('/Users/philipps/Downloads/DA1_test.rda')

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_file(file_or_path)
    612             binary_file = buffer
    613         data = binary_file.read()
--> 614     return parse_data(data)
    615 
    616 

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_data(data)
    698         return parse_data(bz2.decompress(data))
    699     elif filetype is FileTypes.gzip:
--> 700         return parse_data(gzip.decompress(data))
    701     elif filetype is FileTypes.xz:
    702         return parse_data(lzma.decompress(data))

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_data(data)
    703     elif filetype in {FileTypes.rdata_binary_v2, FileTypes.rdata_binary_v3}:
    704         view = view[len(magic_dict[filetype]):]
--> 705         return parse_rdata_binary(view)
    706     else:
    707         raise NotImplementedError("Unknown file type")

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_rdata_binary(data)
    719     if format_type is RdataFormats.XDR:
    720         parser = ParserXDR(data)
--> 721         return parser.parse_all()
    722     else:
    723         raise NotImplementedError("Unknown file format")

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_all(self)
    285         versions = self.parse_versions()
    286         extra_info = self.parse_extra_info(versions)
--> 287         obj = self.parse_R_object()
    288 
    289         return RData(versions, extra_info, obj)

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_R_object(self, reference_list)
    366 
    367             # Read CAR and CDR
--> 368             car = self.parse_R_object(reference_list)
    369             cdr = self.parse_R_object(reference_list)
    370             value = (car, cdr)

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_R_object(self, reference_list)
    445 
    446             for i in range(length):
--> 447                 value[i] = self.parse_R_object(reference_list)
    448 
    449         elif info.type == RObjectType.S4:

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_R_object(self, reference_list)
    334         info_int = self.parse_int()
    335 
--> 336         info = parse_r_object_info(info_int)
    337 
    338         tag = None

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/rdata/parser/_parser.py in parse_r_object_info(info_int)
    747     Parse the internal information of an object.
    748     """
--> 749     type_exp = RObjectType(bits(info_int, 0, 8))
    750 
    751     reference = 0

~/.pyenv/versions/3.7.5/lib/python3.7/enum.py in __call__(cls, value, names, module, qualname, type, start)
    308         """
    309         if names is None:  # simple value lookup
--> 310             return cls.__new__(cls, value)
    311         # otherwise, functional API: we're creating a new Enum type
    312         return cls._create_(value, names, module=module, qualname=qualname, type=type, start=start)

~/.pyenv/versions/3.7.5/lib/python3.7/enum.py in __new__(cls, value)
    562                         )
    563             exc.__context__ = ve_exc
--> 564             raise exc
    565 
    566     def _generate_next_value_(name, start, count, last_values):

~/.pyenv/versions/3.7.5/lib/python3.7/enum.py in __new__(cls, value)
    546         try:
    547             exc = None
--> 548             result = cls._missing_(value)
    549         except Exception as e:
    550             exc = e

~/.pyenv/versions/3.7.5/lib/python3.7/enum.py in _missing_(cls, value)
    575     @classmethod
    576     def _missing_(cls, value):
--> 577         raise ValueError("%r is not a valid %s" % (value, cls.__name__))
    578 
    579     def __repr__(self):

ValueError: 238 is not a valid RObjectType

I hope that's a fixable issue, but in case it's not, I was wondering whether it would be possible to implement something that lets you simply skip errors and return only what could be parsed?

This is with Python 3.7.5 and rdata 0.4 on OSX 10.14.6.

Also in case that's helpful for debugging, this is the culprit: DA1_test.rda.zip

schlegelp commented 3 years ago

An immediate update that might help: if I save the file in format version 2 (instead of 3), it is parsed just fine.

vnmabus commented 3 years ago

It seems that 238 is the code for ALTREP representations (https://blog.revolutionanalytics.com/2017/09/altrep-preview.html) in the version 3 format. Those are not used in version 2, which is why it worked when you saved the file in that version. I could look at how that framework works, try to translate the default ALTREPs, and maybe allow users to provide callbacks for custom ones.
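To make the failure concrete, here is a minimal sketch of how the type code ends up at 238. It assumes, based only on the `bits(info_int, 0, 8)` call visible in the traceback, that the type code is the low 8 bits of the object header word. The `bits` helper and the small `RObjectType` enum below are illustrative stand-ins written for this example, not rdata's actual definitions, and the header value is made up.

```python
from enum import Enum


class RObjectType(Enum):
    """Illustrative subset of R type codes; NOT rdata's real enum."""
    NIL = 0
    SYM = 1
    LIST = 2
    VEC = 19
    # No member for 238 (ALTREP), which is what triggers the error.


def bits(data: int, start: int, stop: int) -> int:
    """Return bits [start, stop) of data, counted from the least significant bit.

    Assumed behaviour of the helper seen in the traceback.
    """
    mask = (1 << (stop - start)) - 1
    return (data >> start) & mask


def type_code(info_int: int) -> int:
    """Extract the type code from the low 8 bits of a header word."""
    return bits(info_int, 0, 8)


# Made-up header word: some flag bits above the low byte, type code 238 below.
header = (0x3 << 8) | 238
code = type_code(header)

try:
    RObjectType(code)
except ValueError as exc:
    # Same message as in the report above.
    message = str(exc)
    print(message)
```

Any enum that lacks a member for 238 fails this way, so a version 3 file containing an ALTREP object stops the parser at the first such header.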

schlegelp commented 3 years ago

That would obviously be my preferred solution :)

If that's not possible for some reason, I would perhaps suggest a more verbose error message. Something along the lines of "ALTREP representations not (yet) supported. Try re-saving in format 2."
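A rough sketch of what such a message could look like, assuming the unknown type code is caught where the enum lookup happens. `UnsupportedRObjectError`, `parse_type` and the reduced `RObjectType` enum are hypothetical names for illustration only, not part of rdata.

```python
from enum import Enum

ALTREP_CODE = 238  # type code reported in this issue


class UnsupportedRObjectError(ValueError):
    """Hypothetical exception for known-but-unsupported R object types."""


class RObjectType(Enum):
    """Illustrative subset of type codes; NOT rdata's real enum."""
    NIL = 0
    LIST = 2
    VEC = 19


def parse_type(code: int) -> RObjectType:
    """Look up a type code, giving a clearer error for ALTREP objects."""
    try:
        return RObjectType(code)
    except ValueError as exc:
        if code == ALTREP_CODE:
            raise UnsupportedRObjectError(
                "ALTREP representations not (yet) supported. "
                "Try re-saving in format 2."
            ) from exc
        # Any other unknown code keeps the original enum error.
        raise
```

Subclassing `ValueError` keeps existing `except ValueError` handlers working while still letting callers catch the more specific case.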

vnmabus commented 3 years ago

I have tried your dataset with the feature/altrep_support branch and it seemed to work. Can you confirm that it also works for you?

schlegelp commented 3 years ago

Neat, that works! Thanks a bunch! Do you have an ETA for this to make it to PyPI? Just asking because I'd like to use it as a dependency in my package.

vnmabus commented 3 years ago

I wanted to merge the PR and upload it.

vnmabus commented 3 years ago

Uploaded now. ALTREP is supported in version 0.5.