vnmabus / rdata

Reader of R datasets in .rda format, in Python
https://rdata.readthedocs.io
MIT License
40 stars 2 forks source link

Length of CHAR can not be 0 #1

Closed deeenes closed 3 years ago

deeenes commented 3 years ago

Hi @vnmabus ,

Thank you for this great package. I wanted to process this file: https://raw.githubusercontent.com/sqjin/CellChat/master/data/CellChatDB.human.rda and I was really desperate because pyreadr is unable to process lists when I found your package. However I got a NotImplementedError: Length of CHAR can not be 0. Then I went here https://github.com/vnmabus/rdata/blob/72d2b853ec15156d54a084e75603932376c26900/rdata/parser/_parser.py#L312 and changed this part to be:

           if length > 0:
                value = self.parse_string(length=length)
            else:
                value = b""

This way the processing went just fine. I am wondering if this might have any adverse effect? Otherwise you could maybe update or I can create a pull request.

Best,

Denes

vnmabus commented 3 years ago

In fact, I am not even sure why I treat the -1 length case as the empty string instead of the 0 length case. Maybe that was the value that I obtained while reverse engineering the format?

I think that it is necessary to add a test for the empty string, and that the -1 should be replaced by 0. If -1 length is something that arises in real data, then there need to be TWO tests, one for -1 and one for 0. I will also test the change with my real datasets just in case.

If you want to take this and submit a PR, I will happily review it. If you do not like to do that, tell me and I will do it myself.

deeenes commented 3 years ago

Thanks for fixing this, I wanted to return to it, past weeks were just too busy

vnmabus commented 3 years ago

Thank you for reporting the bug. I assumed you were not interested in submitting a PR, so I fixed it. Sorry if you wanted to do it yourself 😅.