Closed theOehrly closed 4 months ago
Python interpreter crashes in Google Colab (and apparently produces a segmentation fault in other environments), it simply hangs indefinitely on Windows. Or at least more than a few minutes. After that, I killed the process.
When something like this happens, disabling buckaroo is the best option. It's mentioned in the documentation. To run buckaroo in Google Colab, it needs special initialization code. Refer to the following: https://colab.research.google.com/github/paddymul/buckaroo/blob/main/example-notebooks/Full-tour.ipynb
When something like this happens, disabling buckaroo is the best option. It's mentioned in the documentation.
I don't really care about Buckaroo, I don't use it. I'm just responding to an issue that was opened against my project by the maintainer of buckaroo. It turns out the problem is not caused by my project, though. Having already done the debugging work and having more or less isolated the problem, I decided to open this issue here.
To run buckaroo in Google Colab, it needs special initialization code. Refer to the following: https://colab.research.google.com/github/paddymul/buckaroo/blob/main/example-notebooks/Full-tour.ipynb
Right, I missed that. It doesn't matter, but I updated the example accordingly. As expected, the problem persists. It happens during the internal serialization of the pandas object to JSON. The actual crash seems to be inside json.dumps.
@theOehrly Thanks for the bug report! I hadn't narrowed it down to Timestamp. My strong suspicion is that this is an upstream bug in pandas. Nothing that Buckaroo does should cause a segfault.
I will close this bug once I have a workaround released for buckaroo. I will also file a bug with pandas, but expect that to take longer.
@paddymul calling .to_json() directly on the original DataFrame works fine.
But the code in pandas.io receives an object that seems like it was modified by buckaroo (or something else in the chain here). At least the repr contained additional info that looked like it was related to buckaroo. Then the segfault occurred in json.dumps when this object was passed to it.
I stopped investigating there, but I wouldn't rule out buckaroo completely yet.
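For reference, the direct .to_json() call described above can be sketched roughly as follows. This is a minimal illustration only, not the original reproducer: the column names and values are made up, and no widget is involved, so it exercises only the plain pandas serialization path that reportedly works.

```python
import json

import pandas as pd

# Illustrative data only; these column names and values are hypothetical
# and are not taken from the original notebook.
df = pd.DataFrame(
    {
        "lap": [1, 2],
        "start": [
            pd.Timestamp("2024-01-01 12:00:00"),
            pd.Timestamp("2024-01-01 12:01:30"),
        ],
    }
)

# Serialize the untouched DataFrame directly, without any display widget.
payload = df.to_json(orient="records", date_format="iso")

# Parse the result back to confirm it is valid JSON with one entry per row.
records = json.loads(payload)
print(records)
```

If this runs cleanly while displaying the same DataFrame crashes, that would point at the object that reaches the serializer rather than at the DataFrame itself.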
I filed a bug against pandas here https://github.com/pandas-dev/pandas/issues/58160
This exists in pandas 2.0.3 and is fixed in pandas 2.1.0. I currently support pandas back to 1.3.5, so I will try to find an earlier version of pandas where this does work, and adjust the project requirements accordingly.
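A version gate along these lines could be sketched as below. This is only an assumption about how such a check might look, not the code that was released; the helper names parse_version and has_upstream_fix are hypothetical. The only fact it encodes is that the fix is in pandas 2.1.0. It makes no claim about which earlier versions are actually affected.

```python
# First pandas release reported in this thread to contain the fix.
FIXED_IN = (2, 1, 0)


def parse_version(version: str) -> tuple:
    """Turn '2.1.0' into (2, 1, 0), ignoring suffixes such as 'rc0'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.1' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_upstream_fix(pandas_version: str) -> bool:
    """Return True if the given pandas version is 2.1.0 or newer.

    Pre-releases of 2.1.0 (for example '2.1.0rc0') are also treated as
    fixed here, which is a simplification.
    """
    return parse_version(pandas_version) >= FIXED_IN


print(has_upstream_fix("2.0.3"))
print(has_upstream_fix("2.1.0"))
```

In practice the argument would be pandas.__version__, and a False result would mean only that the upstream fix is absent, so a workaround path may be needed.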
Fixed with 0.6.12.
Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Buckaroo.
What type of jupyter notebook were you using (VSCode notebook, google colab, Jupyter Lab, Jupyter notebook). Select multiple if you can reproduce this in multiple environments. If other, please add to description.
Google Colab, Jupyter Notebook
Reproducible example
Issue description
When a pandas.DataFrame includes values of type pandas.Timestamp, this results in a crash of the Python interpreter when the DataFrame is visualized in Jupyter using Buckaroo. The problem is reproducible in at least:
For completeness, while the Python interpreter crashes in Google Colab (and apparently produces a segmentation fault in other environments), it simply hangs indefinitely on Windows, or at least for more than a few minutes, after which I killed the process.
Here is a sample Google Colab notebook that reproduces the crash: https://colab.research.google.com/drive/1KgW5a_Ufw1np3RrueS8o11jYM_7rqC3Z?usp=sharing
This issue was originally reported by @paddymul in https://github.com/theOehrly/Fast-F1/issues/565
Expected behavior
The interpreter should not crash. The DataFrame should be visualized correctly.
Installed versions
Jupyter Log output
No response