Closed rasenmaeher92 closed 2 years ago
First off, thanks for such an interesting tool, and apologies for opening an issue for yet another data science related library 🙈.
Please keep them coming! Seriously, these are really helpful.
This time I tried to import pandas, which yields this output [...]
We do some fancy stuff with the datetime module, which seems like the problem here. I have yet to investigate in detail, but will do that in the next few days and report back about what our solutions might be.
As these data science related libraries often feature C-code to accelerate them, the plugin feature might be related to my endeavors, and I was wondering whether there are any "official" plugins, which already ship with crosshair?
CrossHair ships with several modules that are almost plugins, but not quite. Here is where we load plugins, and directly above this, you'll see several calls to functions named make_registrations, each one corresponding to a standard library module. The body of each of these functions is equivalent to the body of a plugin module. For example, the plugin for the standard library's "collections" module, here, has pure-Python implementations of defaultdict and deque.
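To give a feel for what a pure-Python stand-in for a C-implemented type can look like, here is a rough sketch of a defaultdict replacement. This is not CrossHair's actual code; the class name PureDefaultDict is made up for illustration, and it only covers the missing-key behavior.

```python
class PureDefaultDict(dict):
    """Illustrative pure-Python stand-in for collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value
        return value


counts = PureDefaultDict(int)
for word in ["a", "b", "a"]:
    counts[word] += 1
```

Because every step is ordinary Python, a tool that traces Python-level operations can follow what happens on a missing key, which is not possible inside the C implementation.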
I think CrossHair could be quite useful for some data science work; e.g. help with matrix shapes. I don't feel quite qualified / experienced enough to make a complete plugin, but I have fiddled around with some ideas for symbolic numpy arrays here. If you happened to be interested in working on a real plugin for numpy/pandas/pytorch, I'd be more than happy to help!
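As a very small illustration of the "matrix shapes" idea, the sketch below states the shape rule for a matrix product as a docstring contract in the PEP 316 style that CrossHair can read. It works on plain shape tuples rather than real numpy arrays, and the function name matmul_shape is hypothetical.

```python
from typing import Tuple


def matmul_shape(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """
    Shape of the product of an a-shaped matrix and a b-shaped matrix.

    pre: a[1] == b[0]
    post: __return__ == (a[0], b[1])
    """
    # Inner dimensions must agree; the result keeps the outer dimensions.
    return (a[0], b[1])


result = matmul_shape((2, 3), (3, 4))
```

Saved to a file, a function like this could be handed to the crosshair check command, which would search for inputs that satisfy the precondition but violate the postcondition.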
Diagnosis: CrossHair swaps out the system's C-based datetime module for a pure-Python version (which, interestingly, also ships with CPython but is normally overridden by the C version). Pandas uses Cython to extend the datetime classes, though, and I guess a Cython class cannot be made to extend a regular Python class.
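The two datetime implementations can be seen side by side with the sketch below. It assumes a CPython build where the C accelerator module is named _datetime; the helper name load_pure_python_datetime is made up, and this is not necessarily how CrossHair performed the swap. Unlike a destructive swap, the helper restores sys.modules afterwards.

```python
import datetime as c_datetime  # normally backed by the C accelerator
import importlib
import sys

_NAMES = ("datetime", "_datetime", "_pydatetime")


def load_pure_python_datetime():
    """Import a private copy of datetime with the C accelerator blocked."""
    saved = {name: sys.modules[name] for name in _NAMES if name in sys.modules}
    for name in _NAMES:
        sys.modules.pop(name, None)
    # A None entry makes "import _datetime" raise ImportError, so the
    # standard library falls back to its pure-Python implementation.
    sys.modules["_datetime"] = None
    try:
        return importlib.import_module("datetime")
    finally:
        # Undo every change so the rest of the process is unaffected.
        for name in _NAMES:
            sys.modules.pop(name, None)
        sys.modules.update(saved)


pure_datetime = load_pure_python_datetime()
# The two datetime classes are unrelated types.
related = issubclass(pure_datetime.datetime, c_datetime.datetime)
```

An extension type compiled against the C datetime class expects that exact base type, which is consistent with the import failing once the module has been replaced globally.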
Proposed solution: Instead of destructively swapping out the system's datetime module, we can use CrossHair's usual register_type/register_patch machinery to dynamically swap in symbolic datetime classes. This is a bit of work, but reducing the number of destructive things CrossHair does to the interpreter is good in its own right. Note that this solution doesn't help CrossHair analyze code using Pandas, or even dates in Pandas; it just avoids the error at import time. A real plugin would need to be developed to enable CrossHair to do anything useful with code that uses Pandas.
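The general idea of a non-destructive swap can be sketched with a small registry whose replacements are only active inside a context manager. This is an illustration of the pattern, not CrossHair's register_patch implementation; register_patch_sketch and patches_applied are hypothetical names, and time.time stands in for the datetime classes.

```python
import contextlib
import time

# Hypothetical registry: (module, attribute name) -> replacement object.
_PATCHES = {}


def register_patch_sketch(module, name, replacement):
    """Record a replacement without touching the module yet."""
    _PATCHES[(module, name)] = replacement


@contextlib.contextmanager
def patches_applied():
    """Swap in all registered replacements, then restore the originals."""
    originals = {}
    try:
        for (module, name), replacement in _PATCHES.items():
            originals[(module, name)] = getattr(module, name)
            setattr(module, name, replacement)
        yield
    finally:
        for (module, name), original in originals.items():
            setattr(module, name, original)


def frozen_time():
    return 0.0


register_patch_sketch(time, "time", frozen_time)


def seconds_since_epoch():
    return time.time()
```

Code imported outside the with block, such as a C extension loaded at import time, still sees the original objects, which is the property that matters for the pandas import.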
It might be a week or two before I complete this change, but it is in-progress.
Update: pandas should be importable as of 9e735ca9d14186592f97b147469532ef4a3012e1. I'll update this bug again when I cut a release that includes it.
The fix for this was shipped in v0.0.23. Thanks again for the detailed report!
Expected vs actual behavior
This time I tried to import pandas, which yields this output.
To Reproduce
Install pandas, e.g., with conda:
Create a file cross_hair_test.py with the following content:
Run CrossHair: