pschanely / CrossHair

An analysis tool for Python that blurs the line between testing and type systems.

Importing pandas raises Errors and RuntimeWarnings #159

Closed rasenmaeher92 closed 2 years ago

rasenmaeher92 commented 2 years ago

Expected vs actual behavior

First off, thanks for such an interesting tool, and apologies for opening an issue for yet another data science related library 🙈.

This time I tried to import pandas, which yields this output.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I couldn't import a file.
/mnt/miniconda3/envs/myenv/lib/python3.9/site-packages/crosshair/util.py:328:
|    try:
|        root_path, module_name = extract_module_from_file(filename)
|        with add_to_pypath(root_path):
>            return import_module(module_name)
|    except Exception as e:
|        raise ErrorDuringImport from e
|

type 'pandas._libs.tslibs.base.ABCTimestamp' is not dynamically allocated but its base type 'datetime' is dynamically allocated
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Analyzed 0 paths in "cross_hair_test.py".
<frozen importlib._bootstrap>:228: RuntimeWarning: datetime.date size changed, may indicate binary incompatibility. Expected 32 from C header, got 48 from PyObject
<frozen importlib._bootstrap>:228: RuntimeWarning: datetime.time size changed, may indicate binary incompatibility. Expected 40 from C header, got 72 from PyObject
<frozen importlib._bootstrap>:228: RuntimeWarning: datetime.datetime size changed, may indicate binary incompatibility. Expected 48 from C header, got 136 from PyObject

To Reproduce

Install pandas, e.g., with conda:

conda install pandas

Create a file cross_hair_test.py with the following content:

import pandas as pd

Run CrossHair:

python -m crosshair watch cross_hair_test.py

As these data science related libraries often feature C-code to accelerate them, the plugin feature might be related to my endeavors, and I was wondering whether there are any "official" plugins, which already ship with crosshair?

pschanely commented 2 years ago

> First off, thanks for such an interesting tool, and apologies for opening an issue for yet another data science related library 🙈.

Please keep them coming! Seriously, these are really helpful.

> This time I tried to import pandas, which yields this output [...]

We do some fancy stuff with the datetime module, which seems like the problem here. I have yet to investigate in detail, but will do that in the next few days and report back about what our solutions might be.

> As these data science related libraries often feature C-code to accelerate them, the plugin feature might be related to my endeavors, and I was wondering whether there are any "official" plugins, which already ship with crosshair?

CrossHair ships with several modules that are almost plugins, but not quite. Here is where we load plugins, and directly above this, you'll see several calls to functions named make_registrations, each one corresponding to a standard library module. The body of each of these functions is equivalent to the body of a plugin module. For example, the plugin for the standard library's "collections" module, here, has pure-Python implementations of defaultdict and deque.

I think CrossHair could be quite useful for some data science work; e.g. help with matrix shapes. I don't feel quite qualified / experienced enough to make a complete plugin, but I have fiddled around with some ideas for symbolic numpy arrays here. If you happened to be interested in working on a real plugin for numpy/pandas/pytorch, I'd be more than happy to help!

pschanely commented 2 years ago

Diagnosis: CrossHair swaps out the system's C-based datetime module for a pure-Python version (which, interestingly, also ships with CPython but is normally overridden by the C version). Pandas uses Cython to extend the datetime classes, though, and I guess a Cython class cannot extend a regular Python class.
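
To make the diagnosis concrete, here is a minimal sketch of the two datetime implementations side by side. It is not CrossHair's actual code: the helper name import_pure_python_datetime is made up for illustration, and it relies on the standard library's datetime.py falling back to its pure-Python classes when importing _datetime fails. The heap-type flag is what "dynamically allocated" in the error message refers to, and a differing __basicsize__ is presumably what triggers the "size changed" warnings.

```python
import sys
import datetime as c_datetime  # normally backed by the C module _datetime

HEAPTYPE = 1 << 9  # Py_TPFLAGS_HEAPTYPE: set for classes created at runtime


def import_pure_python_datetime():
    """Import a second copy of datetime with the C accelerator blocked.

    Hypothetical helper for illustration; not part of CrossHair.
    """
    names = ("datetime", "_datetime", "_pydatetime")
    saved = {name: sys.modules.pop(name, None) for name in names}
    sys.modules["_datetime"] = None  # makes "import _datetime" raise ImportError
    try:
        import datetime as py_datetime
        return py_datetime
    finally:
        # Put sys.modules back the way we found it.
        for name, module in saved.items():
            if module is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module


py_datetime = import_pure_python_datetime()
for label, cls in (("C", c_datetime.datetime), ("pure Python", py_datetime.datetime)):
    is_heap = bool(cls.__flags__ & HEAPTYPE)
    print(f"{label}: heap type={is_heap}, basicsize={cls.__basicsize__}")
```

The pure-Python class is an ordinary heap-allocated class, so a compiled extension type that was built against the C struct layout cannot safely use it as a base.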

Proposed solution: Instead of destructively swapping out the system's datetime module, we can use CrossHair's usual register_type/register_patch machinery to dynamically swap in symbolic datetime classes. This is a bit of work, but reducing the number of destructive things CrossHair does to the interpreter is a good thing in its own right. Note that this solution doesn't help CrossHair analyze code using Pandas, or even dates in Pandas; it just avoids the error at import time. A real plugin would need to be developed to enable CrossHair to do anything useful with code that uses Pandas.
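
As a rough illustration of the difference between the two approaches, here is a toy sketch of a non-destructive patch registry. Everything in it (toy_register_patch, traced_call, pure_isleap) is hypothetical and much simpler than CrossHair's real register_type/register_patch machinery; it only shows the idea of rerouting calls during analysis while leaving the original module untouched.

```python
import calendar

# Hypothetical registry, for illustration only; not CrossHair's implementation.
_PATCHES = {}


def toy_register_patch(original, replacement):
    """Record that `replacement` should stand in for `original`."""
    _PATCHES[original] = replacement


def traced_call(fn, *args, **kwargs):
    """Call `fn`, rerouting to its registered replacement if one exists."""
    return _PATCHES.get(fn, fn)(*args, **kwargs)


def pure_isleap(year):
    """Stand-in implementation; a real one would accept symbolic values."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


original_isleap = calendar.isleap
toy_register_patch(calendar.isleap, pure_isleap)

# Calls routed through the tracer use the replacement...
print(traced_call(calendar.isleap, 2024))
# ...while the module attribute itself is unchanged, so anything that
# imports the module directly still finds the original object.
print(calendar.isleap is original_isleap)
```

Because nothing in the module is overwritten, compiled extensions that import it at load time still see the objects they were built against.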

It might be a week or two before I complete this change, but it is in-progress.

pschanely commented 2 years ago

An update!: pandas should be import-able as of 9e735ca9d14186592f97b147469532ef4a3012e1. I'll update this bug again when I cut a release including it.

pschanely commented 2 years ago

The fix for this was shipped in v0.0.23. Thanks again for the detailed report!