wala / ML

Eclipse Public License 2.0

No notion of PYTHONPATH in analysis #163

Closed khatchad closed 3 months ago

khatchad commented 3 months ago

Other static analysis tools for Python, e.g., PyDev, have a notion of a "PYTHONPATH," e.g., source folders. I believe that Ariadne only considers the script directory as the PYTHONPATH, but, in fact, it might be elsewhere. In PyDev, the developer sets this. Maybe Ariadne should also take this as input.

Example

Consider the following directory structure:

src/
    __init__.py
    A.py
    B.py

Currently, if B.py contains from A import X, that is valid in Python and also resolves correctly in Ariadne. However, if B.py instead contains from src.A import X, and we run Python from the parent directory of src with PYTHONPATH=., I believe that is still valid in Python. Ariadne, however, can't resolve this import, probably because there's no src/src directory.
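The Python side of this example can be checked with a short script. This is only a minimal sketch: it recreates the layout above in a temporary directory and simulates PYTHONPATH=. by putting the parent of src on sys.path. The value X = 42 and the function name are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_x_via_package() -> int:
    """Recreate src/{__init__,A,B}.py and import B as src.B from the parent directory."""
    with tempfile.TemporaryDirectory() as parent:
        src = Path(parent) / "src"
        src.mkdir()
        (src / "__init__.py").write_text("")
        (src / "A.py").write_text("X = 42\n")  # hypothetical value for X
        (src / "B.py").write_text("from src.A import X\n")

        # Roughly what PYTHONPATH=. does when Python runs from the parent of src.
        sys.path.insert(0, parent)
        try:
            importlib.invalidate_caches()
            module_b = importlib.import_module("src.B")
            return module_b.X
        finally:
            sys.path.remove(parent)
            # Forget the temporary package so it does not leak into later imports.
            for name in ("src", "src.A", "src.B"):
                sys.modules.pop(name, None)


print(import_x_via_package())
```

The import succeeds because src is found as a package under the parent directory, not because src/src exists.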

khatchad commented 3 months ago

Need to set up some examples (test cases) here.

khatchad commented 3 months ago

It seems like the default PYTHONPATH is the script directory, which is also the case in Python. However, while Ariadne can find scripts in the same directory, it cannot find modules in the same directory. Thus, no modules are found even without considering a custom PYTHONPATH.
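To confirm the Python half of this observation, here is a small sketch that runs a throwaway script and compares its sys.path[0] with the directory the script lives in. The file name main.py and the function name are assumptions for illustration, and the check assumes the interpreter is started with default settings.

```python
import os
import subprocess
import sys
import tempfile
from typing import Tuple


def first_search_path_entry() -> Tuple[str, str]:
    """Run a throwaway script; return (its directory, the sys.path[0] it reports)."""
    with tempfile.TemporaryDirectory() as script_dir:
        script = os.path.join(script_dir, "main.py")  # hypothetical script name
        with open(script, "w") as handle:
            handle.write("import sys\nprint(sys.path[0])\n")
        result = subprocess.run(
            [sys.executable, script], capture_output=True, text=True, check=True
        )
        # Resolve symlinks on both sides so the two paths are comparable.
        return os.path.realpath(script_dir), os.path.realpath(result.stdout.strip())


expected, reported = first_search_path_entry()
print(expected == reported)
```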

khatchad commented 3 months ago

Potentially helpful link:

https://docs.python.org/3/tutorial/modules.html#the-module-search-path

khatchad commented 3 months ago

@msridhar Looking at the input to WALA, it seems to take a classpath to the analysis engine. In Ariadne's case, it takes a sequence of "scripts." I think a good part of the problem here is that there is no "project root" input to Ariadne. If I had such an input, I could construct the IR with relative directory names. Currently, if there are two scripts, say /root/A.py and /root/in/B.py, in the IR we see only A.py and B.py. In other words, there is no record of the directory structure, which Python relies upon for packages.

If, on the other hand, /root in the above example was provided as a "project root" path, then, we could construct the IR to use A.py and in/B.py. We would know that /root is the starting point for the input project and use relative directories from there on out. This would fix the problem, because, for A.py to use B.py as a module, it would need to include import in.B or from in.B import X (where X is some importable entity in B.py).
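The mapping described above can be sketched in a few lines. This is only an illustration of the idea, not Ariadne code: relative_script_name and module_name are hypothetical helpers, and __init__.py handling is left out.

```python
from pathlib import PurePosixPath


def relative_script_name(script: str, project_root: str) -> str:
    """Name a script by its path relative to the project root, e.g. 'in/B.py'."""
    return PurePosixPath(script).relative_to(project_root).as_posix()


def module_name(script: str, project_root: str) -> str:
    """Turn a script path into the dotted name an import statement would use."""
    relative = PurePosixPath(script).relative_to(project_root)
    return ".".join(relative.with_suffix("").parts)


for script in ("/root/A.py", "/root/in/B.py"):
    print(relative_script_name(script, "/root"), module_name(script, "/root"))
```

With /root as the project root, the two scripts get the distinct names A.py and in/B.py. One caveat about the example path: in is a reserved word in Python, so a real package directory would need a different name for import in.B to parse.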

I believe this would also solve https://github.com/wala/ML/issues/162 as we would incorporate relative directory names and thus each script would be uniquely identifiable. But, this seems different than what is done in WALA. AFAIK, WALA does not have a notion of a root project directory of the analyzed code.

So, I suppose this is kind of a design decision. Is there a particular acceptable solution? Does it make sense to have a project root?

khatchad commented 3 months ago

I suppose an alternative to a "project root" would be a PYTHONPATH that, like a classpath, is a sequence of locations. We could search them in order and if we find the file there we use that in the IR.
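A rough sketch of that ordered search, assuming a hypothetical resolve_module helper and a simplified view of Python's import rules (only regular packages and .py files, no namespace packages or extension modules):

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def resolve_module(dotted_name: str, search_path: Iterable[Path]) -> Optional[Path]:
    """Return the first file defining dotted_name, trying search_path entries in order."""
    relative = Path(*dotted_name.split("."))
    for entry in search_path:
        # Within one entry, a package directory is tried before a plain module file.
        candidates = (entry / relative / "__init__.py", entry / relative.with_suffix(".py"))
        for candidate in candidates:
            if candidate.is_file():
                return candidate
    return None


def demo() -> Optional[str]:
    """Resolve src.A against two entries where only the second one contains src/."""
    with tempfile.TemporaryDirectory() as tmp:
        first, second = Path(tmp) / "first", Path(tmp) / "second"
        first.mkdir()
        (second / "src").mkdir(parents=True)
        (second / "src" / "A.py").write_text("X = 1\n")
        found = resolve_module("src.A", [first, second])
        return found.relative_to(tmp).as_posix() if found else None


print(demo())
```

The file that is found could then be recorded in the IR under its path relative to the matching entry, which keeps the directory structure that packages depend on.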

khatchad commented 3 months ago

I'm also unsure what the design decision was regarding feeding the analysis engine each individual file. In WALA, the entire classpath is scanned. Is each file entered in the JavaScript case? If we go with the PYTHONPATH option, I am a little reluctant to make a large change where we replace individual input files with a path sequence. In that case, perhaps the PYTHONPATH would be passed separately to the analysis engine ...