r2e-project / r2e

r2e: turn any github repository into a programming agent environment
https://r2e.dev
MIT License
87 stars 10 forks source link

PyCG and Repo building fails due to the use of "|" as a separator #10

Closed manishshettym closed 4 months ago

manishshettym commented 4 months ago

Describe the bug setup_repos.py uses "|" as a separator between repo_org and repo_name when cloning new repos. This when passed to PyCG creates an error due to malformed import statements during transformations.

To Reproduce Steps to reproduce the behavior:

  1. python r2e/repo_builder/setup_repos.py --repo_url <any repo url>
Cloning repository https://github.com/AtsushiSakai/PythonRobotics to /Users/manishs/buckets/local_repoeval_bucket/repos/AtsushiSakai|PythonRobotics
Running pycg on 1 repos
Error in file: /Users/manishs/buckets/local_repoeval_bucket/repos/AtsushiSakai|PythonRobotics_temp/PathPlanning/RRT/sobol/__init__.py
invalid syntax (<unknown>, line 1)
  0%|                                                                          | 0/1 [00:00<?, ?it/s]Failed to run pycg on repo_org='AtsushiSakai' repo_name='PythonRobotics' repo_id='AtsushiSakai|PythonRobotics' local_repo_path='AtsushiSakai|PythonRobotics': pebble.common.RemoteTraceback: Traceback (most recent call last):
  File "/Users/manishs/Projects/r2e/.venv/lib/python3.11/site-packages/pebble/common.py", line 183, in process_execute
    return Result(SUCCESS, function(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/manishs/Projects/r2e/r2e/repo_builder/run_pycg.py", line 13, in construct_pycg
    cgraph = CallGraphGenerator.construct_call_graph(repo.repo_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/manishs/Projects/r2e/r2e/pat/callgraph/generator.py", line 32, in construct_call_graph
    repo_path = ImportTransformer.transform_repo(repo_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/manishs/Projects/r2e/r2e/pat/imports/transformer.py", line 89, in transform_repo
    raise e
  File "/Users/manishs/Projects/r2e/r2e/pat/imports/transformer.py", line 85, in transform_repo
    ImportTransformer.transform_file(os.path.join(root, file))
  File "/Users/manishs/Projects/r2e/r2e/pat/imports/transformer.py", line 69, in transform_file
    ast.parse(source_code)
  File "/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 1
    from AtsushiSakai|PythonRobotics_temp.PathPlanning.RRT.sobol.sobol import i4_sobol as sobol_quasirand
                     ^
SyntaxError: invalid syntax
manishshettym commented 4 months ago

Solution: Move to something like "___" (triple underscore) which is valid for bash scripts and import statements, and rare enough to be used as a separator.

Naman-ntc commented 4 months ago
from AtsushiSakai|PythonRobotics_temp.PathPlanning.RRT.sobol.sobol import i4_sobol as sobol_quasirand

wait why is the import like this? it seems to be issue of import resolver?

accidental pipe is maybe a possible ig

manishshettym commented 4 months ago

@Naman-ntc: If we rename the repo (irrespective of the separator) and the root package directory is the root of the repo, then import transformations (relative to absolute) will (and should) show up like this.

E.g., of such a repo (where __init__.py is at topmost level) is https://github.com/AtsushiSakai/PythonRobotics

Naman-ntc commented 4 months ago

Hmm, we set the python path to be inside the repository folder right? So, i don't see why this is needed?

I would imagine that the __init__ at the topmost level is useful for using the package externally but within the repository using the repository folder name doesn't make sense (and original code skirts this by using relative paths ig)

manishshettym commented 4 months ago

So, i don't see why this is needed?

Hmm, not sure I understand. Do you mean we don't need to do relative->abs transformation? IIRC, it is for PyCG to not have issues finding stuff, right. Think removing that would be a breaking change for other repos(?)

If you mean this repo (PythonRobotics) is weird because it has a init.py under the repo's root (unlike the standard /src folder format for a package) -- I agree! and yes, that's the reason for this issue.

Side note: IMO, most reliable way to do rel->abs import transformation, is to use consistent imports from the root of the package -- which in this case happens to be the repo folder.