scottrogowski / code2flow

Pretty good call graphs for dynamic languages
MIT License
3.98k stars, 295 forks

Functions with module prefix are not mapped (Python) #104

Open lee-sony opened 1 week ago

lee-sony commented 1 week ago

Background

Given a typical application based on the repository pattern, I want to map function calls to make sure that data access is properly controlled.

I have set up a project as follows:

.
├── infrastructure
│   └── http.py
├── main.py
└── repository
    └── user_repository.py

With the following source code (a minimal reproduction):

# infrastructure/http.py
from typing import Optional

def request(
  method: str, url: str, headers: Optional[dict] = None, body: Optional[dict] = None
) -> Optional[dict]:
  # Make an HTTP request
  pass

# repository/user_repository.py
from infrastructure import http
from infrastructure.http import request

def get_users():
  http.request("GET", "https://api.example.com/users")

def get_users2():
  request("GET", "https://api.example.com/users")

# main.py
import repository.user_repository as user_repository
from repository.user_repository import get_users2

def main():
  user_repository.get_users()
  get_users2()

main()

Logs

Code2Flow: Found 3 files from sources argument.
Code2Flow: Implicitly detected language as 'py'.
Code2Flow: Processing 3 source file(s).
Code2Flow:   ./infrastructure/http.py
Code2Flow:   ./main.py
Code2Flow:   ./repository/user_repository.py
Code2Flow: Found groups ['File: http', 'File: main', 'File: user_repository'].
Code2Flow: Found nodes ['(global)', '(global)', '(global)', 'get_users', 'get_users2', 'main', 'request'].
Code2Flow: Found calls ['get_users2()', 'http.request()', 'main()', 'request()', 'user_repository.get_users()'].
Code2Flow: Found variables ['Optional->UNKNOWN_MODULE', 'get_users2->UNKNOWN_MODULE', 'http->UNKNOWN_MODULE', 'request->UNKNOWN_MODULE', 'user_repository->UNKNOWN_MODULE'].
Code2Flow: Generating output file...
Code2Flow: Wrote output file 'out.gv' with 4 nodes and 3 edges.
Code2Flow: For better machine readability, you can also try outputting in a json format.
Code2Flow: Code2flow finished processing in 0.00 seconds.

Expected behavior

When I run code2flow ./**/*.py, I expect to see an edge for every function call, something like this: [image: expected call graph]

Actual behavior

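However, I only get edges for calls to functions that were imported directly by name; calls made through a module prefix (http.request(), user_repository.get_users()) are missing: [image: actual call graph]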

Is this expected behavior? Or am I missing something here?
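
To make the expected result concrete, here is a minimal sketch of the resolution it implies, using only the standard-library ast module. It is not how code2flow works internally. The helper names (dotted_name, import_aliases, call_edges) are hypothetical, the request signature is simplified, and only top-level functions and absolute imports are handled.

```python
import ast

# Sources from the reproduction, keyed by dotted module name.
# Assumption: the request() signature is simplified for brevity.
SOURCES = {
    "infrastructure.http": """
def request(method, url, headers=None, body=None):
    pass
""",
    "repository.user_repository": """
from infrastructure import http
from infrastructure.http import request

def get_users():
    http.request("GET", "https://api.example.com/users")

def get_users2():
    request("GET", "https://api.example.com/users")
""",
    "main": """
import repository.user_repository as user_repository
from repository.user_repository import get_users2

def main():
    user_repository.get_users()
    get_users2()

main()
""",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def import_aliases(tree):
    """Map each locally bound import name to the dotted path it refers to."""
    aliases = {}
    for node in tree.body:
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    aliases[a.asname] = a.name   # import a.b as c
                else:
                    top = a.name.split(".")[0]   # import a.b binds only 'a'
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for a in node.names:                 # from a import b [as c]
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"
    return aliases


def call_edges(sources):
    """Return sorted (caller, callee) pairs whose callee is a known function."""
    trees = {mod: ast.parse(src) for mod, src in sources.items()}
    defined = {
        f"{mod}.{stmt.name}"
        for mod, tree in trees.items()
        for stmt in tree.body
        if isinstance(stmt, ast.FunctionDef)
    }
    edges = set()
    for mod, tree in trees.items():
        aliases = import_aliases(tree)
        for stmt in tree.body:
            is_func = isinstance(stmt, ast.FunctionDef)
            caller = f"{mod}.{stmt.name}" if is_func else f"{mod}.(global)"
            for node in ast.walk(stmt):
                if not isinstance(node, ast.Call):
                    continue
                name = dotted_name(node.func)
                if name is None:
                    continue
                # Swap the leading name for its import target, or assume
                # it refers to something defined in the current module.
                head, _, rest = name.partition(".")
                base = aliases.get(head, f"{mod}.{head}")
                target = f"{base}.{rest}" if rest else base
                if target in defined:
                    edges.add((caller, target))
    return sorted(edges)
```

Calling call_edges(SOURCES) on the reproduction returns five edges, including the two module-prefixed ones (main to get_users, and get_users to request) that are missing from the actual output.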

patfinder commented 1 week ago

Not sure if you want to try another tool, but ANTLR provides a generic solution. It can support a large number of languages (grammar files are available), and it also works with dot (Graphviz) generation. The only concern is that it is written in Java, but after reading a 10-minute tutorial you can compile and run it.

I was working on this (code2flow) project, but I am reconsidering now.

patfinder commented 1 week ago

The project is on GitHub: https://github.com/antlr/antlr4. There is also a separate repo for grammars; look through the owner's repositories to find it.

lee-sony commented 1 week ago

@patfinder ANTLR looks a little too generic for my use case 😅 I don't think a parser builder is what I'm looking for.