sourcegraph / scip

SCIP Code Intelligence Protocol
Apache License 2.0

Clarification on Extracting Call Relationships in Python Projects Using SCIP-Python #263

Closed by HarryCollins2 3 months ago

HarryCollins2 commented 3 months ago

Hi there,

I hope this message finds you well.

I am currently working on parsing and extracting the call relationships within a Python project using SCIP-Python. I have a couple of questions regarding this:

1.  Does the SCIP file contain the call relationships for the functions in the project?
2.  If so, could you please guide me on how to extract these call relationships and find the call trees for each function?

I appreciate your help and look forward to your response.

Thank you!

sansmoraxz commented 3 months ago

I kinda raised a similar question in #259 FYI

varungandhi-src commented 3 months ago

> Does the SCIP file contain the call relationships for the functions in the project?

Approximately, not exactly. The following will be represented in the same way:

    x = myfun # unqualified reference to myfun
    myfun(0) # function call

Each of these lines produces a single Occurrence for myfun, so a plain reference and a call cannot be distinguished using SCIP data alone; telling them apart needs some additional heuristics.
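One possible heuristic, sketched here only as an illustration, is to look at the source text right after the occurrence and check whether it starts with an opening parenthesis. The helper name looks_like_call is hypothetical, and the sketch assumes the occurrence range is the usual SCIP list of 3 or 4 integers and that character offsets line up with Python string indices (true for ASCII source, not guaranteed otherwise).

```python
def looks_like_call(source_lines, occ_range):
    """Heuristic: True if the text right after the occurrence starts with '('.

    occ_range is assumed to follow the SCIP convention:
    [line, start_char, end_char] for a single-line range, or
    [start_line, start_char, end_line, end_char] otherwise.
    """
    if len(occ_range) == 3:
        end_line, end_char = occ_range[0], occ_range[2]
    else:
        end_line, end_char = occ_range[2], occ_range[3]
    rest = source_lines[end_line][end_char:]
    # Allow whitespace between the name and the parenthesis, e.g. "myfun (0)".
    return rest.lstrip().startswith("(")


source = [
    "x = myfun # unqualified reference to myfun",
    "myfun(0) # function call",
]
print(looks_like_call(source, [0, 4, 9]))  # the reference on the first line
print(looks_like_call(source, [1, 0, 5]))  # the call on the second line
```

This misses indirect calls (for example, calling x later) and treats decorators or callbacks passed by name as non-calls, so it is an approximation at best.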

> If so, could you please guide me on how to extract these call relationships and find the call trees for each function?

Minor correction: call hierarchies can be graphs, not just trees, due to the presence of recursion.

The first step is to find all function definition occurrences: iterate over the occurrences, check that the symbol_roles bitmask marks the occurrence as a definition, and check that the symbol format matches that of a function/method. Record the enclosing_range value of each such occurrence in an interval tree.
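A minimal sketch of this first step, working on a document that has already been parsed into a Python dict. Several things here are assumptions rather than something confirmed in this thread: the camelCase keys (symbolRoles, enclosingRange), which is how protobuf JSON output usually spells symbol_roles and enclosing_range; the check that function/method symbols end with "()." or a similar ")." suffix; and the example symbol string, which is made up. For simplicity the results go into a plain list rather than an interval tree.

```python
DEFINITION = 0x1  # Definition bit of the symbol_roles bitmask


def normalize_range(r):
    """Expand a 3-element same-line range to (start_line, start_char, end_line, end_char)."""
    if len(r) == 3:
        return (r[0], r[1], r[0], r[2])
    return tuple(r)


def is_function_symbol(symbol):
    # Assumption: method descriptors in SCIP symbols end with ').'
    return symbol.endswith(").")


def function_definitions(document):
    """Return [(enclosing_range, symbol)] for each function definition in one document."""
    defs = []
    for occ in document.get("occurrences", []):
        roles = occ.get("symbolRoles", 0)
        symbol = occ.get("symbol", "")
        enclosing = occ.get("enclosingRange")
        if roles & DEFINITION and is_function_symbol(symbol) and enclosing:
            defs.append((normalize_range(enclosing), symbol))
    return defs


# Hypothetical document with one definition and one reference.
sym = "scip-python python demo 0.1 `mod`/myfun()."
doc = {
    "relativePath": "mod.py",
    "occurrences": [
        {"range": [0, 4, 9], "symbol": sym, "symbolRoles": 1,
         "enclosingRange": [0, 0, 1, 12]},
        {"range": [3, 0, 5], "symbol": sym},
    ],
}
print(function_definitions(doc))
```

Definitions without an enclosing range are skipped here, since they cannot be used to locate callers.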

Next, iterate over all function reference occurrences, and identify the node in the interval tree that encloses the source range of the reference occurrence. This gives one edge in the 'Call hierarchy' graph (the nodes being the symbol names), from the enclosing function to the referenced function. Do this for all occurrences and all files, and you'll have the full call graph.
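Putting both steps together could look roughly like the sketch below. It is not an official implementation: it replaces the interval tree with a linear scan that picks the innermost enclosing definition (fine for small indexes, slower for large ones), it assumes camelCase JSON keys (symbolRoles, enclosingRange) and that function symbols end with ").", and it treats every function reference as a call. The symbols in the example are made up.

```python
from collections import defaultdict

DEFINITION = 0x1  # Definition bit of the symbol_roles bitmask


def norm(r):
    """Expand a 3-element same-line range to 4 elements."""
    return (r[0], r[1], r[0], r[2]) if len(r) == 3 else tuple(r)


def contains(outer, inner):
    """True if the range `outer` fully covers the range `inner`."""
    return ((outer[0], outer[1]) <= (inner[0], inner[1])
            and (inner[2], inner[3]) <= (outer[2], outer[3]))


def innermost_enclosing(defs, ref_range):
    """Linear-scan stand-in for the interval tree lookup."""
    candidates = [(rng, sym) for rng, sym in defs if contains(rng, ref_range)]
    if not candidates:
        return None  # reference at module level, outside any function
    # Enclosing ranges nest, so the one that starts last is the innermost.
    return max(candidates, key=lambda c: (c[0][0], c[0][1]))[1]


def call_graph(documents):
    """Return {caller_symbol: {referenced_function_symbol, ...}}."""
    graph = defaultdict(set)
    for doc in documents:
        occs = doc.get("occurrences", [])
        funcs = [o for o in occs if o.get("symbol", "").endswith(").")]
        defs = [(norm(o["enclosingRange"]), o["symbol"]) for o in funcs
                if o.get("symbolRoles", 0) & DEFINITION and "enclosingRange" in o]
        for o in funcs:
            if o.get("symbolRoles", 0) & DEFINITION:
                continue  # definitions are nodes, not edges
            caller = innermost_enclosing(defs, norm(o["range"]))
            if caller is not None:
                graph[caller].add(o["symbol"])
    return graph


# Hypothetical document: f references g, g references f, plus a top-level use of f.
F, G = "pkg `m`/f().", "pkg `m`/g()."
docs = [{"occurrences": [
    {"range": [0, 4, 5], "symbol": F, "symbolRoles": 1, "enclosingRange": [0, 0, 2, 10]},
    {"range": [1, 4, 5], "symbol": G},
    {"range": [4, 4, 5], "symbol": G, "symbolRoles": 1, "enclosingRange": [4, 0, 5, 10]},
    {"range": [5, 4, 5], "symbol": F},
    {"range": [7, 0, 1], "symbol": F},
]}]
print(dict(call_graph(docs)))
```

The f/g pair forms a cycle, which is why the result is stored as an adjacency map rather than a tree.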

I encourage you to generate the SCIP index for a small file/project and pretty-print it using scip print --json to see the raw data. If you haven't seen the raw data yet, I think my description will make more sense after that.
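Once the JSON is saved to a file, a few lines of Python are enough to get a first overview. This sketch assumes the output is a single JSON object with a documents list whose entries carry relativePath and occurrences keys; the file name index.json and the helper name summarize_index are hypothetical.

```python
import json


def summarize_index(index):
    """Map each document path to its number of occurrences."""
    return {
        doc.get("relativePath", "?"): len(doc.get("occurrences", []))
        for doc in index.get("documents", [])
    }


# Tiny inline stand-in for real output, to show the expected shape.
sample = json.loads(
    '{"documents": [{"relativePath": "a.py", "occurrences": [{}, {}]}]}'
)
print(summarize_index(sample))

# With real data, assuming the JSON was redirected to index.json:
# with open("index.json") as f:
#     print(summarize_index(json.load(f)))
```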