sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Provide an auto-indexing Lua script for Python #56336

Closed: varungandhi-src closed this 1 year ago

varungandhi-src commented 1 year ago

For auto-indexing, the default configuration for most languages (e.g. Ruby, Go, TypeScript) tries to make cross-repo navigation work out of the box. However, for Python, the most common dependency file, requirements.txt, doesn't actually identify the name and version of the current project, which makes this impossible. 🙃

What we could do here is provide a cookie-cutter Lua script that handles the common cases:

  1. It checks whether pyproject.toml is present, and if so, does something. (This is better, since it can potentially enable cross-repo navigation.)
  2. It checks whether requirements.txt is present, and if so, does something. (No cross-repo navigation.)

and so on.
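
The precedence in the list above can be sketched in plain Lua. This is illustrative only: `choose_python_config` is a hypothetical helper, not a Sourcegraph API, and it covers just the two files named in the list. It prefers pyproject.toml because, unlike requirements.txt, it can identify the project's name and version.

```lua
-- Hypothetical helper (not part of Sourcegraph): given the set of
-- Python config files found in one directory, return the file that
-- should drive indexing and whether cross-repo navigation is possible.
local function choose_python_config(files)
  -- `files` is a set, e.g. { ["pyproject.toml"] = true }
  if files["pyproject.toml"] then
    return "pyproject.toml", true    -- name/version known: cross-repo possible
  elseif files["requirements.txt"] then
    return "requirements.txt", false -- no project name/version: no cross-repo
  end
  return nil, false                  -- nothing recognized: skip this directory
end

-- Usage: a directory with both files resolves to pyproject.toml.
local file, cross_repo = choose_python_config({
  ["requirements.txt"] = true,
  ["pyproject.toml"] = true,
})
-- file == "pyproject.toml", cross_repo == true
```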

Having it on by default may spawn a bunch of jobs that all just fail.

We could make it off by default, with an easy way to turn it on.

varungandhi-src commented 1 year ago

The logic for pyproject.toml etc. should live inside the indexer itself.

Here is a script that seems to work OK. I tested it on a couple of projects, but it still needs testing on a few more:

This script needs to be added to the site-admin settings.

Contents:

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

-- Fire on any of the common Python project/dependency files.
local custom_recognizer = recognizer.new_path_recognizer {
    patterns = {
        pattern.new_path_basename("requirements.txt"),
        pattern.new_path_basename("pyproject.toml"),
        pattern.new_path_basename("setup.py")
    },

    generate = function(_, paths)
        -- If a project uses several config files, only run indexing
        -- once instead of multiple times
        local roots = {}
        for i = 1, #paths do
            roots[path.dirname(paths[i])] = true
        end

        local jobs = {}
        for root in pairs(roots) do
            table.insert(jobs, {
                steps = {},
                -- Best-effort install of the project before indexing;
                -- `|| true` keeps the job going if the project isn't
                -- pip-installable (e.g. it only has requirements.txt).
                local_steps = {"pip install . || true"},
                root = root,
                indexer = "sourcegraph/scip-python",
                indexer_args = {"scip-python", "index"},
                outfile = "index.scip",
            })
        end

        return jobs
    end,
}

-- Register the recognizer under a custom name.
return require("sg.autoindex.config").new({
    ["custom.python"] = custom_recognizer,
})
```

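
The de-duplication step in `generate` can be tried outside Sourcegraph with plain Lua. This is a sketch: `dirname` is a hypothetical stand-in for the `path` module's `dirname` (assumed here to return an empty string for files at the repository root), and `roots_for` is an illustrative name. Unlike the script, it sorts the roots, since `pairs` iterates in an unspecified order.

```lua
-- Hypothetical stand-in for Sourcegraph's `path.dirname`.
-- Assumption: root-level files map to "" (the repository root).
local function dirname(p)
  return p:match("^(.*)/[^/]*$") or ""
end

-- Collapse matched config files into one entry per directory,
-- so a project with several config files yields a single job.
local function roots_for(paths)
  local seen = {}
  for i = 1, #paths do
    seen[dirname(paths[i])] = true
  end
  local roots = {}
  for root in pairs(seen) do
    roots[#roots + 1] = root
  end
  table.sort(roots) -- pairs() order is unspecified; sort for stable output
  return roots
end

local roots = roots_for({
  "requirements.txt",
  "pyproject.toml",
  "services/api/setup.py",
  "services/api/requirements.txt",
})
-- roots == { "", "services/api" }: two jobs instead of four
```
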
varungandhi-src commented 1 year ago

There's the question of how we want to roll this out to Sourcegraph.com and how to share this with customers.

Requirements

  1. It is turned on by default for customers on Cloud and customers using executors, with an option to turn auto-indexing off.
  2. (Soft) We're able to turn it on only for pyproject.toml on Sourcegraph.com.

Ideal solution

We'd just have a single sg.python recognizer that supports all 4 files (requirements.txt, pyproject.toml, setup.py and PKG-INFO) by default. On Sourcegraph.com, we would override the patterns list to use only PKG-INFO and pyproject.toml: we would turn off sg.python and set custom.python to the slightly tweaked recognizer.
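
The Sourcegraph.com override described above might look roughly like the following inference script. It is a sketch that reuses the job shape of the custom.python script from this thread, and it assumes that mapping a recognizer name to `false` in `sg.autoindex.config.new` disables that recognizer, and that the built-in recognizer is registered as `sg.python`. The variable name `python_dotcom` is illustrative.

```lua
local path = require("path")
local pattern = require("sg.autoindex.patterns")
local recognizer = require("sg.autoindex.recognizer")

-- Only trigger on files that identify the project's name and version.
local python_dotcom = recognizer.new_path_recognizer {
    patterns = {
        pattern.new_path_basename("PKG-INFO"),
        pattern.new_path_basename("pyproject.toml"),
    },

    generate = function(_, paths)
        -- One job per directory, even if both files are present
        local roots = {}
        for i = 1, #paths do
            roots[path.dirname(paths[i])] = true
        end

        local jobs = {}
        for root in pairs(roots) do
            table.insert(jobs, {
                steps = {},
                local_steps = {"pip install . || true"},
                root = root,
                indexer = "sourcegraph/scip-python",
                indexer_args = {"scip-python", "index"},
                outfile = "index.scip",
            })
        end
        return jobs
    end,
}

return require("sg.autoindex.config").new({
    -- Assumption: `false` disables the built-in Python recognizer...
    ["sg.python"] = false,
    -- ...and the tweaked recognizer takes its place.
    ["custom.python"] = python_dotcom,
})
```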

Other options

Just turn off sg.python altogether and create a copy of it with an edited pattern list for Sourcegraph.com. This doesn't require any further backend changes to make the recognizer reusable.