hound-search / hound

Lightning fast code searching made easy
MIT License
5.68k stars, 578 forks

`archive` pseudo-vcs driver: indexing code in archives (e.g. zip, tar) without extracting files #484

Open muravjov opened 5 months ago

muravjov commented 5 months ago

What kind of change does this PR introduce? (check at least one)

The PR fulfills these requirements:

If adding a new feature, the PR's description includes:

Description:

This PR adds a new driver, `archive`, which allows indexing source code in archives (e.g. zip, tar; any format supported by https://github.com/mholt/archiver) without extracting files. During indexing, files are walked using the archive API; during searching, results are checked and snippets are generated from files extracted on the fly.
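To illustrate the idea (not the PR's actual code), here is a minimal sketch of walking and reading archive members in place. It uses Go's standard `archive/zip` package instead of mholt/archiver, and works on an in-memory zip so it needs no files on disk. The names `buildSampleZip`, `walkArchive`, `listFiles`, `readFile` and `isIgnored` are hypothetical, and the ignore rule (skip any path with a component equal to an ignored name) is an assumption about how `ignored-files` might behave.

```go
package main

import (
	"archive/zip"
	"bytes"
	"io"
	"strings"
)

// buildSampleZip creates a small in-memory zip (illustration only).
func buildSampleZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, body); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// isIgnored reports whether any path component equals an ignored name.
// Assumption: this is one plausible reading of "ignored-files".
func isIgnored(name string, ignored []string) bool {
	for _, part := range strings.Split(name, "/") {
		for _, ig := range ignored {
			if part == ig {
				return true
			}
		}
	}
	return false
}

// walkArchive calls visit for each regular file, streaming its contents
// straight from the archive instead of unpacking it to disk (index time).
func walkArchive(data []byte, ignored []string, visit func(name string, r io.Reader) error) error {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		if f.FileInfo().IsDir() || isIgnored(f.Name, ignored) {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		err = visit(f.Name, rc)
		rc.Close()
		if err != nil {
			return err
		}
	}
	return nil
}

// listFiles returns the names that walkArchive would hand to an indexer.
func listFiles(data []byte, ignored []string) ([]string, error) {
	var names []string
	err := walkArchive(data, ignored, func(name string, _ io.Reader) error {
		names = append(names, name)
		return nil
	})
	return names, err
}

// readFile extracts a single member on the fly (search time), e.g. to
// confirm a match and build a snippet.
func readFile(data []byte, name string) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", err
	}
	rc, err := zr.Open(name) // zip.Reader implements fs.FS (Go 1.16+)
	if err != nil {
		return "", err
	}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}
```

With a real archive on disk, `zip.OpenReader(path)` would replace the in-memory reader. Reopening the archive for every lookup, as `readFile` does here, keeps the sketch short; a real driver would likely keep the reader open between searches.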

A config example:

{
  "dbpath" : "db",
  "vcs-config" : {
    "git": {
      "ref" : "main"
    }
  },
  "repos" : {
    "video" : {
      "url" : "/Volumes/1tb-ext4/twitch/video.zip",
      "vcs" : "archive",
      "vcs-config" : {
        "ignored-files" : [".git"]
      },
      "url-pattern" : {
        "base-url" : "file:///Volumes/1tb-ext4/src/twitch/{path}"
      }
    }
  }
}

Some metrics:

muravjov commented 5 months ago

@salemhilal would you mind reviewing the PR?