rizsotto / Bear

Bear is a tool that generates a compilation database for clang tooling.
GNU General Public License v3.0

Generate database from saved build output #287

Closed derekschrock closed 4 years ago

derekschrock commented 4 years ago

In the same vein as #284, would it be possible to add a feature that allows Bear to read the saved output of a build and generate a database from it? This would save the user from having to perform a build when one was already done elsewhere, e.g. by tools like mock on EL or poudriere on FreeBSD.

rizsotto commented 4 years ago

Hey @derekschrock , could you explain this use case a bit more? I don't fully get it, and I'm not familiar with these tools (nor am I using FreeBSD these days).

Bear has an --append flag that you can use to extend an existing compilation database. Is this similar to that?

You can also look at the features page on the wiki to see the existing and planned features.

derekschrock commented 4 years ago

No, --append wouldn't be relevant here, since we're dealing with creating a database in the first place, and I don't see this on the features page.

From my understanding, Bear sets the proper environment variables (preload, dynamic linker, etc.) so that processes spawned by the build command are intercepted, and the command/argument reports are written to /tmp/intercepted-... as binary data, then later reread and filtered to generate the compile_commands.json file. Maybe the exact details are mildly incorrect here, but I think that's the gist of it? "A build is required to generate the database."

But the idea is that in some cases a build isn't ideal, due to time or environment, or a build was already done elsewhere and you have the saved output of those builds in a text file. That's where tools like poudriere or mock come into play: they have the saved build output.

The main functionality of these two tools is to build software in an isolated environment, either in jails (FreeBSD/poudriere) or chroots (mock/EL-Fedora), from some source definition (FreeBSD ports or RPM spec files). The main output of these systems is binary packages; a byproduct is the captured stdout/stderr of the build phase.

So, with the saved output (poudriere or koji/mock): if Bear were fed such a file, could it process and filter it (on a per-line basis?) to generate a compile_commands.json? I believe this would bypass the interception/preload and build parts of Bear. A file parser would still generate the reports to disk, and then the rest would proceed as normal: read the reports, generate the database.
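Purely as an illustration of the per-line filtering idea, here is a minimal, hypothetical Python sketch (not anything Bear provides; the compiler list and the source-file pattern are assumptions, and real log parsers like compiledb handle far more cases):

```python
import json
import re
import shlex

# Compiler names treated as candidates (an assumption; a real tool
# would keep a much larger, configurable list).
COMPILERS = {"cc", "c++", "gcc", "g++", "clang", "clang++"}

def entries_from_log(lines, directory):
    """Scan saved build output line by line and keep lines that look
    like compiler invocations with a C/C++ source file argument."""
    for line in lines:
        try:
            argv = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a plain command line
        if not argv or argv[0].rsplit("/", 1)[-1] not in COMPILERS:
            continue
        sources = [a for a in argv if re.search(r"\.(c|cc|cpp|cxx)$", a)]
        for src in sources:
            yield {"directory": directory, "arguments": argv, "file": src}

if __name__ == "__main__":
    log = ["gcc -O2 -c main.c -o main.o", "Linking main..."]
    print(json.dumps(list(entries_from_log(log, "/tmp/build")), indent=2))
```

This only works when a whole compiler invocation sits on one log line; wrapped lines, compiler drivers hidden behind wrappers, or interleaved output would defeat it, which is part of why the "free text" case is hard.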

Unrelated to this request: I believe the next step might be to see whether Bear could be integrated into these tools (poudriere/mock/etc.) so that nightly/weekly/etc. builds can produce compilation databases for later use. I think clangd's vision (can't find the link) is that upstream projects should start providing a compile_commands.json file, though I don't know whether that will be portable.

I've found compiledb, which reads saved build output to generate databases. So I'm hoping that maybe Bear could do the same.

rizsotto commented 4 years ago

Thanks for the explanation @derekschrock . Now I get it... Yes, it's an interesting use case.

It could be easy if the input text were annotated somehow (at least to say: this was a command that the build executed, and the rest is the stdout of this command). Or it can be very hard to identify the commands in free text... the outputs you've sent look to be the hard case. :)

Integrating Bear into these tools sounds more viable. I think we could either run these tools under Bear, or run Bear inside the build tool. (The latter requires more work to integrate, but less work to filter the output.)

I am planning to put more effort into making the final output more portable. That will require parsing the compiler flags, which is why it's a bit harder to do.

rizsotto commented 4 years ago

There is a release candidate on the master branch now, which is supposed to fix this issue. It has an executable, citnames, which takes an execution report and generates the compilation database. This does not require running the build. The execution report is a JSON file that looks like this:

{
  "context": {
    "host_info": {
      "_CS_GNU_LIBC_VERSION": "glibc 2.30",
      "_CS_GNU_LIBPTHREAD_VERSION": "NPTL 2.30",
      "_CS_PATH": "/usr/bin",
      "machine": "x86_64",
      "release": "5.5.13-200.fc31.x86_64",
      "sysname": "Linux",
      "version": "#1 SMP Wed Mar 25 21:55:30 UTC 2020"
    },
    "intercept": "library preload"
  },
  "executions": [
    {
      "command": {
        "arguments": [
          "sleep",
          "1"
        ],
        "environment": {
          "PATH": "/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin"
        },
        "program": "/usr/bin/sleep",
        "working_dir": "/home/user/Code/Bear.git"
      },
      "run": {
        "events": [
          {
            "at": "2020-02-16T21:00:00.000Z",
            "type": "start"
          },
          {
            "at": "2020-02-16T21:00:00.000Z",
            "status": 0,
            "type": "stop"
          }
        ],
        "pid": 503092,
        "ppid": 503083
      }
    },
    {
      "command": {
        "arguments": [
          "sh",
          "-c",
          "sleep 1"
        ],
        "environment": {
          "PATH": "/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin"
        },
        "program": "/usr/bin/bash",
        "working_dir": "/home/user/Code/Bear.git"
      },
      "run": {
        "events": [
          {
            "at": "2020-02-16T21:00:00.000Z",
            "type": "start"
          },
          {
            "at": "2020-02-16T21:00:00.000Z",
            "status": 0,
            "type": "stop"
          }
        ],
        "pid": 503088,
        "ppid": 503078
      }
    }
  ]
}

The context part is not that relevant, but the execution list is. It contains the information needed to figure out what the intent of each command execution could have been. Bear has a tool (intercept) which can generate this file by running the build command. So, if you can generate such a file from a log of your CI build, that could work with this.
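For the saved-log use case discussed above, one could imagine pre-converting a build log into this report shape before handing it to citnames. A hypothetical Python sketch (field names copied from the example report; the pids, timestamps, and empty environments are fabricated placeholders, and every non-empty log line is naively treated as a command):

```python
import json
import shlex

def log_to_report(lines, working_dir):
    """Wrap each command-looking log line in the execution-report
    structure shown above. Timestamps, pids, and environments are
    placeholders, since a plain log does not record them."""
    executions = []
    for pid, line in enumerate(lines, start=1000):
        argv = shlex.split(line)
        if not argv:
            continue
        executions.append({
            "command": {
                "arguments": argv,
                "environment": {},
                "program": argv[0],
                "working_dir": working_dir,
            },
            "run": {
                "events": [
                    {"at": "1970-01-01T00:00:00.000Z", "type": "start"},
                    {"at": "1970-01-01T00:00:00.000Z",
                     "status": 0, "type": "stop"},
                ],
                "pid": pid,
                "ppid": 1,
            },
        })
    return {"context": {"intercept": "log file"}, "executions": executions}

if __name__ == "__main__":
    report = log_to_report(["gcc -c a.c -o a.o"], "/src")
    print(json.dumps(report, indent=2))
```

Whether citnames accepts a report with placeholder timestamps and pids, and how it is invoked on such a file, would need to be checked against the actual release; this only illustrates the data shape.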