Retire the code coverage frontend

We can retire coverage.moz.tools and rely on Searchfox instead. The only feature which we must port to Searchfox before we can retire coverage.moz.tools is the feature to show coverage for a specific platform or suite, which developers often ask for.

Would there be a place searchfox could download historical coverage data from to be able to display coverage data for old versions on-demand? I think right now the coverage job artifacts on taskcluster potentially have short lifetimes, but https://coverage.moz.tools/ seems to have a vast archive of the information.

Context: I am sketching out a simple on-demand/idempotent job system for searchfox for operations that may require some non-trivial processing/ingestion that doesn't meet current searchfox latency requirements and/or involves potentially non-trivial retrieval costs, but which could be cached on the server so that after initial processing the latency would be similar to any other dynamically answered local query.

Historical coverage data seems like a use case where this could make sense. In particular, if the goal is to minimize the infrastructure required like providing random access to the coverage data in terms of (revision, path, suite), searchfox could handle fetching per-revision archives from long-term S3/whatever. These would only be accessed once by any given searchfox server (they turn over 2x a day), although the web-servers could also make some effort to propagate recently-accessed immutable data to their successors.

Yes, we are storing the reports in a bucket on GCS. We could make the bucket publicly readable, or give you credentials to access it.

Oh, other two important features we would need to add to Searchfox in order to retire the coverage frontend would be: 1) coverage graph showing the evolution of coverage over time; 2) directory coverage.

1. coverage graph showing the evolution of coverage over time;

I expect there are probably use-cases and/or workflows related to this. Can you point me at where they might be (potentially newly) defined? In particular, for the token-centric hyperblame stuff I'm creating some first-class derived information where it could potentially be useful to have coverage deltas baked into the history repo too, but I want to be sure I understand the existing use-cases.

History/Coverage Fusion Thinking Out Loud

If we're processing the coverage data as we process the history, we can potentially correlate the data so we can potentially further categorize the coverage. Like we can know "this new code landed with coverage", "this new code landed without coverage", "this modified/existing code lost coverage", "this modified/existing code gained coverage", and "this modified/existing code retained coverage". We could then have numbers for the different categories potentially allowing for filtering predicates to highlight these changes as well as badges on commits so people can ambiently know if the changes had coverage or not.

The major caveat of course is that this only works easily if we have coverage data on a per-commit basis. Practically speaking, I don't know that we can get data on a per-commit basis both for cost reasons and because it's questionable whether there's any benefit to trying to get coverage data at a sub-push granularity. That said, machinery could exist to perform targeted coverage backfills into autoland to help provide more data when necessary.

Given how the current history and imminent token-centric history processing works, it seems like the most sane way to deal with not having per-commit coverage data would be to effectively run the history logic where we fold all commits between coverage datapoints into a single commit so that the logic has an easier time of it. Having what amounts to a separate history tree could make sense here if we would want to apply searchfox's general blame trick to be able to do "show me the build/stack of patches where we lost/gained coverage for this line".

In any event, I guess this sounds like:

something interesting where it would be great to have use-cases
where we'd really need https://github.com/mozilla/code-coverage/issues/1047 and the general coverage aggregation reliability to be fixed if we're going to consume its output for a first-class feature.
something that should come after my initial token-centric hyperblame landing and first few rounds of improvements anyways rather than something that is integrated while I'm finishing that up. Practically speaking, it will take several rounds of moved code detect enhancements for correlated coverage data to be useful anyways. We'll also have a better understanding of the disk usage and performance characteristics of the changes there which could inform any further consideration of more coverage-related features.

I just (re?)discovered the jobs that use "--per-test-coverage" which it seems provides detailed coverage information for changed tests as they're changed. This is amazing and I think would allow probably the most requested feature for searchfox as it relates to coverage ("where is that coverage coming from?"). I asked briefly about some of the things I mentioned in the previous comment and people seemed more interested in knowing what tests exercise the functionality they're looking at.

Are these files archived in the bucket and/or is there an index of them somehow? Is this code in code-coverage-bot that is uploading an even-more-thorough source-file-to-test-files-that-cause-coverage-in-the-source-files index? Or does the part where it's using an ActiveData URL that seems to not be a domain that exists anymore (and I'd heard ActiveData fell over) mean that these are not produced anymore? Thanks!

1. coverage graph showing the evolution of coverage over time;
I expect there are probably use-cases and/or workflows related to this. Can you point me at where they might be (potentially newly) defined? In particular, for the token-centric hyperblame stuff I'm creating some first-class derived information where it could potentially be useful to have coverage deltas baked into the history repo too, but I want to be sure I understand the existing use-cases.

It is mostly for high-level tracking, there is no set workflow. Periodically people are interested to know what the level of coverage is and whether it's moving in the right direction or decreasing.

If we're processing the coverage data as we process the history, we can potentially correlate the data so we can potentially further categorize the coverage. Like we can know "this new code landed with coverage", "this new code landed without coverage", "this modified/existing code lost coverage", "this modified/existing code gained coverage", and "this modified/existing code retained coverage". We could then have numbers for the different categories potentially allowing for filtering predicates to highlight these changes as well as badges on commits so people can ambiently know if the changes had coverage or not.

The major caveat of course is that this only works easily if we have coverage data on a per-commit basis. Practically speaking, I don't know that we can get data on a per-commit basis both for cost reasons and because it's questionable whether there's any benefit to trying to get coverage data at a sub-push granularity. That said, machinery could exist to perform targeted coverage backfills into autoland to help provide more data when necessary.

In many cases, we are able to map the coverage back from the push to a commit contained in it. https://github.com/mozilla/code-coverage/blob/e864c236686ae77ac4964eff55d2e3dfeff907f6/bot/code_coverage_bot/phabricator.py#L110.

I just (re?)discovered the jobs that use "--per-test-coverage" which it seems provides detailed coverage information for changed tests as they're changed. This is amazing and I think would allow probably the most requested feature for searchfox as it relates to coverage ("where is that coverage coming from?"). I asked briefly about some of the things I mentioned in the previous comment and people seemed more interested in knowing what tests exercise the functionality they're looking at.

Yeah, I think this is by far the most requested feature! And I agree it is the most useful one.

Are these files archived in the bucket and/or is there an index of them somehow? Is this code in code-coverage-bot that is uploading an even-more-thorough source-file-to-test-files-that-cause-coverage-in-the-source-files index? Or does the part where it's using an ActiveData URL that seems to not be a domain that exists anymore (and I'd heard ActiveData fell over) mean that these are not produced anymore? Thanks!

We briefly discussed about this on Matrix, I'll write something up here for posterity.

These jobs are still running, but there was still something left before they could be fully used: we need to have a cron job to refresh the data, probably weekly (https://bugzilla.mozilla.org/show_bug.cgi?id=1508237). There might also be bugs that need to be fixed once we run them more broadly. To reduce the data storage requirements, we could choose to only store "covered"/"not covered" (so 1 bit per line).

ActiveData is not really a hard requirement, the mapping script could be rewritten to just use TaskCluster API.

Thanks for the detailed response and the notes about per-test coverage for future reference!

In many cases, we are able to map the coverage back from the push to a commit contained in it.

https://github.com/mozilla/code-coverage/blob/e864c236686ae77ac4964eff55d2e3dfeff907f6/bot/code_coverage_bot/phabricator.py#L110

To clarify, my concern is more one of false precision. My ramblings there are about whether we should tie coverage to searchfox's per-commit history mechanism. I'm pretty confident at this point that we should not for a variety of reasons[1].

The key concern for me is avoiding creating a situation where searchfox tells a user "revision A landed with coverage on this line" and so the user is looking at revision A to figure out what's going on, but if you check out revision A you will find it doesn't have coverage. Instead the coverage was added by revision B from later in the same push which changed control flow, or even revision C from a different, later push that is part of the group of pushes newly getting coverage data because we can't run full coverage on every push.

That said, being able to look at specific revisions and know via reprojection/interpolation via blame/annotate like you say I think is potentially quite useful, I just want to make sure searchfox is very explicit about ambiguous situations so the user can build the right mental model. If we're clear that the coverage in the current revision is interpolated and make it clear of the state of coverage in the previous and next coverage runs, and especially highlight when there's a difference, I think that could help users have the right mental model.

An important new UI affordance for searchfox could be a sort of timeline at the top of the page where our currently ignored Showing REVISION: That revision's commit message text under the search box. The timeline would clarify where the commit the user is looking at is in terms of any containing patch stack pushes as well as the other adjacent pushes that would impact the coverage data the user is looking at, or potentially the same nightly build the user looking at. The patch's containing stack would have visual priority, with the other timeline aspects potentially reduced unless the user interacts with the timeline display. The timeline could also have an upsell to let the user ask for backfills somehow by generating a treeherder link or something like that.

Also, I think we'd do something different for the HEAD revision where the user almost certainly doesn't care about the specific revision, but probably would care about knowing how stale the searchfox data is / what nightly the data corresponds to. This could help provide a more explicit UI difference between "you're looking at the latest data with semantic data" versus "you're looking at the past and there's no semantic data", although I of course also have plans for how we could have historical semantic data for at least the past few weeks.

1: My specific reasons for thinking any coverage history tracking should not be part of searchfox's derived hyperblame git repositories are (noting that it was only me even suggesting this as a possibility, but I've now answered my own question):

Git history is immutable for our purposes, whereas we can gain new coverage data at anytime via backfills. This means coverage data integrated into a git-tracked data-model potentially has to regenerate potentially huge swathes of the derived git history any time we get new coverage data.
Searchfox's derived blame repository is inherently "clever" in a "too clever for its own good" way in that it's explicitly driven by processing git trees and noticing changed hashes. For cases like coverage where we will almost never have coverage data from every single commit, this creates additional complexity if we try to interpolate state from coverage associated with "later" commits, especially if that interpolation requires us to have already completed processing of the token-centric blame which we're in the process of computing.
In my very limited "asking around", people didn't seem all that interested in the ability to ask "when did this line start getting coverage", especially in terms of being able to answer the question in O(1) time. Especially if we chuck the coverage data in a SQLite database, we could still answer that question, it might just take a few seconds.

mozilla / code-coverage

Retire the code coverage frontend #1048

History/Coverage Fusion Thinking Out Loud