telemetry service that tabulates top warnings, complaints

steveklabnik commented 9 years ago

Issue by graydon Wednesday Jul 31, 2013 at 20:05 GMT

For earlier discussion, see https://github.com/rust-lang/rust/issues/8161

This issue was labelled with: A-diagnostics, A-infrastructure, A-instrumentation, I-enhancement, I-wishlist in the Rust repository

This is a bit of a weird / wishlist item, but it would be nice to be able to gather statistics on how often our users hit which errors, and how much they like them vs. loathe them.

This would be a relative of #2092, in which we have a code reserved for each warning/error/lint message (which we never reuse). Rustc would be configured to keep local tabulation in your homedir while you work of a few basic statistics (codebase size, number of unique files processed, number of runs of the compiler, resource usage) and the number of occurrences of a given message. A pair of commands (say: rustc yay and rustc boo) would mark your approval or disapproval of the most recent error message. Periodically, or on demand, you could submit your tabulated set to a telemetry service. This way we could get directed feedback on the sorts of problems people are having while using the compiler, as well as whether they appreciate or feel annoyed by various forms of error message.
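To make the idea concrete, here is a minimal sketch of such a local tally. Everything in it is an assumption for illustration: the `Tally` class, its `record` and `vote` methods, and the placeholder codes are hypothetical and do not reflect any actual rustc interface.

```python
from collections import Counter


class Tally:
    """Hypothetical in-memory tally of message codes and user reactions."""

    def __init__(self):
        self.occurrences = Counter()  # code -> times the message was emitted
        self.yay = Counter()          # code -> approvals
        self.boo = Counter()          # code -> disapprovals
        self.last_code = None         # most recent message, the target of yay/boo

    def record(self, code):
        """Count one emission of `code` and remember it as the most recent."""
        self.occurrences[code] += 1
        self.last_code = code

    def vote(self, approve):
        """Apply a yay (True) or boo (False) to the most recent message."""
        if self.last_code is None:
            raise ValueError("no message has been recorded yet")
        target = self.yay if approve else self.boo
        target[self.last_code] += 1


tally = Tally()
tally.record("E0001")      # placeholder codes, not real rustc error codes
tally.record("E0002")
tally.vote(approve=False)  # roughly what `rustc boo` might do
```

Because the vote always applies to the most recent message, the user never has to type a code, which matches the two-command design proposed above.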

(Other suggestions welcome; was thinking win vs. fail, or perhaps some variant of derp, wat or fuuuuuu but couldn't think of suitable paired positive-affirmation phrases, and in any case they seem a little unkind.)

jxcl commented 9 years ago

> Rustc would be configured to keep local tabulation in your homedir while you work of a few basic statistics (codebase size, number of unique files processed, number of runs of the compiler, resource usage)

To clarify: every separate invocation of rustc would produce a separate statistic on codebase size, number of unique files processed and resource usage, that would then be appended to a file in the homedir? That file is going to get pretty big pretty fast. It may be best to split these into several files.

I suggest having a .rustc_stats directory and within it having error_stats.json and invocation_stats.

error_stats.json would look something like this:

[
    {
        "error": "E###",
        "occurrences": 20,
        "yay_count": 3,
        "boo_count": 0
    },
    ...
]
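A rough sketch of how a file in this shape could be updated in place. The helper name `record_error` is hypothetical, the count key is spelled `occurrences` here, and a temporary directory stands in for the proposed .rustc_stats directory.

```python
import json
import tempfile
from pathlib import Path


def record_error(stats_path, code, yay=0, boo=0):
    """Count one occurrence of `code`, plus any votes, in a JSON list of records."""
    path = Path(stats_path)
    records = json.loads(path.read_text()) if path.exists() else []
    for rec in records:
        if rec["error"] == code:
            break
    else:
        # No record for this code yet: start one with zeroed counters.
        rec = {"error": code, "occurrences": 0, "yay_count": 0, "boo_count": 0}
        records.append(rec)
    rec["occurrences"] += 1
    rec["yay_count"] += yay
    rec["boo_count"] += boo
    # The whole file is rewritten on every update, which is acceptable
    # because it holds one small record per distinct code.
    path.write_text(json.dumps(records, indent=4))
    return rec


# Throwaway directory standing in for the proposed stats directory.
stats_file = Path(tempfile.mkdtemp()) / "error_stats.json"
record_error(stats_file, "E0001")  # placeholder code
result = record_error(stats_file, "E0001", yay=1)
```

The file grows with the number of distinct codes rather than the number of compiler runs, so rewriting it each time stays cheap.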

I'm not sure what the format of invocation_stats should be. Writing it in JSON would make it harder to append to, which is mostly what you would want to do. Perhaps plaintext CSV?
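As one possible answer, here is a sketch of an append-only CSV log. The column names, the file name with a .csv extension, and the sample values are all assumptions made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical columns; the thread does not fix a schema.
FIELDS = ["timestamp", "codebase_bytes", "unique_files", "wall_seconds"]


def append_invocation(csv_path, row):
    """Append one invocation record, writing the header only for a new file."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Throwaway directory standing in for the proposed stats directory.
log = Path(tempfile.mkdtemp()) / "invocation_stats.csv"
append_invocation(log, {"timestamp": "2013-07-31T20:05:00Z", "codebase_bytes": 120000,
                        "unique_files": 42, "wall_seconds": 3.5})
append_invocation(log, {"timestamp": "2013-07-31T20:06:10Z", "codebase_bytes": 120400,
                        "unique_files": 42, "wall_seconds": 3.1})

with log.open(newline="") as f:
    rows = list(csv.DictReader(f))
```

Each run only appends a line, so nothing already on disk has to be parsed or rewritten, which is the property that makes CSV attractive here compared with a single JSON array.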

I can think of at least two advantages to having these stats in separate files:

1. As I've mentioned already, append efficiency for the invocation stats.
2. I can imagine a scenario where someone would be completely happy to submit their error statistics to the telemetry service, but would be more wary of submitting code size, unique files, etc. Splitting these statistics makes it easier to verify that nothing shady is going on. You could even disable collection of either file separately.
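The separate opt-outs could look something like the following sketch. The function `build_submission` and its flags are hypothetical, and the default of sharing error statistics but not invocation statistics is only one possible choice.

```python
def build_submission(error_stats, invocation_stats,
                     share_errors=True, share_invocations=False):
    """Assemble a payload containing only the categories the user opted into."""
    payload = {}
    if share_errors:
        payload["error_stats"] = error_stats
    if share_invocations:
        payload["invocation_stats"] = invocation_stats
    return payload


# Placeholder data in the shapes discussed above.
errors = [{"error": "E0001", "occurrences": 2, "yay_count": 1, "boo_count": 0}]
invocations = [{"unique_files": 42, "wall_seconds": 3.5}]

payload = build_submission(errors, invocations)
```

With the two categories kept in separate files, a user can check exactly what would be sent by reading only the file they agreed to share.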

jxcl commented 9 years ago

@brson was involved in the original discussion

rust-lang / rfcs

telemetry service that tabulates top warnings, complaints #629