Unfortunately, I do not have time to implement this myself because of a new job. However, I wanted to get this idea out there.
Context
As mentioned during the State of the Union meeting, statistics on the usage of Home Assistant components are currently only available to core developers due to privacy concerns. At the same time, this data is extremely valuable and can help make informed decisions.
Proposal
Create a data publishing process that uses Statistical Disclosure Control (SDC) methods for microdata to anonymize the data. This is the approach taken by national statistical agencies. Implementations in Python and R are available, as well as GUI tools.
The anonymized data is less sensitive, so a subset of it covering feature, platform and device usage can be published periodically.
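To make the idea concrete, here is a minimal sketch of one of the simplest SDC techniques, small-cell suppression: usage is counted per combination of attributes, and any combination reported by fewer than k installations is withheld. The record fields (`component`, `platform`), the function name `publish_counts` and the threshold k=5 are all assumptions made for illustration; they do not reflect the actual telemetry format, and a real pipeline would likely rely on an established SDC tool rather than this.

```python
from collections import Counter


def publish_counts(records, keys, k=5):
    """Count records per combination of `keys` and suppress cells smaller than k.

    Returns (published, suppressed_total), where `published` maps each
    sufficiently common combination to its count and `suppressed_total`
    is the number of records that fell into withheld combinations.
    """
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    published = {combo: n for combo, n in counts.items() if n >= k}
    suppressed_total = sum(n for n in counts.values() if n < k)
    return published, suppressed_total


# Hypothetical telemetry records; the field names and values are illustrative only.
records = (
    [{"component": "mqtt", "platform": "docker"}] * 7
    + [{"component": "zwave", "platform": "docker"}] * 5
    + [{"component": "custom_thing", "platform": "venv"}] * 2
)

published, suppressed_total = publish_counts(records, ["component", "platform"], k=5)
```

Here the rare `custom_thing` combination is withheld because only two installations report it. Suppression alone is not a complete solution: a published total can still reveal a withheld cell when only one cell is suppressed, which is why dedicated SDC tools add further steps such as complementary suppression or recoding.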
Consequences
Data is made available for design decisions. However, the same information could also be provided through other means, without the effort of designing such a pipeline.
Positive:
Component usage information can be published to be used for informed design decisions.
Negative:
Design flaws in the pipeline may still cause personal information to be published by accident.
Proprietary data (on the level of usage of Home Assistant) becomes public.
There is selection bias in the users that enable telemetry sharing.