penguin-statistics / backend-next

The refactored Penguin Statistics v3 Backend. Built with Go, fiber, bun and go.uber.org/fx. Uses NATS as the MQ and Redis for state synchronization.
MIT License

The Adoption of T-test Instead of Z-test Might Improve the Accuracy #445

Open: zayn7lie opened this issue 1 year ago

zayn7lie commented 1 year ago

We cannot know the real σ (standard deviation), because the standard deviation we have is computed from the data users submit. We therefore have to take the df (degrees of freedom) of the estimate σ-hat into account. A T-test could make the result more accurate because it takes df into consideration, while a Z-test is only exact when σ is known, and only approximately valid when df tends to infinity. Thus, adopting the T-test is better than the Z-test.
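For reference, these are the standard one-sample forms of the two statistics. This is a textbook sketch, not the backend's actual computation: the issue does not say which quantity is tested, so the observations x_1, …, x_n, the hypothesized mean μ_0 and the sample size n are generic placeholders, and the t result assumes approximately normal observations.

```math
\begin{aligned}
z &= \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim \mathcal{N}(0, 1) && \text{($\sigma$ known)} \\
\hat{\sigma}^2 &= \frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 \\
t &= \frac{\bar{x} - \mu_0}{\hat{\sigma} / \sqrt{n}} \sim t_{n-1} && \text{($\sigma$ estimated, df $= n - 1$)} \\
t_{n-1} &\xrightarrow{\; n \to \infty \;} \mathcal{N}(0, 1)
\end{aligned}
```

Because t_{n-1} has heavier tails than the normal distribution, its critical values are larger for small samples (for a two-sided test at the 5% level, 2.776 with df = 4 versus 1.960 for the normal). Plugging σ-hat into a Z-test therefore flags deviations more often than the nominal rate when n is small, and the difference vanishes as n grows.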

GalvinGao commented 1 year ago

In fact, the Z-test functionality was omitted when we implemented the v3 backend last year, because of the exceptionally heavy load it put on our infrastructure at the time. We were planning to reimplement it and, further, to integrate it with our monitoring and alerting system, so that it gives us the right signal to investigate a potential deviation in the dataset more deeply.

Keeping the issue for tracking. Thanks a lot for the precious feedback!