umputun / tg-spam

Anti-Spam bot for Telegram and anti-spam library
https://tg-spam.umputun.dev
MIT License

ignore classifier check if either spam or ham samples not loaded #99

Closed (umputun closed this 4 months ago)

umputun commented 4 months ago

In the discussion at https://github.com/umputun/tg-spam/discussions/98, we encountered a scenario where no samples were loaded, yet spam was still being learned dynamically. This happened because the binary version was run without any spam or ham samples. As a result, the classifier had only the dynamically learned spam samples and no ham samples, so it tended to label many messages as spam. While this behavior is technically logical, it is surprising for users.
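To illustrate why a classifier trained on a single class labels everything as that class, here is a minimal sketch. It is a toy word-count scorer written for this example, not tg-spam's classifier: `toyClassifier`, `learn`, and `classify` are hypothetical names, and the scoring (add-one smoothing, no class priors) is an assumption made for brevity.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// toyClassifier is a hypothetical word-count classifier used only for illustration.
type toyClassifier struct {
	counts map[string]map[string]int // class -> word -> occurrences
	totals map[string]int            // class -> total number of words seen
}

func newToyClassifier() *toyClassifier {
	return &toyClassifier{counts: map[string]map[string]int{}, totals: map[string]int{}}
}

// learn adds the words of text to the given class.
func (c *toyClassifier) learn(class, text string) {
	if c.counts[class] == nil {
		c.counts[class] = map[string]int{}
	}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		c.counts[class][w]++
		c.totals[class]++
	}
}

// classify returns the known class with the highest smoothed log-score.
// If only one class has been learned, that class always wins.
func (c *toyClassifier) classify(text string) string {
	best, bestScore := "", math.Inf(-1)
	words := strings.Fields(strings.ToLower(text))
	for class, seen := range c.counts {
		score := 0.0
		for _, w := range words {
			// add-one smoothing keeps unseen words from producing log(0)
			score += math.Log(float64(seen[w]+1) / float64(c.totals[class]+len(seen)))
		}
		if score > bestScore {
			best, bestScore = class, score
		}
	}
	return best
}

func main() {
	c := newToyClassifier()
	c.learn("spam", "buy cheap crypto now") // only dynamically learned spam, no ham

	// an ordinary message is still labeled spam: there is no other class to pick
	fmt.Println(c.classify("see you at lunch"))

	// once a ham sample exists, the same message can be scored against both classes
	c.learn("ham", "see you at lunch tomorrow")
	fmt.Println(c.classify("see you at lunch"))
}
```

With only the spam class present, the loop over known classes has a single candidate, so the result does not depend on the message at all.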

This pull request addresses the issue by skipping the classifier check when either the spam samples or the ham samples are empty. The fix is not perfect: users who add spam samples may be confused when they see no immediate change. It is still considered preferable because it is less disruptive.
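The guard described above could look roughly like the following sketch. It is not the code from this pull request: `sampleCounts`, `classifierEnabled`, and `checkMessage` are hypothetical names, and the classifier is passed in as a plain function so the example stays self-contained.

```go
package main

import "fmt"

// sampleCounts is a hypothetical summary of how many samples of each kind are loaded.
type sampleCounts struct {
	spam int
	ham  int
}

// classifierEnabled reports whether the classifier check should run.
// It is false when either the spam or the ham samples are empty.
func classifierEnabled(c sampleCounts) bool {
	return c.spam > 0 && c.ham > 0
}

// checkMessage runs classify only when both sample sets are non-empty.
// The second return value tells the caller whether the check ran at all.
func checkMessage(msg string, c sampleCounts, classify func(string) bool) (isSpam, checked bool) {
	if !classifierEnabled(c) {
		return false, false // check skipped, message is not flagged by the classifier
	}
	return classify(msg), true
}

func main() {
	// stand-in for a classifier trained on spam only: it flags everything
	alwaysSpam := func(string) bool { return true }

	isSpam, checked := checkMessage("see you at lunch", sampleCounts{spam: 3, ham: 0}, alwaysSpam)
	fmt.Println(isSpam, checked)

	isSpam, checked = checkMessage("see you at lunch", sampleCounts{spam: 3, ham: 2}, alwaysSpam)
	fmt.Println(isSpam, checked)
}
```

Returning a separate `checked` flag is one way to make the skipped case visible to callers, which may help with the confusion mentioned above when newly added spam samples appear to have no effect.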

Furthermore, the documentation will be updated to explain how the system operates in the absence of ham or spam samples.