In the discussion at https://github.com/umputun/tg-spam/discussions/98, we encountered a scenario where no samples were available, yet spam was still being learned dynamically. This occurred because a binary version was executed without any spam or ham samples. As a result, the classifier ended up with only the dynamically learned spam samples and no ham samples, so it tended to label many messages as spam. This behavior is technically logical, but it is surprising for users.
This pull request addresses the issue by disabling the classifier check when either the spam samples or the ham samples are empty. The fix is not perfect: users who add spam samples may be confused when they see no immediate change. Still, it is considered preferable because it is less disruptive.
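The guard can be illustrated with a minimal, self-contained sketch. This is not the actual tg-spam code: `sampleCounts`, `classifierEnabled`, and `check` are hypothetical names introduced only to show the idea that the classifier is skipped unless both sample sets are non-empty.

```go
package main

import "fmt"

// sampleCounts is a hypothetical holder for the number of loaded samples.
// It is not a type from tg-spam.
type sampleCounts struct {
	spam int
	ham  int
}

// classifierEnabled reports whether the classifier check should run.
// The check is disabled when either the spam or the ham samples are empty.
func classifierEnabled(c sampleCounts) bool {
	return c.spam > 0 && c.ham > 0
}

// check is a hypothetical entry point. It returns the classifier verdict
// and a flag telling whether the classifier was consulted at all.
func check(c sampleCounts, msg string, classify func(string) bool) (isSpam, checked bool) {
	if !classifierEnabled(c) {
		// Not enough data to classify: skip instead of guessing.
		return false, false
	}
	return classify(msg), true
}

func main() {
	// Stand-in classifier that flags everything, mimicking a model
	// trained on spam samples only.
	alwaysSpam := func(string) bool { return true }

	cases := []sampleCounts{
		{spam: 0, ham: 0},
		{spam: 3, ham: 0},
		{spam: 0, ham: 5},
		{spam: 3, ham: 5},
	}
	for _, c := range cases {
		isSpam, checked := check(c, "hello", alwaysSpam)
		fmt.Printf("spam=%d ham=%d checked=%v isSpam=%v\n", c.spam, c.ham, checked, isSpam)
	}
}
```

In this sketch a message is flagged by the classifier only in the last case, where both sample sets are present. In the other three cases the classifier is never consulted, which is why adding spam samples alone produces no visible change.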
Furthermore, the documentation will be updated to explain how the system operates in the absence of ham or spam samples.