mackncheesiest / FakeNewsClassifier

Uses Labeled Latent Dirichlet Allocation (LLDA) to attempt to classify news from two merged datasets as "real" or "fake" based on either article title or article title + hostname

Look at measures other than testing accuracy #1

Open mackncheesiest opened 7 years ago

mackncheesiest commented 7 years ago

Matthews correlation coefficient (MCC), etc.
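For reference, a minimal sketch of how MCC could be computed next to testing accuracy from predicted and true labels. This is not code from the repo: the function names, the `"fake"`/`"real"` label strings, and the toy data are all assumptions made for illustration. It shows why accuracy alone can mislead on an imbalanced test set.

```python
import math

def confusion_counts(y_true, y_pred, positive="fake"):
    """Count TP, FP, TN, FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Toy, made-up labels: a classifier that always answers "real"
# on a test set that is 90% "real".
y_true = ["real"] * 9 + ["fake"]
y_pred = ["real"] * 10
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts))           # high accuracy despite learning nothing
print(matthews_corrcoef(*counts))  # MCC exposes the lack of skill
```

If scikit-learn is already a dependency, `sklearn.metrics.matthews_corrcoef(y_true, y_pred)` computes the same quantity directly from the label lists.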

mackncheesiest commented 7 years ago

[Image: test results for various setups]

mackncheesiest commented 7 years ago

The best-performing classifier in terms of both Matthews correlation coefficient and testing accuracy is the test with prepended hostnames and only 5 iterations. It appears that the 35-iteration LLDA run began to overfit the training data and thus lost generalization performance. It might be good to explore what the optimal number of iterations is, but these training runs take a while.
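One way the iteration count might be explored is a small sweep that trains once per candidate and keeps the count with the best held-out score. The sketch below is an assumption-laden outline, not the repo's code: `pick_best_iterations` and `train_and_score` are hypothetical names, and the caller would have to supply a function that trains LLDA for `n` iterations and returns a validation metric such as MCC.

```python
from typing import Callable, Iterable, Tuple

def pick_best_iterations(
    train_and_score: Callable[[int], float],
    candidates: Iterable[int],
) -> Tuple[int, float]:
    """Return (iteration_count, score) for the best-scoring candidate.

    `train_and_score` is a hypothetical callable: it should train a model
    for the given number of iterations and return a held-out score
    (higher is better), e.g. MCC on a validation split.
    """
    best_n, best_score = None, float("-inf")
    for n in candidates:
        score = train_and_score(n)  # one full training run per candidate
        if score > best_score:
            best_n, best_score = n, score
    if best_n is None:
        raise ValueError("candidates must not be empty")
    return best_n, best_score

# Placeholder scorer for demonstration only; it does NOT train anything
# and its values are synthetic, not measured results.
def fake_scorer(n: int) -> float:
    return -abs(n - 10)

print(pick_best_iterations(fake_scorer, [5, 10, 20, 35]))
```

Since each call is a full training run, a short candidate list keeps the cost manageable. Scoring on a validation split separate from the final test set avoids tuning the iteration count against the test data.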