tarahmarie / dh-trace

MIT License

Svm #4

Closed by jdmartin 1 year ago

jdmartin commented 1 year ago

(Unfortunately, draft PRs don't seem to be available on your account type. Please see the labels and the next sentence ;)

This is very much a work in progress, hence the wait_on_merge label. It's fine to check this out and test it: it uses its own database.

Right now, the SVM tool (do_svm.py) will evaluate the texts in projects/.../splits_for_svm. It will produce one table (predictions) that assesses the likelihood that each chapter is by each of the authors in the set. (Yes, each author's chapters are also compared against that same author.)
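To make the shape of that first table concrete, here is a minimal sketch of scoring every chapter against every author. It is not the code in do_svm.py: the use of scikit-learn, the TF-IDF features, the `LinearSVC` model, the toy texts, and every variable name are assumptions for illustration, and raw decision-function scores stand in for whatever likelihood measure the tool really uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-ins for the chapter files under splits_for_svm:
# (author, chapter label, chapter text). All values are made up.
chapters = [
    ("austen", "ch1", "the ball and the estate and a letter from her sister"),
    ("austen", "ch2", "her sister wrote a letter about the ball at the estate"),
    ("melville", "ch1", "the whale and the ship and the harpoon upon the sea"),
    ("melville", "ch2", "upon the sea the ship chased the whale with a harpoon"),
    ("shelley", "ch1", "the creature fled the laboratory across the frozen glacier"),
    ("shelley", "ch2", "across the glacier the creature recalled the laboratory"),
]
authors = [author for author, _, _ in chapters]
texts = [text for _, _, text in chapters]

# Assumed model: TF-IDF features fed to a linear SVM (one-vs-rest).
model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
model.fit(texts, authors)

# With three or more authors, decision_function returns one column per
# author, in the order given by model.classes_.
scores = model.decision_function(texts)

# One row per (chapter, candidate author) pair, including the chapter's own
# author, which is the "author compares to itself" case.
predictions = []
for (author, chapter, _), row in zip(chapters, scores):
    for candidate, score in zip(model.classes_, row):
        predictions.append((author, chapter, candidate, float(score)))
```

Each chapter contributes one row per candidate author, so six chapters and three authors give eighteen rows.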

Next, it will produce another table, "chapter_assessments", which is largely redundant with predictions: it contains the work, the chapter, and the scores for each author. This will be more useful later.
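For illustration, this is roughly how a one-row-per-pair table could be turned into a one-row-per-chapter table with a score column per author. Only the two table names come from this PR. The column names, the layout of predictions, and the scores are hypothetical, so treat this as a sketch of the idea rather than what do_svm.py does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout for predictions: one row per (chapter, candidate author).
conn.execute(
    "CREATE TABLE predictions (work TEXT, chapter TEXT, candidate TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?, ?)",
    [  # made-up scores, for illustration only
        ("emma", "ch1", "austen", 0.9),
        ("emma", "ch1", "melville", 0.1),
        ("moby_dick", "ch1", "austen", 0.2),
        ("moby_dick", "ch1", "melville", 0.8),
    ],
)

def quote_ident(name):
    """Quote a string for use as an SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'

# One score column per author seen in predictions.
candidates = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT candidate FROM predictions ORDER BY candidate"
    )
]
column_defs = ", ".join(f"{quote_ident(c)} REAL" for c in candidates)
conn.execute(
    f"CREATE TABLE chapter_assessments (work TEXT, chapter TEXT, {column_defs})"
)

# Group the long rows by (work, chapter) before writing the wide rows.
wide = {}
for work, chapter, candidate, score in conn.execute(
    "SELECT work, chapter, candidate, score FROM predictions"
):
    wide.setdefault((work, chapter), {})[candidate] = score

placeholders = ", ".join("?" for _ in range(2 + len(candidates)))
for (work, chapter), by_author in sorted(wide.items()):
    values = (work, chapter, *[by_author.get(c) for c in candidates])
    conn.execute(f"INSERT INTO chapter_assessments VALUES ({placeholders})", values)

assessments = conn.execute(
    "SELECT * FROM chapter_assessments ORDER BY work, chapter"
).fetchall()
```

The column count grows with the number of authors, which is the kind of width a generic table browser handles poorly.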

Finally, the tool assesses the texts in projects/.../testset and scores them against the seen authors. These scores are stored in the table "test_set_preds".

Here's where things get a little different...

Our trusty explore-db.py struggles a bit with the number of columns in the set. So, it's time for explore/explore-svm.py.

Right now, explore-svm.py will let you explore the training set by author. That view needs some enhancement, but it does work. More usefully, the tool will let you explore the training set by choosing an author and chapter, and then see the scores for that chapter vs. the trained set. It will also plot these for you to see.
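As a rough picture of the author-and-chapter lookup, the sketch below fetches the scores for one chapter and renders them. It is not the code in explore/explore-svm.py: the column layout of predictions, the helper names `scores_for` and `text_bars`, the sample scores, and the assumption that scores fall between 0 and 1 are all hypothetical. A plain-text bar chart stands in for the real plot, since the plotting library isn't named here.

```python
import sqlite3

def scores_for(conn, author, chapter):
    """Return {candidate author: score} for one chapter (assumed schema)."""
    rows = conn.execute(
        "SELECT candidate, score FROM predictions WHERE author = ? AND chapter = ?",
        (author, chapter),
    )
    return dict(rows.fetchall())

def text_bars(scores, width=20):
    """Render scores as text bars, highest first (assumes scores in [0, 1])."""
    lines = []
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        clamped = max(0.0, min(1.0, score))
        bar = "#" * round(clamped * width)
        lines.append(f"{name:<10} {bar} {score:.2f}")
    return lines

# Hypothetical data standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (author TEXT, chapter TEXT, candidate TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?, ?)",
    [  # made-up scores, for illustration only
        ("austen", "ch1", "austen", 0.9),
        ("austen", "ch1", "melville", 0.1),
        ("melville", "ch1", "austen", 0.3),
        ("melville", "ch1", "melville", 0.7),
    ],
)

chosen = scores_for(conn, "austen", "ch1")
lines = text_bars(chosen)
print("\n".join(lines))
```

Sorting by score puts the most likely author on the first line, which is usually the thing you want to see first.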

And that's where it stands at the moment. Work in progress, which will get better.