impactcentre / ocrevalUAtion

OCR evaluation brought to you by University of Alicante
Apache License 2.0

BoW metric: wrong definition? #30

Open bertsky opened 1 year ago

bertsky commented 1 year ago

In the implementation of the bag-of-words error rate, you take the maximum of the sum of positive deltas (what one could call the total false-negative frequency) and the sum of negative deltas (the total false-positive frequency):

https://github.com/impactcentre/ocrevalUAtion/blob/84f15b894eccd05365a9116f03b5b8b97c1b74b6/src/main/java/eu/digitisation/document/TermFrequencyVector.java#L76
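For reference, the computation described above can be sketched roughly as follows. This is a minimal sketch reconstructed from the description in this issue, not the actual `TermFrequencyVector` code. The class and method names (`BowDistanceSketch`, `termFrequencies`, `maxDeltaDistance`), the whitespace tokenization and the sign convention (delta = reference count minus OCR count) are all hypothetical. It returns the raw count and does not divide by any text length.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a max-of-deltas bag-of-words distance;
// NOT the ocrevalUAtion implementation.
public class BowDistanceSketch {

    // Count occurrences of each whitespace-separated word (assumed tokenization).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tf.merge(word, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Assumed convention: delta = reference count - OCR count.
    // Positive deltas: words missing from the OCR output (false negatives).
    // Negative deltas: spurious words in the OCR output (false positives).
    // Returns max(sum of false negatives, sum of false positives).
    static int maxDeltaDistance(Map<String, Integer> reference, Map<String, Integer> ocr) {
        Set<String> words = new HashSet<>(reference.keySet());
        words.addAll(ocr.keySet());
        int falseNegatives = 0;
        int falsePositives = 0;
        for (String w : words) {
            int delta = reference.getOrDefault(w, 0) - ocr.getOrDefault(w, 0);
            if (delta > 0) {
                falseNegatives += delta;
            } else {
                falsePositives -= delta;
            }
        }
        return Math.max(falseNegatives, falsePositives);
    }

    public static void main(String[] args) {
        Map<String, Integer> ref = termFrequencies("the cat sat on the mat");
        Map<String, Integer> ocr = termFrequencies("the cat sat on tho mat mat");
        System.out.println(maxDeltaDistance(ref, ocr)); // prints 2
    }
}
```

With this input the false-negative sum is 1 (one missing "the") and the false-positive sum is 2 ("tho" and an extra "mat"), so the max-based count is 2, whereas adding both sums would give 3.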

What is the rationale behind this, and which definition is it based on?

I would expect BoW in terms of