Bug: The Expedition yields very poor quality themes

raindropsfromsky commented 4 years ago

I ran the Expedition on a few files, and the results do not make any sense at all.

For example, the 721-page Compendium file yields the following themes:

help
manual Others

And the Supreme Court case file yields the following themes:

EIA; Notification; EC; MoEF; OM
EIA; Notification; EC; help; OM Others

(I have shared both files.)

Finally, I ran the Expedition on the Qiqqa manual itself (just to see what happens). And this is the amazing result:

I see the following issues:

The themes don't appear to be based on a frequency analysis.
In the second case, 80% of the terms in the two themes overlap (4 of 5)!
In both results, the pie chart shows "Others" as zero (no slice at all). If so, why even list it?
Qiqqa has managed to find matching themes between two documents on the same subject and one totally dissimilar document that has nothing in common with the first two.
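
To make the 80% figure concrete, here is a rough sketch of how the overlap between the two Supreme Court themes can be counted. This is plain Python for illustration only, not Qiqqa code: the function name `theme_overlap` is made up, and I am assuming the trailing "Others" is the pie chart label rather than a theme term.

```python
def theme_overlap(theme_a: str, theme_b: str) -> float:
    """Share of terms in the smaller theme that also appear in the other theme."""
    # Split on ';' and normalise case/whitespace so "EC" and " ec " compare equal.
    a = {t.strip().lower() for t in theme_a.split(";") if t.strip()}
    b = {t.strip().lower() for t in theme_b.split(";") if t.strip()}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# The two themes reported for the Supreme Court case file
# ("Others" left out: assumed to be the chart label, not a term).
theme_1 = "EIA; Notification; EC; MoEF; OM"
theme_2 = "EIA; Notification; EC; help; OM"

print(theme_overlap(theme_1, theme_2))  # 4 shared terms out of 5 -> 0.8
```

Two themes that share four of five terms are, for practical purposes, the same theme listed twice.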

GerHobbelt commented 4 years ago

🤔 I think I know where that is coming from. It's probably a mistake I made while debugging Qiqqa a while ago: I seem to recall hitting a serious performance drain with the "AI" in Qiqqa running on the lib under test, and right now I'm not sure whether I ever reverted the cutbacks I made to that internal calc nightmare.

Anyway, needs investigation, as Qiqqa can do better in the tag suggestion department there. It's not brilliant, but auto-tags were a decent set for a machine to spit out, at least for my libs and (low) expectations then.

GerHobbelt commented 4 years ago

(There were odd bugs in the existing implementation though, already before I kicked it in the nadgers for performance reasons. I recall having had a look at how I could kill the duplicate "themes" discovered by Qiqqa under some circumstances, but I am sure I did not resolve that problem, as I ran into some chunks of code which made it hard to fix. 😢)

Needs follow up & analysis of the current state of affairs in v82pre9 -- I am assuming you're using the latest experimental Qiqqa release, right?

raindropsfromsky commented 4 years ago

Yes, I'm using v82pre8.

jimmejardine / qiqqa-open-source

Bug: The Expedition yields very poor quality themes #185