newfs / gobotany-app

Deployable code for the Go Botany application

Get More Questions: omit questions with fewer than two choices #133

Closed jnga closed 12 years ago

jnga commented 12 years ago

Upon running Get More Questions, do not return any new questions that have fewer than two non-zero-count choices. From the Iteration 25 document:

EF says: [...] When I choose Betula as the genus, I get five species; none of the questions then are informative. When I choose more questions about leaves, algorithm adds “leaf teeth,” “leaf blade base symmetry,” and “leaf blade bloom.” However, only one of the new characters, leaf teeth, is informative (has multiple candidate species for states). Should the app be adding non-informative characters? Ideally not. If this cannot be solved in programming, perhaps have a message that says “no more characters of this type can be added that help you distinguish your plant. Try answering questions about a different feature.” This addition of uninformative characters has repeatedly happened in demos. [...]

John says: The particular problem mentioned above is that two of the three new questions added for the 5-plant Betula results had only one non-zero answer available, and this single answer would of course not narrow the results down below 5 plants. The other character did have two non-zero answers and therefore could be of help in further narrowing down. Found that if one keeps asking for more “leaves” questions many times, most questions have only one answer, but eventually some questions with multiple non-zero choices come up: “Hairs on underside of leaf blade”, “Leaf blade shape”, “Leaf blade base shape”, etc. Looks like we should modify the more-questions feature so that it does not return questions which have fewer than two non-zero-count choices for the current state of the page.

(Woody angiosperms Level 3 page)
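To make the proposed rule concrete, here is a minimal Python sketch of the "fewer than two non-zero-count choices" check. The function names, the dictionary layout, and the per-choice counts are all hypothetical and chosen only to mirror the Betula example; this is not the application's actual code.

```python
def count_nonzero_choices(choice_counts):
    """Return how many choices still match at least one remaining plant."""
    return sum(1 for count in choice_counts.values() if count > 0)


def is_informative(choice_counts):
    """A question can narrow the results only if two or more of its
    choices have a non-zero count for the current state of the page."""
    return count_nonzero_choices(choice_counts) >= 2


# Hypothetical counts for the 5-plant Betula results (illustrative only).
questions = {
    "Leaf teeth": {"teeth present": 3, "no teeth": 2},
    "Leaf blade base symmetry": {"symmetrical": 5, "asymmetrical": 0},
    "Leaf blade bloom": {"no bloom": 5, "bloom present": 0},
}

informative = [name for name, counts in questions.items()
               if is_informative(counts)]
print(informative)
```

With these made-up counts, only the one question that splits the five plants across two choices survives the check.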

jnga commented 12 years ago

It looks like we would need to enhance the Get More Questions feature to be aware of the current filtering state of the page.

jnga commented 12 years ago

Working on a solution for this.

jnga commented 12 years ago

With the fix above, questions whose choices can narrow down the current results come first, in order of ease of observability. Information gain is not currently used. I would like us to test this internally with botanists and other internal users to see whether this simple way of providing more questions does what is desired. To keep the code as simple and easy to understand and maintain as possible, I think we should add elements of information gain back in only if this turns out not to be good enough.

For the above-mentioned example, the first three new questions added for Leaves will now be Leaf Blade Width, Leaf Blade Length, and Leaf Blade Shape. Pressing Get More Questions again should bring up three more operable filters. Upon repeating, once the operable filters run out, currently-inoperable ones are added.
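The ordering described above could be sketched roughly as follows. This is an assumption-laden illustration, not the committed fix: the `Question` type, the `ease` scale (lower meaning easier to observe), the batch size parameter, and the sample values are all hypothetical.

```python
from typing import NamedTuple


class Question(NamedTuple):
    name: str
    ease: int             # hypothetical scale: lower = easier to observe
    nonzero_choices: int  # choices with a non-zero count right now


def order_questions(candidates):
    """Operable questions (two or more non-zero choices) first, each
    group sorted by ease of observability. Information gain is not used."""
    return sorted(candidates, key=lambda q: (q.nonzero_choices < 2, q.ease))


def next_batch(candidates, already_shown, size=3):
    """Return the next `size` questions not yet shown on the page."""
    remaining = [q for q in order_questions(candidates)
                 if q.name not in already_shown]
    return remaining[:size]


# Illustrative values only.
candidates = [
    Question("Leaf blade bloom", 1, 1),
    Question("Leaf blade width", 2, 3),
    Question("Leaf blade base symmetry", 3, 1),
    Question("Leaf blade length", 4, 3),
    Question("Leaf blade shape", 5, 2),
    Question("Leaf teeth", 6, 2),
]

first = next_batch(candidates, already_shown=set())
second = next_batch(candidates, already_shown={q.name for q in first})
print([q.name for q in first])
print([q.name for q in second])
```

In this sketch the second batch starts with the one remaining operable question and is then filled out with currently-inoperable ones, which matches the fallback behavior described above.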

jnga commented 12 years ago

Implemented Brandon's good suggestion that we can still pass species ids like we used to, rather than running the filtering query on the server side. This simplifies things further, reduces queries, and improves speed.
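As a rough illustration of why passing species ids is enough, the sketch below computes per-choice counts by intersecting each choice's species with the ids sent from the page. The function names, the data layout, and the ids are hypothetical; the real request parameters and queries are not shown in this thread.

```python
def choice_counts(choice_species, current_species_ids):
    """Count, for each choice, how many of the currently remaining
    species it applies to. `choice_species` maps a choice name to the
    set of species ids that have that character value (assumed layout)."""
    current = set(current_species_ids)
    return {choice: len(current & set(ids))
            for choice, ids in choice_species.items()}


def is_operable(choice_species, current_species_ids):
    """True when at least two choices still match a remaining species."""
    counts = choice_counts(choice_species, current_species_ids)
    return sum(1 for n in counts.values() if n > 0) >= 2


# Hypothetical ids: the page passes the five species still in the results.
current_ids = [1, 2, 3, 4, 5]
hairs_on_underside = {"hairy": {1, 2}, "smooth": {3, 4, 5}, "waxy": {9}}
blade_bloom = {"no bloom": {1, 2, 3, 4, 5, 9}, "bloom present": {7}}

print(choice_counts(hairs_on_underside, current_ids))
print(is_operable(hairs_on_underside, current_ids))
print(is_operable(blade_bloom, current_ids))
```

Because the ids already encode the filtering state of the page, nothing in this sketch needs to re-run the filtering query on the server side.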