gwu-libraries / obento

Bento Box style search results page
MIT License

Limit query input size #221

Open kerchner opened 10 years ago

kerchner commented 10 years ago

We have seen occasional queries where users pasted large amounts of text into the query box, resulting in Solr errors of the following type:

DatabaseError: index row size 3560 exceeds maximum 2712 for index "ui_search_q"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

Queries that are that large would not lead to meaningful results. The task here is to limit the size of the input field on the web page, to prevent the user from entering queries more than "n" number of characters. I would recommend we start with n=100.

cummingsm commented 10 years ago

Or you could just select the first characters of whatever the user enters in the field.

On Mar 4, 2014, at 11:19 AM, Dan Kerchner notifications@github.com wrote:

We have seen occasional queries where users pasted large amounts of text into the query box, resulting in Solr errors of the following type:

DatabaseError: index row size 3560 exceeds maximum 2712 for index "ui_search_q"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

Queries that are that large would not lead to meaningful results. The task here is to limit the size of the input field on the web page, to prevent the user from entering queries more than "n" number of characters. I would recommend we start with n=100.

kerchner commented 10 years ago

The reason I prefer to limit input is that if you allow larger input and truncate, the user may not be aware that the query was truncated and therefore didn't match what was entered.

dchud commented 10 years ago

You're both right - this is two problems which both need attention.

On Mar 4, 2014, at 11:29 AM, Michael Cummings notifications@github.com wrote:

Or you could just select the first characters of whatever the user enters in the field.

kerchner commented 10 years ago

@StudioZut reassigning to you to implement a maxlength (of 100, for now) for the All tab in the tabbed search bar (both on the front page and on the other pages). When complete, please add a comment here and close the issue.

lwrubel commented 10 years ago

I don't think a user will be able to tell that the query statement was truncated the way the boxes are displayed now. They're shorter than 100 characters and the search statement is not displayed outside the box.

We do need to let people know we're truncating because if you chop off the query in the middle of a word, they'll get 0 search results. For example, see:

Understanding young adult physical activity, alcohol and tobacco use in community colleges and 4-year post-secondary institutions: A cross-sectional analysis of epidemiological surveillance data vs. Understanding young adult physical activity, alcohol and tobacco use in community colleges and 4-yea

Ideally, we would truncate at the last whitespace character before 100 characters, and maybe also display the search statement on the results page wholly (although there are clutter issues with that).
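A minimal sketch of the "truncate at the last whitespace before the limit" idea, for discussion only. The function name `truncateAtWhitespace` and the default limit of 100 are assumptions for illustration, not code from obento. It also collapses pasted line breaks and repeated spaces before measuring, which is a design choice of this sketch rather than something decided in this thread.

```javascript
// Hypothetical helper (not part of obento): cut a query at a word boundary.
function truncateAtWhitespace(query, limit = 100) {
    // Collapse runs of whitespace, including pasted line breaks, into single spaces.
    const normalized = query.trim().replace(/\s+/g, " ");
    if (normalized.length <= limit) {
        return normalized;
    }
    // If the first dropped character is a space, the cut already falls between words.
    if (normalized[limit] === " ") {
        return normalized.slice(0, limit);
    }
    const head = normalized.slice(0, limit);
    const lastSpace = head.lastIndexOf(" ");
    // With no space inside the limit (one very long token), fall back to a hard cut.
    return lastSpace > 0 ? head.slice(0, lastSpace) : head;
}
```

Applied to the long title above, this would drop the partial word at the cut instead of sending a fragment of it to the search back end.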

StudioZut commented 10 years ago

I'm working on a trimmed solution that only breaks on spaces (preserving whole words).

StudioZut commented 10 years ago

Deployed fix to gwlibrary-test.wrlc.org

StudioZut commented 10 years ago

My fix would only apply to the tabbed search tool; it would also need to be implemented on the "free standing" search box of the results page.

function bentoTrim(f,opt) {

    // define target of search
    var bentoPath = "/search-all?query=";

    // get text field from form
    var myTXT = f.query.value;

    // trim text field: keep the first 100 characters plus the rest of the
    // word they end in; [\s\S] also matches pasted line breaks
    var myTrim = myTXT.replace(/^([\s\S]{100}\S*)[\s\S]*/, "$1");

    // set path + URL-encoded trimmed text
    var newQuery = bentoPath + encodeURIComponent(myTrim);

    // send modified query to bento
    window.open(newQuery);

}

kerchner commented 10 years ago

@StudioZut let's discuss in person. My intention above (see my last comment) was to implement just a simple maxlength = 100 on the tabbed search bar, like the one in the commit/changeset above. We should discuss whether it's better to implement the truncation on the front end (which would be in two places) or on the back end (in one place), and any other implications. I wonder whether, with JavaScript, we could display something when the input is too long that tells the user the query will be truncated? We should think this through.

StudioZut commented 10 years ago

I agree. Truncating in a single location is better. Laura's point about chopping words is an important consideration, so I avoided the simpler maxlength solution.

kerchner commented 10 years ago

@StudioZut and @kerchner discussed: when text is pasted in, the browser puts the cursor at the end of the text, so the user is unlikely to miss that there's a limit. Therefore, we're going to implement this as a simple maxlength=100. We'll also attempt to display text (small/red?) when input length = 100, to the effect of "Maximum query length of 100 reached" (?). The message should disappear when length < 100. Consider implementing the message just on the libsite home page and the bento results page for now; the smaller search widget at the top of other pages might not have room for it.
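A rough sketch of how the maxlength plus message behavior described above might be wired up. The names `limitMessage` and `attachLimitMessage` and the constant `MAX_QUERY_LENGTH` are hypothetical, and the styling of the message (small/red) is left to CSS. It relies on the standard `input` event, which also fires when text is pasted.

```javascript
// Assumed limit, taken from the n=100 proposed in this issue.
const MAX_QUERY_LENGTH = 100;

// Hypothetical helper: returns the warning text, or "" when under the limit.
function limitMessage(value, max = MAX_QUERY_LENGTH) {
    return value.length >= max ? `Maximum query length of ${max} reached` : "";
}

// Hypothetical wiring: `input` is the search field, `messageElement` is an
// element placed below it that shows the warning.
function attachLimitMessage(input, messageElement, max = MAX_QUERY_LENGTH) {
    input.maxLength = max; // same effect as maxlength="100" in the markup
    const update = () => {
        messageElement.textContent = limitMessage(input.value, max);
    };
    input.addEventListener("input", update); // typing and pasting both trigger this
    update(); // set the initial state, e.g. for a pre-filled field
}
```

Keeping the message logic in a separate function from the DOM wiring would let the same text be reused in more than one search box.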

StudioZut commented 10 years ago

gwlibrary-test.wrlc.org now has "maxlength=100" for both tabbed search boxes, and I removed the bentoTrim function.

StudioZut commented 10 years ago

Draft of a character-counting JS function is in place: countChar(). Reminder to check the mobile view search boxes.

kerchner commented 10 years ago

This issue warrants additional analysis. Truncating at 100 chars might disallow pasting of some citations, which users should be able to do.

Pushing this to the next milestone.

StudioZut commented 10 years ago

gwlibrary-test.wrlc.org: A message now appears below the tabbed search box when you hit 99 chars: "Limited to 100 characters (we'll use your first hundred)" and the input field is maxlength=100

(note that none of this has been applied to the tabbed search tool in the header, just the home page)

lwrubel commented 10 years ago

Have you considered putting a message on the search results page as well?

I suspect that the really long searches are entered by pasting in text. When I tried this, I pasted and hit enter, which is so fast that I never saw the message back on the search box page.
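One possible way to surface this on the results page, sketched under assumptions: the `query` parameter name is taken from the `/search-all?query=` path used earlier in this thread, while the function name `resultsPageNotice`, the placeholder base URL, and the wording of the notice are made up for illustration. Because maxlength trims pasted text before submission, the results page can only infer that a query at exactly the limit was probably cut off.

```javascript
// Hypothetical helper: decide whether the results page should warn the user.
function resultsPageNotice(url, max = 100) {
    // The base is a placeholder so that relative URLs can be parsed.
    const parsed = new URL(url, "http://localhost");
    const query = parsed.searchParams.get("query") || "";
    // A query that fills the whole limit was most likely trimmed by maxlength.
    return query.length >= max
        ? `Your search was limited to the first ${max} characters.`
        : "";
}
```

In a browser this could be called with `window.location.href` and the returned text shown above the results when it is not empty.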

StudioZut commented 9 years ago

Where are we with this? I was under the impression we had dropped this idea. Is this something I should continue to develop?

kerchner commented 9 years ago

@StudioZut good question. I'm thinking that reviewing the /searches data - as well as the data that #220 will provide - is a prerequisite to determining what we want to do here. We'll need to decide questions such as: do we want to support pasted-in citations? The timeline for re-evaluating this question will then need to be deployment of #220 plus some time to gather data, so... around mid-spring.