ualbertalib / can-link

Front end react app for CanLink project

Filter behaviour on search results page #40

Closed sfarnel closed 3 years ago

sfarnel commented 4 years ago

Related to issue #2

image

When searching from the landing page, the query opens correctly in the Search page, but the 'query' input isn't populated with the text of the query from the landing page input. This is now happening; see below.

image

However, from here, any changes a user makes in the top filters should act on the existing subset of items rather than on the collection as a whole.

For example, in the image below, the filter of date from 2013 to 2014 should limit the subset of results returned from the teeth search. Currently, it starts a new search.

image

jchartrand commented 4 years ago

@sfarnel @danydvd @CarlsoFiorention

This is a much trickier question. Every time we query SOLR it is a brand new query - I don't think we can easily ask SOLR to start with a given set of results and refine from there, but Danoosh might have ideas there.

What is also tricky is how SOLR handles queries that use AND/OR logic. It generally seems to just return what it thinks are the most relevant results. I spent a bit of time on this when setting up the form, trying out AND vs OR for fields, but in the end left it until people could take a look and decide then how it all should work.

This would likely be something that I'd have to work with Danoosh pretty closely on. Or Danoosh could give me examples of boolean queries that he knows work as expected in SOLR and I could just copy that format into the query I send from the browser.

In the end, though, I think this will be tricky. I think people will have different expectations about what the searching should do.

We might even need a call to talk about this.

jchartrand commented 4 years ago

@sfarnel Need clarification, I'm not sure what is meant by:

"When searching from the landing page, the query opens correctly in the Search page, but the 'query' input isn't populated with the text of the query from the landing page input. This is now happening; see below."

Or is this from the original issue?

jchartrand commented 4 years ago

Sorry, just to try to clarify the solr query problem (where it returns seemingly a different set of results on second query) a bit more:

I do think the 'query' term is being incorporated into the second query (with the years). It is just that because it is a brand new query, SOLR returns a different set of results to which it has applied its 'most relevant' algorithm.

Will confirm though (that the query term is making it into the second query).

sfarnel commented 4 years ago

Yes, sorry; this is from the original issue. I just worded it badly :)

If this is a tricky one then let's set aside for the moment until we can investigate further.

-- Sharon Farnel she/her Head, Metadata Strategies University of Alberta Library sharon.farnel@ualberta.ca | 780-492-3685

The University of Alberta is situated on traditional Treaty 6 territory and homeland of the Métis peoples. Amiskwaciwâskahikan / ᐊᒥᐢᑲᐧᒋᕀᐋᐧᐢᑲᐦᐃᑲᐣ / Edmonton

jchartrand commented 4 years ago

Ok, will set aside.

CarlsoFiorention commented 4 years ago

I agree, the user will expect that any further selection (e.g. years timeframe) would preferably modify the existing results obtained from the initial query, not starting from scratch... If this limitation from SOLR persists, should we consider removing the search option from the landing page and just offer access to widgets showing the whole collection?

sfarnel commented 4 years ago

Danoosh and I can investigate further to see if we can fix this behaviour.

sfarnel commented 4 years ago

Behaviour 1: second query does not seem to work

Search for Infants from landing page gets me to this search results screen:

image

The author of the first result listed is Tzelnic Tania, but if I type that into the author field and either hit enter or click on Search, it seems as though nothing changes:

image

The same behaviour appears to be the case for subject

sfarnel commented 4 years ago

Behaviour 2: second query seems to start from the entire dataset rather than the subset already returned from the initial query

Search for Russian from landing page gets me to this search results screen:

image

One of the institutions listed a few pages down is the University of Newfoundland, but if I select this from the Institution dropdown and either hit enter or click on Search, it seems as though the search that is run is for Russian but on the entire dataset:

image

The same behaviour appears to be the case for the year limiters.

AND what's weirder is that this second search brings back way more than the initial search. For example, in this case, when I search Russian from the landing page I get 8 pages of results, but when this second search runs, it gets 503 pages of results. Weird!

Can we see why this is? This seems to me to be a must fix.

jchartrand commented 4 years ago

I agree that this needs to be fixed and a search 'policy' determined/defined/set and also made clear to the user.

I think these all have to do with how SOLR decides what is 'most likely relevant'.

A google search for 'solr relevancy' returns all sorts of docs/tutorials about it.

Among them, this SOLR FAQ seems like it might be a good starting point:

https://cwiki.apache.org/confluence/display/SOLR/SolrRelevancyFAQ

One way to explore what SOLR does is using the admin page where you can construct queries and see the results:

http://206.167.181.124:8983/solr/#/test/query

In particular, as described in that SOLR Relevancy FAQ, you can add 'score' to the 'fl' field to see the relevancy scores for results.
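As a rough illustration of the 'score' tip, the request below asks Solr to return the relevancy score alongside the stored fields. This is a sketch only: the host and the core name `test` are read off the admin URL quoted above, `/select` is Solr's standard query handler, and the helper name `build_select_url` is made up for this example.

```python
from urllib.parse import urlencode

# Host and core name are assumptions taken from the admin URL in the thread.
SOLR_SELECT_URL = "http://206.167.181.124:8983/solr/test/select"


def build_select_url(query: str, rows: int = 10) -> str:
    """Return a select URL that asks Solr to include each document's score."""
    params = {
        "q": query,
        "fl": "*,score",  # all stored fields plus the relevancy score
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(build_select_url("title:Russian OR abstract:Russian"))
```

Comparing the `score` values of the first few documents for two versions of a query is a quick way to see why the ordering changes between them.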

That admin page is for the solr core that the web site is currently using. Danoosh has also got three other cores in there now, but I'm not sure which one (if any) I should switch the site to use. @danydvd ?

This will likely take a bit of work to figure out exactly what SOLR is doing, and also to decide on a policy. One of the most general choices is between ORs vs ANDs when combining search fields, e.g.,

creator_last:jones AND institutions:alberta

vs.

creator_last:jones OR institutions:alberta

Different users might have different expectations about what the interface does by default, so we'll have to be clear about what the interface is doing. I think most people probably expect some compromise between the two (AND vs OR) along the lines of what Google does (which is I think what SOLR is doing by default). It might basically try to return its best guess as to what the user really wanted.
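To make the AND/OR choice concrete, here is a minimal sketch of how a form's field values could be joined with an explicit operator. The function name `combine` is hypothetical; the field names are the ones used in the example above.

```python
def combine(clauses: dict[str, str], operator: str) -> str:
    """Join field:value clauses with an explicit boolean operator."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(
        f"{field}:{value}" for field, value in clauses.items()
    )


fields = {"creator_last": "jones", "institutions": "alberta"}
print(combine(fields, "AND"))  # creator_last:jones AND institutions:alberta
print(combine(fields, "OR"))   # creator_last:jones OR institutions:alberta
```

With AND, a document must satisfy every clause; with OR, any one clause is enough, so the result set can only grow as fields are added.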

We could also introduce AND and OR into the interface (for each search field), but my experience with that on past projects is that people still don't really understand what these booleans do, and most of the time just don't use them anyhow.

I'm not sure who is best to look into all of this. I am happy to (I think it is very interesting), but recognize it could be time consuming.

sfarnel commented 4 years ago

Thanks @jchartrand this is very helpful. I agree that we need to understand what is happening here and be able to make that clear to the user. I will work with @danydvd to dig a little further so that your time can be used for other development activities.

danydvd commented 3 years ago

@jchartrand I am adding more cores with new index as I process more data (I try to keep the old ones as back-up) but generally I will use CanLink-new-*. CanLink-new-2 is currently the most recent version.

With regards to the SOLR search results, I think it would make more sense to use "AND" for narrowing down the query as we want the results to have both.

For Sharon's behavior 2: I believe the initial query sent to SOLR would be something like this (excluding the faceting parameters):

title:Russian OR abstract:Russian

and to narrow it down for MUN:

(title:Russian OR abstract:Russian) AND institution:"http://canlink.library.ualberta.ca/institution/Memorial_University_of_Newfoundland"

So basically keep the initial query in () and use "AND" to append more parameters (e.g. (title:Russian OR abstract:russian) AND institution:"http://canlink.library.ualberta.ca/institution/Memorial_University_of_Newfoundland" AND year:* AND creator:* AND subject:*).
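The narrowing rule described here (wrap the initial query in parentheses, then AND each filter onto it) can be sketched as follows. The helper names `quote` and `narrow` are hypothetical and not taken from the can-link code.

```python
def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def narrow(initial_query: str, filters: dict[str, str]) -> str:
    """Keep the initial query in parentheses and AND each filter onto it."""
    parts = [f"({initial_query})"]
    parts += [f"{field}:{quote(value)}" for field, value in filters.items()]
    return " AND ".join(parts)


MUN = "http://canlink.library.ualberta.ca/institution/Memorial_University_of_Newfoundland"
print(narrow("title:Russian OR abstract:Russian", {"institution": MUN}))
```

Solr also has a separate `fq` (filter query) parameter that restricts the result set without influencing relevancy scores; whether that would suit this app is not something this thread settles.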

sfarnel commented 3 years ago

Thanks Danoosh. @jchartrand if you can use the most recent core that would be great as it includes additional metadata properties that we will want to display on the item record (but not yet visualize)

jchartrand commented 3 years ago

@danydvd @sfarnel

The solr queries in the can-link app are actually set up just as you describe, with the title and abstract OR’d, and the rest of the fields ANDed.

But, as Sharon noticed, that doesn’t return the results one would expect, which I think has something to do with the relevancy scoring that SOLR applies.

I think we need to figure out what is going on with the relevancy scores.

danydvd commented 3 years ago

@jchartrand querying SOLR admin with this (title:Russian OR abstract:russian) AND institution:"http://canlink.library.ualberta.ca/institution/Memorial_University_of_Newfoundland" AND year:* AND creator:* AND subject:* returns only one result:

{
  "responseHeader": {
    "status": 0,
    "QTime": 18,
    "params": {
      "q": "(title:Russian OR abstract:russian) AND institution:\"http://canlink.library.ualberta.ca/institution/Memorial_University_of_Newfoundland\" AND year:* AND creator:* AND subject:*",
      "_": "1598827324604"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "http://canlink.library.ualberta.ca/thesis/2832bc6a34b89243f0ad7a91bb5c89a4",
        "year": [2010],
        "title": ["\"To our hopeless affair\" : a visual anthropology study about women of the Russian Intelligentsia in the post-Soviet era"],
        "institution": ["http://canlink.library.ualberta.ca/institution/Memorial_University_of_Newfoundland"],
        "creator_url": ["http://canlink.library.ualberta.ca/person/c5142b4dc702fe6f64a76e71b1f409a9"],
        "creator": ["Gan Gregory"],
        "lang": ["http://id.loc.gov/vocabulary/languages/eng"],
        "creator_first": ["Gregory"],
        "creator_last": ["Gan"],
        "degree": ["MA"],
        "subject": ["cold war", "women intellectuals"],
        "_version_": 1669588060199714819
      }
    ]
  }
}

@sfarnel is this the behavior that you were looking for?

jchartrand commented 3 years ago

Interesting - let me try to replicate your syntax exactly in the query issued by the app. I’ll let you know when it’s up.

sfarnel commented 3 years ago

Thanks both. The Solr query does look like it returns what you would expect. Hopefully a small tweak in the app

jchartrand commented 3 years ago

Thanks Danoosh - that was extraordinarily helpful. It uncovered two errors on my part - I'd named Memorial as University of Newfoundland, and I'd been querying on the university name rather than the url. Your example query now returns the correct results in the web site:

image

I'm not sure if this will fix all our weird SOLR results (although, hopefully!) - if you could try other queries that would be very helpful.
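The two errors described above (a wrong institution name, and querying on the name instead of the URL) suggest keeping an explicit lookup from dropdown label to institution URI. The sketch below assumes such a table; the names `INSTITUTION_URIS` and `institution_clause` are hypothetical, and only the Memorial URI is taken from this thread.

```python
# Only the Memorial URI appears in this thread; a real table would hold one
# entry per institution in the index.
INSTITUTION_URIS = {
    "Memorial University of Newfoundland":
        "http://canlink.library.ualberta.ca/institution/Memorial_University_of_Newfoundland",
}


def institution_clause(label: str) -> str:
    """Translate a dropdown label into a clause on the institution URI."""
    uri = INSTITUTION_URIS.get(label)
    if uri is None:
        # Fail loudly so a mistyped label does not silently match nothing.
        raise KeyError(f"no URI known for institution {label!r}")
    return f'institution:"{uri}"'


print(institution_clause("Memorial University of Newfoundland"))
```

Raising on an unknown label makes a naming slip such as "University of Newfoundland" visible during development rather than in the search results.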

sfarnel commented 3 years ago

Thanks @jchartrand this looks very promising indeed. @danydvd do you think we need to expand the parameters of the initial search to look also at subjects, author, etc.? Let's talk about this.

danydvd commented 3 years ago

Thank you @jchartrand. @sfarnel I think that would be good, since the initial search bar does not indicate any type. One thing that we should be careful about is the date (year in SOLR), which does not accept strings (only digits). If we can have a validator that identifies the type of the query, then when the query contains only digits we can search "year" as well. Otherwise, we should not include "year" in the query parameters.

jchartrand commented 3 years ago

Year is already in there as a range, and I’m pretty sure I allow only digits, but will confirm. Yes, only allows four digits:

image

jchartrand commented 3 years ago

oh, or did you mean these should be added to the landing page, i.e., to:

image

danydvd commented 3 years ago

I think it would be good to add it to the landing page search bar. @sfarnel @CarlsoFiorention what do you think?

sfarnel commented 3 years ago

I agree; I think the initial search should be pretty open as we don't know what folks might be looking for.

CarlsoFiorention commented 3 years ago

I also agree with keeping the initial search simple and open.

jchartrand commented 3 years ago

If we add a number of options to the landing page (which is I think what you are suggesting?), then maybe we just end up merging the landing page and search page:

The search form from the search page would always be shown, and the university logos would appear on first load of the page, disappearing after there are search results to show?

sfarnel commented 3 years ago

I think it's more about broadening which fields are searched when a user types in some text rather than adding extra options

CarlsoFiorention commented 3 years ago

Agree

jchartrand commented 3 years ago

Oh, now I see. So add in:

Subject
Creator

Should I also check if the query is a four digit year and if so, then also search for year?

Any other fields?

sfarnel commented 3 years ago

Thanks @jc yes, let's add those for now

jchartrand commented 3 years ago

And also for the text query box on the main search page, in case someone expects that field to work the same way as the landing page field?

sfarnel commented 3 years ago

Yes, I would think so. @CarlsoFiorention and @danydvd do you agree?

CarlsoFiorention commented 3 years ago

Yes, I agree

jchartrand commented 3 years ago

To check for years in the query string, I will:

So, as an example, if the query string is

'bicycling 1920 velodrome hamilton'

that will split into four tokens:

bicycling 1920 velodrome hamilton

which will give a query (split across lines for clarity):

(
   title:bicycling velodrome hamilton 
     OR 
   abstract:bicycling velodrome hamilton 
     OR 
   subject:bicycling velodrome hamilton 
     OR 
   creator:bicycling velodrome hamilton  
) 
AND
   year:1920 

I'll go ahead with that but if something seems off just let me know.

QUESTION: Should we also add the new department and discipline fields?
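The year check described in this comment can be sketched as below: split the query string on whitespace, pull out four-digit tokens as years, OR the remaining words across the text fields, and AND the year clause on. All names are hypothetical, and ORing multiple years together is an assumption of this sketch. One deliberate difference from the query as written above: in Solr's standard query parser a field prefix binds only to the term immediately after it, so the sketch groups the words as `title:(...)` rather than `title:bicycling velodrome hamilton`.

```python
import re

TEXT_FIELDS = ("title", "abstract", "subject", "creator")
YEAR = re.compile(r"\d{4}")


def split_years(query: str) -> tuple[list[str], list[str]]:
    """Split a query string into (text tokens, four-digit year tokens)."""
    tokens = query.split()
    years = [t for t in tokens if YEAR.fullmatch(t)]
    words = [t for t in tokens if not YEAR.fullmatch(t)]
    return words, years


def build_query(query: str) -> str:
    """OR the text tokens across the text fields, then AND the year clause."""
    words, years = split_years(query)
    parts = []
    if words:
        grouped = " ".join(words)
        parts.append(
            "(" + " OR ".join(f"{f}:({grouped})" for f in TEXT_FIELDS) + ")"
        )
    if years:
        # Assumption: several years are alternatives, so they are ORed.
        parts.append("(" + " OR ".join(f"year:{y}" for y in years) + ")")
    return " AND ".join(parts)


print(build_query("bicycling 1920 velodrome hamilton"))
```

User input is not escaped here; a real implementation would need to escape Solr's special characters in each token before building the query.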

jchartrand commented 3 years ago

Another case of the devil in the details:

If there are multiple years in the query string, I'm assuming they should all be included in the query, but should they be ANDed or ORed?

I'm assuming OR'd like so:

(
   title:bicycling velodrome hamilton 
     OR 
   abstract:bicycling velodrome hamilton 
     OR 
   subject:bicycling velodrome hamilton 
     OR 
   creator:bicycling velodrome hamilton  
) 
AND
(
   year:1920 
     OR
   year:1921
)

Does that sound right?

sfarnel commented 3 years ago

Thanks James. I would think that the initial search on the landing page should also be ANDing when there are multiple terms. @CarlsoFiorention @danydvd what do you think?

jchartrand commented 3 years ago

Ah, good point.

On the other hand, many (most?) people might expect a single query box (that looks like the Google query box) to work like the Google query box, which I think ORs everything. So even the years would be OR'd in:

   title:bicycling velodrome hamilton 
     OR 
   abstract:bicycling velodrome hamilton 
     OR 
   subject:bicycling velodrome hamilton 
     OR 
   creator:bicycling velodrome hamilton  
     OR
   year:1920 
     OR
   year:1921

sfarnel commented 3 years ago

Agree that it should act like Google basic search but I believe by default that Google basic search ANDs things? I could be wrong.

CarlsoFiorention commented 3 years ago

I agree, and also the closer to Google behaviour the better, since users will expect what they are already familiar with.

jchartrand commented 3 years ago

Here's a Google query for 'bicycling hamilton velodrome' whose results show which query terms are missing in each result (so didn't satisfy an AND):

image

To force ANDs in Google, I thought you could prefix each term that MUST be included with '+':

+hamilton +cycling +velodrome

That does return more results with all three, but even then it still seems to return at least one result without 'velodrome'. It could be that Google is using fuzzy matching on 'velodrome' and assumes 'velo' is close enough. But even using quotation marks to force exact search (which is I think how it works), the query still seems to return some results without 'velodrome', although that might be because Google indexed a prior version of the page (where 'velodrome' did appear):

image

sfarnel commented 3 years ago

thanks @j ok let's leave it ORed for now and we can tweak in future

jchartrand commented 3 years ago

More devilish details!

Sometimes people may want to search for a year, not just in the 'year' field, but also within the text (title, abstract) so perhaps the query for:

bicycling velodrome hamilton 1920 1921

should be:

   title:bicycling velodrome hamilton 1920 1921
     OR 
   abstract:bicycling velodrome hamilton 1920 1921
     OR 
   subject:bicycling velodrome hamilton 1920 1921
     OR 
   creator:bicycling velodrome hamilton 1920 1921
     OR
   year:1920 
     OR
   year:1921

?
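For comparison, the fully ORed variant proposed here, where year-like tokens stay in the text search and are also matched against the year field, might look like the sketch below. The function name is hypothetical, the `field:(...)` grouping is this sketch's choice (in Solr's standard query parser a bare field prefix covers only the next term), and `*:*` is Solr's match-all query, used here as an assumed fallback for empty input.

```python
import re

TEXT_FIELDS = ("title", "abstract", "subject", "creator")


def build_open_query(query: str) -> str:
    """OR every token across the text fields, plus a year clause per 4-digit token."""
    tokens = query.split()
    if not tokens:
        return "*:*"  # assumed fallback: match everything on empty input
    grouped = " ".join(tokens)
    clauses = [f"{field}:({grouped})" for field in TEXT_FIELDS]
    clauses += [f"year:{t}" for t in tokens if re.fullmatch(r"\d{4}", t)]
    return " OR ".join(clauses)


print(build_open_query("bicycling velodrome hamilton 1920 1921"))
```

Because every clause is optional under OR, this form casts the widest net and relies on relevancy scoring to put the best matches first.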

jchartrand commented 3 years ago

Query is up on the site if you want to try it.

sfarnel commented 3 years ago

Thanks James! I still think we need to AND these things. I know Google sometimes can't find all of your terms and so will bring what it finds that has most of them, but if I type Russia war into the search I am looking for both of these things rather than just one or the other.

jchartrand commented 3 years ago

I do see what you mean about that 'Russia war' query, which doesn't seem to return what you'd expect.

Drilling down into it, it gets complicated because at least one funny thing going on is that we are looking for terms in multiple fields, which we OR together:

title:russia war OR abstract:russia war OR subject:russia war OR creator:russia war

but then within each field (title, abstract, subject, creator) solr by default ORs the terms (russia OR war)

if we tried to instead AND those terms (russia AND war):

title:russia AND war OR abstract:russia AND war OR subject:russia AND war OR creator:russia AND war

then we wouldn't get what we want because we are requiring both terms in a single field (but we probably want hits where russia is in the title and war is in the subject).

There might be a way to OR/AND this all together to do what you want, but I suspect it will get increasingly tricky.

However, if you can get the query to work as you'd like in the query builder, I can probably generalize that to our search field.

Query builder is here: http://206.167.181.124:8983/solr/#/test/query

jchartrand commented 3 years ago

btw, I noticed that my year 'extractor' was matching not just four digit numbers, but also 5 digit numbers, 6 digit numbers, ...

Anyhow, should be fixed now.
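The thread does not show the extractor's code, so the following is only one plausible way such a bug arises: a pattern that checks for four digits at the start of a token also accepts longer digit runs, whereas a full match does not. Both function names are hypothetical.

```python
import re


def loose_is_year(token: str) -> bool:
    """Buggy variant: only checks that the token starts with four digits."""
    return re.match(r"\d{4}", token) is not None


def is_year(token: str) -> bool:
    """Strict variant: the whole token must be exactly four digits."""
    return re.fullmatch(r"\d{4}", token) is not None


for token in ("1920", "12345", "192"):
    print(token, loose_is_year(token), is_year(token))
```

When scanning a whole string instead of single tokens, a pattern such as `\b\d{4}\b` gives the same strictness by requiring word boundaries on both sides.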

jchartrand commented 3 years ago

Just to confirm: we'll leave this as-is with ORs? Or look into using ANDs?

sfarnel commented 3 years ago

@jchartrand would it be possible to think holistically? That is, 'russia war' would be interpreted as items where russia appears in any of the indexed fields and war appears in any of the index fields, but not necessarily the same one?

jchartrand commented 3 years ago

@sfarnel I'm not sure I completely understand what you mean, but if I do, then the problem is defining the query syntax to return "russia appears in any of the indexed fields and war appears in any of the index fields, but not necessarily the same one"

which I think will get complicated.

I am happy to try to puzzle through it, but suspect it will consume a lot of time, and we may never arrive at a conclusive answer because people's expectations about search behaviour differ and sometimes even the same person will have different expectations for different situations (that would be very hard to accommodate with all sorts of probably complicated logic built into the code). Nevertheless, @danydvd might have ideas about how to do it.
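One possible shape for "each term in any field, but not necessarily the same one" is to build an OR group per term and AND the groups together. This is a sketch of that idea, not something tried against the CanLink index; the function names are hypothetical and terms are not escaped.

```python
TEXT_FIELDS = ("title", "abstract", "subject", "creator")


def any_field(term: str) -> str:
    """Match one term in any of the text fields."""
    return "(" + " OR ".join(f"{field}:{term}" for field in TEXT_FIELDS) + ")"


def all_terms_any_field(query: str) -> str:
    """Require every term, but let each term match in a different field."""
    return " AND ".join(any_field(term) for term in query.split())


print(all_terms_any_field("russia war"))
```

Solr's Extended DisMax parser (`defType=edismax`) with a `qf` list of fields and `q.op=AND` or `mm=100%` is designed for this kind of multi-field search and may be a simpler route, though it would need testing in the query builder first.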

sfarnel commented 3 years ago

Thanks @jchartrand . Agreed. Let's leave this as-is with ORs and we can tweak in future if needed