ssborbis / ContextSearch-web-ext

Search engine manager for modern browsers

Partially exclude search results #583

Closed Parvares closed 1 year ago

Parvares commented 1 year ago

Hi Mike, could you help me? I'm trying to use your extension with this query:

https://www.bibliotechediroma.it/opac/query/MAT:DVD%20%s?bib=RMBO2&context=tdoccd

(MAT:DVD is the type of resource, RMBO2 stands for the library (Biblioteca), and tdoccd stands for audiovisual)
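
For readers unfamiliar with the `%s` placeholder, the following is a minimal sketch of how such a template could be filled in. It only approximates the substitution; the extension's actual encoding rules may differ, and `build_search_url` is a hypothetical helper name, not part of ContextSearch-web-ext.

```python
from urllib.parse import quote

# Template taken from the query above; %s marks where the search terms go.
TEMPLATE = "https://www.bibliotechediroma.it/opac/query/MAT:DVD%20%s?bib=RMBO2&context=tdoccd"


def build_search_url(template: str, terms: str) -> str:
    """Replace the %s placeholder with percent-encoded search terms.

    Assumption: spaces are encoded as %20, matching the style of the
    template itself. The extension may encode differently.
    """
    return template.replace("%s", quote(terms, safe=""))


if __name__ == "__main__":
    print(build_search_url(TEMPLATE, "avventura"))
```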

If I search by genre with the term "avventura" (i.e. "adventure"), like this:

https://www.bibliotechediroma.it/opac/query/MAT:DVD%20avventura?bib=RMBO2&context=tdoccd

I get 141 results, which is more than there should be, so I would like to get only the results that have the search term ("avventura") in the tab "Lo trovi in" > Biblioteca Morante > Collocazione: [...] AVVENTURA [...]

as in this case, for example:

https://www.bibliotechediroma.it/opac/resource/i-goonies/RMB0442732

Results where the term appears only in the "Scheda" tab or in the abstract above it, but not in the "Lo trovi in" tab, should be excluded, as in this case:

https://www.bibliotechediroma.it/opac/resource/avventura-tra-i-ghiacci-videoregistrazione/RMB0456526

Thanks very much, your assistance would be much appreciated!

ssborbis commented 1 year ago

Let me see if I understand you correctly.

You are searching for a genre but are also getting results where the search term appears in the title even though they are not in the genre you searched for, and you want to exclude those results? Or do I have that backwards?

Parvares commented 1 year ago

Yep, exactly. I would like to exclude search results that match on the title, the abstract, or anything else on the page under the “Scheda” tab, and get only the results that have the search term in the tab “Lo trovi in” > "Collocazione...", as in the screenshot I’m attaching… P.S. Obviously, when the search term is in both tabs ("Scheda" and "Lo trovi in"), the results should be included.

[Screenshot: Desired search results]

ssborbis commented 1 year ago

There's an advanced search button on that page that allows you to exclude terms from a number of categories. You could use those filters to exclude results that include the search terms in, for instance, the title. Of course, any result that matched the genre but also included the search term in the title (Adventures in Babysitting, for example) would be excluded too.

A more thorough but more complicated method would be to use a web scraping tool. You could use the external application launcher feature + CS native app to send your search terms to the scraper. Although, I don't know what you're doing with the results.

Since the genre listed under Scheda doesn't seem to be knowable until you open each individual result URL, you'd need to collect the URLs of every result, open each link, and look for the searched-for genre in the genre section.
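
The steps described above (collect the result URLs, open each one, check for the term) could be sketched roughly as follows. This is a minimal illustration, not a tested scraper for this site: the `/opac/resource/` link pattern is inferred from the example URLs in this thread, while the assumption that the shelf mark appears as text after a "Collocazione:" label on the same line is a guess about the page markup. All function names are hypothetical, and pagination of the result list is not handled.

```python
import re
from typing import Callable, List
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: result links look like href="/opac/resource/<slug>/<id>".
RESOURCE_LINK = re.compile(r'href="(/opac/resource/[^"#?]+)"')
# Assumption: the shelf mark follows the label "Collocazione:" on one line.
SHELFMARK = re.compile(r"Collocazione:([^\n]*)", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def fetch_html(url: str) -> str:
    """Download a page and return its text (simple, unthrottled)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_resource_urls(results_html: str, base_url: str) -> List[str]:
    """Collect unique absolute resource URLs from a result-list page."""
    urls: List[str] = []
    for path in RESOURCE_LINK.findall(results_html):
        url = urljoin(base_url, path)
        if url not in urls:
            urls.append(url)
    return urls


def shelfmark_contains(detail_html: str, term: str) -> bool:
    """True if any 'Collocazione:' line contains the term (case-insensitive)."""
    text = TAG.sub(" ", detail_html)  # drop tags, keep line layout
    return any(term.lower() in m.lower() for m in SHELFMARK.findall(text))


def filter_by_shelfmark(
    urls: List[str], term: str, fetch: Callable[[str], str] = fetch_html
) -> List[str]:
    """Open every result and keep those whose shelf mark contains the term."""
    return [u for u in urls if shelfmark_contains(fetch(u), term)]
```

Because every result page has to be fetched individually, the run time grows with the number of results, so a search returning 141 hits means at least 141 additional requests.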

Parvares commented 1 year ago

I had tried the advanced search, excluding the search term from the title, but it has to be combined with a full-text search ("ricerca libera") to get the results from the "Lo trovi in" tab, and that combination is contradictory and doesn't work... As for the other procedure unfortunately it would not simplify the search, since I need a single search... I hoped it was easier to reach the goal, thanks very much anyway!

ssborbis commented 1 year ago

> As for the other procedure unfortunately it would not simplify the search, since I need a single search... I hoped it was easier to reach the goal, thanks very much anyway!

It would be a single search. You just need to write a script to do the work. (Edit: and it wouldn't be particularly fast.)

Parvares commented 1 year ago

OK, but I was hoping to get a single list quickly, from a single search... :-(