ssborbis / ContextSearch-web-ext

Search engine manager for modern browsers

Automatically load title links after search #592

Closed Parvares closed 1 year ago

Parvares commented 1 year ago

Hi Mike, I'm trying to use your extension with this template URL:

https://www.bibliotechediroma.it/opac/query/%s?context=tmatm

(where %s is to be replaced by the search key, for example Stephen King). I would like all 15 resulting titles/records to be opened automatically after my search, so that I can quickly get to the detailed records. Could you help me, if possible (I guess with a Post-Search Script)? Thanks very much!

ssborbis commented 1 year ago

Something like...

document.querySelectorAll(".titololistarisultati a[href][title]").forEach( a => {
    window.open(a.href,"_blank");
});
Parvares commented 1 year ago

Great, really useful for me, thanks very much!!! Is there a way to gather all the resulting tabs/pages in only one long page? Would be perfect!

ssborbis commented 1 year ago

What part of the results page are you wanting on one page?

Parvares commented 1 year ago

Possibly a web page with 15 records for every page... Thanks!

ssborbis commented 1 year ago
( async() => {

  let links = [...document.querySelectorAll(".titololistarisultati a[href][title]")];

  for ( let i in links ) {
      let link = links[i];

      let f = document.createElement('iframe');
      f.style.display = 'none';
      document.body.appendChild(f);
      f.onload = async () => {
        await new Promise(r => setTimeout(r, 100));
        f.contentWindow.document.getElementById("tabDocument_item_tabcata").click();
        await new Promise(r => setTimeout(r, 100));
        let tab = f.contentWindow.document.getElementById("tabcata");
        let content = tab.outerHTML;
        document.querySelector(`#listadocumenti_${i}`).innerHTML = content;     
      }

      f.src = link.href;
      await new Promise(r => setTimeout(r, 100));
    }
})();

Be aware that it takes time to load all those iframes.

Parvares commented 1 year ago

Incredible, works like a charm, thanks really so much! Is it possible to have the same result when I pass from page 1 (with 15 results) to page 2 (with other 15 results), to page 3, and so on? I've tried right clicking on page 2 using your extension, the detailed records still open in one page, but they are not related with the search term... Thanks again!

ssborbis commented 1 year ago

The Post-Search script will only run once, on the results landing page ( page 1 of results ). To run the script anywhere else, you might try copying it to a new Script-type engine, and click that engine when you go to the next page. Or, bind a hotkey to that engine and just press a button to run the script after you navigate to the next page.

ssborbis commented 1 year ago

This may be a task better suited for tampermonkey. I'm guessing you could reuse that script and have it run on every results page

Parvares commented 1 year ago

Thanks Mike, should I fill the metadata 1-10? Thanks again!

ssborbis commented 1 year ago

> Should I fill the metadata

You should definitely restrict it to running on https://www.bibliotechediroma.it/opac/query/*

ssborbis commented 1 year ago

It worked for me using

// ==UserScript==
// @name         New Userscript
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       You
// @match        https://www.bibliotechediroma.it/opac/query/*
// @icon         https://www.google.com/s2/favicons?sz=64&domain=bibliotechediroma.it
// @grant        none
// ==/UserScript==

(function() {
    'use strict';

    ( async() => {

        let links = [...document.querySelectorAll(".titololistarisultati a[href][title]")];

        for ( let i in links ) {
            let link = links[i];

            let f = document.createElement('iframe');
            f.style.display = 'none';
            document.body.appendChild(f);
            f.onload = async () => {
                await new Promise(r => setTimeout(r, 100));
                f.contentWindow.document.getElementById("tabDocument_item_tabcata").click();
                await new Promise(r => setTimeout(r, 100));
                let tab = f.contentWindow.document.getElementById("tabcata");
                let content = tab.outerHTML;
                document.querySelector(`#listadocumenti_${i}`).innerHTML = content;
            }

            f.src = link.href;
            await new Promise(r => setTimeout(r, 100));
        }
    })();
})();

I also set the following: [image]

ssborbis commented 1 year ago

If you use that script, remove the Post-Search Script from that engine

ssborbis commented 1 year ago

Actually, I take that back. I'm not getting the changes when going to another page. You might need more code for that
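
One way to pick up a page change is to poll the address and re-run the work whenever it differs from the last value seen. The following is a minimal sketch, assuming the site updates the URL without a full page reload when moving between result pages; `watchUrl` and its parameters are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: polls a URL getter and calls onChange when the value changes.
function watchUrl(getUrl, onChange, intervalMs = 1000) {
    let last = getUrl();
    const timer = setInterval(() => {
        const current = getUrl();
        if (current !== last) {
            last = current;      // remember the new address so onChange fires once per change
            onChange(current);
        }
    }, intervalMs);
    // Returns a function that stops the polling.
    return () => clearInterval(timer);
}

// In a userscript this might be used as (main is whatever function does the work):
// watchUrl(() => window.location.href, () => main());
```

Passing the getter in, rather than reading `window.location.href` directly, keeps the helper usable outside a browser.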

Parvares commented 1 year ago

Okay, thanks again, now it works with Tampermonkey following your settings. I noticed the script only works with your extension. Is it possible to let the script work both with and without your extension (directly in the search engine)?

ssborbis commented 1 year ago

Try this in tampermonkey

// ==UserScript==
// @name         New Userscript
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       You
// @match        https://www.bibliotechediroma.it/opac/query/*
// @icon         https://www.google.com/s2/favicons?sz=64&domain=bibliotechediroma.it
// @grant        none
// ==/UserScript==

(function() {
    'use strict';

    let url = window.location.href;

   const main = async() => {

        let links = [...document.querySelectorAll(".titololistarisultati a[href][title]")];

       let startIndex = (() => {
           let id = document.querySelector(".list-document > LI").id;
           return parseInt(id.match(/listadocumenti_(\d+)/)[1]);
       })();

       //console.log('start index', startIndex);

        for ( let i in links ) {
            try {
                let link = links[i];
                let index = parseInt(i) + parseInt(startIndex);

                //console.log("index: " + index, link.href);

                let f = document.createElement('iframe');
                f.style.display = 'none';
                document.body.appendChild(f);
                f.onload = async () => {
                    try {
                    await new Promise(r => setInterval(() => {
                        if ( f.contentWindow.document.getElementById("tabDocument_item_tabcata") )
                            r();
                    }, 100));

                    f.contentWindow.document.getElementById("tabDocument_item_tabcata").click();
                    await new Promise(r => setTimeout(r, 100));
                    let tab = f.contentWindow.document.getElementById("tabcata");
                    let content = tab.outerHTML;
                    document.querySelector(`#listadocumenti_${index}`).innerHTML = content;
                    } catch (error) { console.log(error)}
                }

                f.src = link.href;
            } catch (error) {
                console.log(error);
            }

            await new Promise(r => setTimeout(r, 100));
        }
    }

   main();

   setInterval(() => {
       if ( window.location.href !== url ) {
           //console.log('href change');
           url = window.location.href;
           main();
       }
   }, 1000);

})();
Parvares commented 1 year ago

Absolutely astonishing, thanks so much, Mike, it's all working perfectly!! P.S. Is this warning important in line 44? eslint: curly - expected { after 'if' condition

ssborbis commented 1 year ago

> Is this warning important in line 44?

Nah, just a style choice that particular linter config doesn't like. You could rewrite that line as the following and get rid of the warning, but it makes no difference in the code.

if ( f.contentWindow.document.getElementById("tabDocument_item_tabcata") ) {
    r();
}
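
Separately from the linter warning, the polling promise in the script never clears its interval, so the timer keeps firing after the element has been found. A minimal sketch of a helper that stops polling once the check succeeds and gives up after a timeout; the name `waitFor` and its options are hypothetical and not part of the original script.

```javascript
// Hypothetical helper: resolves with the first truthy value returned by check(),
// clears its timer when done, and rejects if timeoutMs elapses first.
function waitFor(check, { intervalMs = 100, timeoutMs = 10000 } = {}) {
    return new Promise((resolve, reject) => {
        const started = Date.now();
        const timer = setInterval(() => {
            let result;
            try {
                result = check();
            } catch (error) {
                clearInterval(timer);
                reject(error);
                return;
            }
            if (result) {
                clearInterval(timer);
                resolve(result);
            } else if (Date.now() - started >= timeoutMs) {
                clearInterval(timer);
                reject(new Error('waitFor: timed out'));
            }
        }, intervalMs);
    });
}

// Possible use inside the iframe's onload handler:
// const tabLink = await waitFor(() =>
//     f.contentWindow.document.getElementById("tabDocument_item_tabcata"));
// tabLink.click();
```

Because a timeout rejects the promise, the surrounding `try`/`catch` in the script would log the failure instead of waiting forever.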
Parvares commented 1 year ago

Great, thanks again for your time!!

Parvares commented 1 year ago

Hi Mike, may I ask for a little variation on this script? Is it possible to have the same kind of list (15 records per page), but taking from each record the "Lo trovi in" tab instead of the "Scheda" tab used in the script above, possibly opening the "Biblioteca Morante" label? Thanks ever so much!

ssborbis commented 1 year ago

The difference would be with this line

f.contentWindow.document.getElementById("tabDocument_item_tabcata").click();

That's where the Scheda tab is clicked. Simply omit that line or comment it out with // to see the Lo Trovi in tab, since it's the one shown by default. Clicking "Biblioteca Elsa Morante" would be done in the same manner: you just need to replace "tabDocument_item_tabcata" with the proper selector. You can inspect the DOM in the browser with CTRL+SHIFT+I to find a selector that works (usually the id of the element to be clicked). I'll check myself later if you haven't found a solution.

Parvares commented 1 year ago

Mhm, thank you, but that didn't solve it.

ssborbis commented 1 year ago

Do you have a link to a page with a "Biblioteca Elsa Morante" tab? I'm not sure what search would get me a link that does.

ssborbis commented 1 year ago

Never mind, I found it.

Try this selector

document.querySelector('#biblioteche [onclick*="Morante"]').click();
Parvares commented 1 year ago

Sorry Mike, the result is the same.

ssborbis commented 1 year ago

I'll check it out. It's the holidays here so I've been preoccupied.

ssborbis commented 1 year ago

I'm really not sure what you're trying to achieve, but this script will "click" a Biblioteca Elsa Morante tab, if one exists, and display that instead of the Scheda tab

// ==UserScript==
// @name         New Userscript
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       You
// @match        https://www.bibliotechediroma.it/opac/query/*
// @icon         https://www.google.com/s2/favicons?sz=64&domain=bibliotechediroma.it
// @grant        none
// ==/UserScript==

(function() {
    'use strict';

    let url = window.location.href;

   const main = async() => {

        let links = [...document.querySelectorAll(".titololistarisultati a[href][title]")];

       let startIndex = (() => {
           let id = document.querySelector(".list-document > LI").id;
           return parseInt(id.match(/listadocumenti_(\d+)/)[1]);
       })();

       //console.log('start index', startIndex);

        for ( let i in links ) {
            try {
                let link = links[i];
                let index = parseInt(i) + parseInt(startIndex);

                //console.log("index: " + index, link.href);

                let f = document.createElement('iframe');
                f.style.display = 'none';
                document.body.appendChild(f);
                f.onload = async () => {
                    try {
                        await new Promise(r => setInterval(() => {
                            if ( f.contentWindow.document.getElementById("tabDocument_item_tabcata") )
                                r();
                        }, 100));

                        let tab;

                        let morante = f.contentWindow.document.querySelector('#biblioteche [onclick*="Morante"]');

                        if ( morante ) {
                            morante.click();
                            await new Promise(r => setTimeout(r, 100));
                            tab = f.contentWindow.document.getElementById("tabloca");
                        } else {
                            f.contentWindow.document.getElementById("tabDocument_item_tabcata").click();
                            await new Promise(r => setTimeout(r, 100));
                            tab = f.contentWindow.document.getElementById("tabcata");
                        }

                        let content = tab.outerHTML;
                        document.querySelector(`#listadocumenti_${index}`).innerHTML = content;
                    } catch (error) { console.log(error)}
                }

                f.src = link.href;
            } catch (error) {
                console.log(error);
            }

            await new Promise(r => setTimeout(r, 100));
        }
    }

   main();

   setInterval(() => {
       if ( window.location.href !== url ) {
           //console.log('href change');
           url = window.location.href;
           main();
       }
   }, 1000);

})();
Parvares commented 1 year ago

Thanks for your reply, Mike, and thanks for your time! The results are a bit confusing, even filtering upstream by library and type of resource ("Libri moderni", so avoiding ebooks, that are without localization).

ssborbis commented 1 year ago

If you're just wanting the listings of the original search page replaced with the content of the result page links, that's easy enough. The problem is, the "tabs" will most likely not be clickable because the events won't be copied when displayed on the original results page. So you either need to "click" them programmatically before displaying them, or load each result link into an iframe and display the iframe on the original results page. Again, I don't know what you're doing after getting the results, so I can't say which approach is better.

Parvares commented 1 year ago

Hi, I need the title (the abstract doesn't matter) and the library's inventory section for each of the 15 records. Thanks again!

ssborbis commented 1 year ago

From the queries I've run, not every result has a "Biblioteca Morante" tab, so how would the results look then?

Parvares commented 1 year ago

Doesn't matter, moreover I can filter (upstream or downstream) the results by library.

ssborbis commented 1 year ago

I tested the iframe method and it doesn't seem to work. The host site throws some errors, so you might be stuck copying the DOM and pasting it into the results page.

Are you just trying to get this content to display on the main results page?

Inventario 58452
Collocazione ADO  813.5        KIN
ssborbis commented 1 year ago


// ==UserScript==
// @name         New Userscript
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       You
// @match        https://www.bibliotechediroma.it/opac/query/*
// @icon         https://www.google.com/s2/favicons?sz=64&domain=bibliotechediroma.it
// @grant        none
// ==/UserScript==

// https://www.bibliotechediroma.it/opac/query/stephen%20king?context=tmatm

(function() {
    'use strict';

    let url = window.location.href;

   const main = async() => {

        let links = [...document.querySelectorAll(".titololistarisultati a[href][title]")];

       let startIndex = (() => {
           let id = document.querySelector(".list-document > LI").id;
           return parseInt(id.match(/listadocumenti_(\d+)/)[1]);
       })();

       //console.log('start index', startIndex);

        for ( let i in links ) {
            try {
                let link = links[i];
                let index = parseInt(i) + parseInt(startIndex);

                //console.log("index: " + index, link.href);

                let f = document.createElement('iframe');
                f.style.display = 'none';
                document.body.appendChild(f);
                f.onload = async () => {
                    try {
                        await new Promise(r => setInterval(() => {
                            if ( f.contentWindow.document.getElementById("tabDocument_item_tabcata") )
                                r();
                        }, 100));

                        let tab;
                        let inventario;

                        let morante = f.contentWindow.document.querySelector('#biblioteche [onclick*="Morante"]');

                        if ( morante ) {
                            morante.click();
                            await new Promise(r => setTimeout(r, 100));
                            tab = f.contentWindow.document.querySelector('.inventario');
                        } else {
                            return;
                           // f.contentWindow.document.getElementById("tabDocument_item_tabcata").click();
                          //  await new Promise(r => setTimeout(r, 100));
                          //  tab = f.contentWindow.document.getElementById("tabcata");
                        }

                        let li =  document.querySelector(`#listadocumenti_${index}`)

                        let content = tab.outerHTML;
                        li.appendChild(tab);
                        //li.innerHTML = content;
                       //li.style.borderBottom = 'none';

                    } catch (error) { console.log(error)}
                }

                f.src = link.href;
            } catch (error) {
                console.log(error);
            }

            await new Promise(r => setTimeout(r, 100));
        }
    }

   main();

   setInterval(() => {
       if ( window.location.href !== url ) {
           //console.log('href change');
           url = window.location.href;
           main();
       }
   }, 1000);

})();
Parvares commented 1 year ago

Wow, thanks, the results seem almost perfect! Yes, that was what I expected, except for the strange fact that not every record displays the iframe. Thanks again!

ssborbis commented 1 year ago

> except the strange fact that not every "Biblioteca Morante" record displays the iframe. Isn't there a workaround that comes to mind? Thanks again!

Do you have an example URL I can look at?

Parvares commented 1 year ago

> Do you have an example URL I can look at?

Sorry, what do you mean? Some record that doesn't display the iframe?

ssborbis commented 1 year ago

If you perform a search and see a results page where not every Biblioteca Morante result is displayed properly, copy the URL and paste it here so I can see what you're seeing.

Parvares commented 1 year ago

Try with:

https://www.bibliotechediroma.it/opac/query/stephen%20king?bib=RMBO2&context=tmatm

ssborbis commented 1 year ago

Gotcha. Oddly, it works fine for me

ssborbis commented 1 year ago

Ah, it looks like some links open to the Contiene tab and mess up the script.

This script would run much faster if you only checked links that show Biblioteca Elsa Morante in the Lo Trovi In list on the results page.
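
A minimal sketch of that pre-filtering step, assuming each result's list item on the results page contains the library name somewhere in its text; `filterByLibrary` and the `items` shape are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: keeps only the results whose text mentions the library.
function filterByLibrary(items, libraryName) {
    const needle = libraryName.toLowerCase();
    return items.filter(item => item.text.toLowerCase().includes(needle));
}

// In the userscript, the items might be built from the results page like this
// (selectors reused from the scripts in this thread; the text check is an assumption):
// const items = [...document.querySelectorAll(".list-document > li")].map(li => ({
//     li,
//     text: li.textContent,
//     link: li.querySelector(".titololistarisultati a[href][title]")
// }));
// const wanted = filterByLibrary(items, "Morante");
// Only the entries in `wanted` would then be loaded into iframes.
```

Skipping the other results means fewer iframes to load per page, which is where most of the waiting time goes.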

Parvares commented 1 year ago

I noticed that the script works much better with Firefox. Searching for "Stephen King" and filtering by RMBO2, I get 4 pages and only 4 records without the iframe. Not perfect, but very close. Congrats, Mike!!!

ssborbis commented 1 year ago

> I noticed that the script works much better with Firefox. Searching for "Stephen King" and filtering by RMBO2 I get 4 pages, and only 4 records without iframe, not perfect but very near, congrats Mike!!!

It may never work 100% of the time due to the fact it's loading every result in an iframe, but I'm sure the code can be tweaked a bit. One problem is the linked pages aren't consistent in how they display information. I think one bug happens when there is only one Lo Trovi In. There are others too that can be accounted for, given enough code.

Parvares commented 1 year ago

Yes, the major bug seems to be when there's only one library in Lo Trovi in. It would be great if you could find a way around it! Thanks again!

ssborbis commented 1 year ago

Do you really need this info for every result, or only a few? It may be a lot faster to create a button to click to load the info for only the results you need, vs loading every link into an iframe
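
A minimal sketch of the button approach: each result gets a button, and the iframe work for that result only starts when the button is clicked. The helper name `addLoadButton`, the button labels, and the `loadDetails` callback are hypothetical; `loadDetails` stands for whatever function loads the linked page and returns the node to display.

```javascript
// Hypothetical helper: adds a button to a result item and defers loading until clicked.
function addLoadButton(doc, li, loadDetails) {
    const button = doc.createElement('button');
    button.textContent = 'Load inventory';
    button.onclick = async () => {
        button.disabled = true;
        button.textContent = 'Loading...';
        try {
            const node = await loadDetails(); // e.g. the iframe logic for this one link
            li.appendChild(node);
            button.remove();                  // the button is no longer needed
        } catch (error) {
            button.disabled = false;
            button.textContent = 'Failed, click to retry';
        }
    };
    li.appendChild(button);
    return button;
}

// Possible use in the userscript loop (loadInventory is a hypothetical function
// wrapping the iframe code for a single link):
// addLoadButton(document, li, () => loadInventory(link.href));
```

With this approach nothing is loaded up front, so the results page appears immediately, at the cost of one click per record.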

Parvares commented 1 year ago

Yes, generally I need this info for almost every result (except the records with the Contiene tab), as I need to compare them in an overview. As for the other method, if I understood correctly, I would need to click 15 records per page before the related tabs are displayed, is that right? Maybe if I saw an example I would understand it better. Thank you again, Mike!

ssborbis commented 1 year ago

This seems to give the best results so far.

// ==UserScript==
// @name         bibliotechediroma
// @namespace    http://tampermonkey.net/
// @version      0.2
// @description  try to take over the world!
// @author       You
// @match        https://www.bibliotechediroma.it/opac/query/*
// @icon         https://www.google.com/s2/favicons?sz=64&domain=bibliotechediroma.it
// @grant        none
// ==/UserScript==

// https://www.bibliotechediroma.it/opac/query/stephen%20king?context=tmatm

(function() {
    'use strict';

    let url = window.location.href;

   const main = async() => {

       await new Promise(r => setTimeout(r, 500));

       if ( window != top ) return;

       let links = [...document.querySelectorAll(".titololistarisultati a[href][title]")];

       let startIndex = (() => {
           let el = document.querySelector(".list-document > LI");

           if ( ! el ) return -999999;

           let id = el.id;

           if ( !id ) return -999999;

           let num = id.match(/listadocumenti_(\d+)/)[1];

           return parseInt(num);
       })();

        for ( let i=0;i<links.length;i++ ) {

            try {
                let link = links[i];
                let index = i + startIndex;

                let id = `#listadocumenti_${index}`;

                let li = document.querySelector(id);

                let container = document.createElement('div');
                container.style = 'margin-left:16px';
                container.innerText = " [ loading ] ";

                container.onclick = () => { f.src = f.src }

                li.appendChild(container);

                let f = document.createElement('iframe');
                f.style.display = 'none';
                document.body.appendChild(f);

                f.onload = async () => {

                    if ( f.src === null ) return;

                    try {
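                        await new Promise(r => {
                            // Poll until the tab link exists, then stop the timer.
                            const timer = setInterval(() => {
                                if ( f.contentWindow.document.getElementById("tabDocument_item_tabloca") ) {
                                    clearInterval(timer);
                                    r();
                                }
                            }, 100);
                        });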

                        f.contentWindow.document.getElementById("tabDocument_item_tabloca").click();

                        await new Promise(r => setTimeout(r, 100));

                        let morante = f.contentWindow.document.querySelector('#biblioteche [onclick*="Morante"]');

                        if ( morante ) morante.click();

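                        await new Promise(r => {
                            // Poll until the inventory block exists, then stop the timer.
                            const timer = setInterval(() => {
                                if ( f.contentWindow.document.querySelector('.inventario') ) {
                                    clearInterval(timer);
                                    r();
                                }
                            }, 100);
                        });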

                        let tab = f.contentWindow.document.querySelector('.inventario');

                        let content = tab.outerHTML;
                        li.appendChild(tab);
                        container.parentNode.removeChild(container);

                    } catch (error) {
                        container.innerText = " [ failed ] ";
                        console.log(error)
                    }

                    f.src = null;
                }

                f.onerror = () => {
                    container.innerText = " [ failed ] ";
                    f.src = null;
                }

                f.src = link.href;
            } catch (error) {
                console.log(error);
            }

            await new Promise(r => setTimeout(r, 100));
        }
    }

   main();

   setInterval(() => {
       if ( window.location.href !== url ) {
           //console.log('href change');
           url = window.location.href;
           main();
       }
   }, 1000);

})();
Parvares commented 1 year ago

Thanks Mike, the results are better but seem slower than the other script (on Chrome too)... Thanks again!

Parvares commented 1 year ago

Hi Mike, I installed the Tampermonkey extension on Android (I tried both Kiwi Browser and Firefox Nightly). I can't get the script working. Should I set something in the Tampermonkey settings? Thanks!

ssborbis commented 1 year ago

The last script I posted works for me in Firefox Nightly on Android. I haven't checked Kiwi

(edit) Kiwi worked fine too

Parvares commented 1 year ago

Sorry, Mike, it does indeed work with both browsers (better on Kiwi). Maybe I made a pasting error. Thank you!