pop-os / launcher

Modular IPC-based desktop launcher service
Mozilla Public License 2.0
228 stars 47 forks source link

feat(web): fetch and cache favicon #42

Closed oknozor closed 2 years ago

oknozor commented 2 years ago

There is bit too much unwrapping here but I am not sure everything needs to be checked. Let me know if any thing needs to be changed.

Closes #41

mmstick commented 2 years ago

It doesn't seem to be resolving URL redirects. I believe there's a configuration for that.

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/favicon.ico">here</A>.
</BODY></HTML>
oknozor commented 2 years ago

Just updated the redirect policy, I have also added some error reporting on http/fs error.

mmstick commented 2 years ago

Perhaps we should validate that the data we downloaded is an actual image file.

$ file --mime-type Search.ico
Search.ico: image/vnd.microsoft.icon

image/png and image/gif should also be valid outputs.

oknozor commented 2 years ago

I implemented this based on the file mime path. Do we want to reject the icon based on the content-type header instead ?

mmstick commented 2 years ago

We can. I don't think guessing based on extension is reliable. The file command checks magic numbers.

oknozor commented 2 years ago

Just updated I have not tested all web rules though

mmstick commented 2 years ago

Be sure to update the debian/control to have a build-depend of libssl-dev on the source, and depend of libssl1.1 on the package.

oknozor commented 2 years ago

Done

mmstick commented 2 years ago

It could fetch the icon in the background (as a {filename}.partial) while returning the search result early. Then the launcher would have the option of showing no icon if it wasn't fetched in time.

oknozor commented 2 years ago

This breaks bc/bandcamp search (the prefix no longer works), with this showing up in journalctl when I type that prefix:

Seems the problem is that bandcamp icon is located at https://s4.bcbits.com/img/favicon/favicon-16x16.png, The same goes for flathub and other broken endpoints, though I am not sure yet why the error message is different.

I guess we should not expect other domains to always expose favicon at {domain}/favicon.ico. There is no default url scheme for favicons. Browsers use the page source to determine the favicon location (https://www.w3.org/2005/10/howto-favicon)

That said, there is a google endpoint which could solve this for us : https://www.google.com/s2/favicons?domain={domain}&sz={icon_size}. I was not able to find any official documentation on this but is seems to be widely used. There are lots stackoverflow post and blogs article talking about this endpoint.

Attempting to search the Pop!_Planet forums with pp causes the launcher's results to stop responding to input until it's closed

This url https://pop-planet.info/favicon.ico, responds after one minute with a 504 Gateway Time-out. We should definitelty add a shorter time-out on our side.

It could fetch the icon in the background (as a {filename}.partial) while returning the search result early. Then the launcher would have the option of showing no icon if it wasn't fetched in time.

@mmstick I am not familiar with smol but I gess it work quite like tokio tasks ?

Tell me what you think about the google favicon endpoint, if its ok for you I will rewrite this as a background task querying for the favicon (if it is not cached already) via the google api.

mmstick commented 2 years ago

Yes, smol::spawn is similar to tokio::spawn. smol uses a global async_executor::Executor for spawning tasks onto, which is a single thread by default.

oknozor commented 2 years ago

Yep, I have implemented the solution above, it seems to work fine using the google api.

jacobgkau commented 2 years ago

The loading is working well overall now. However, the GitHub icon for gh and gist is not showing up with the new API:

Screenshot from 2021-10-14 12-58-53

mmstick commented 2 years ago

Could use scraper to locate the <link rel="icon" ... href="{URL_TO_FAVICON}"> on the page. GitHub has both a SVG and PNG (as a alternate-icon link) favicon.

oknozor commented 2 years ago

I tried yo use scraper, it uses Cell internally :

        smol::spawn(async move {
            let response = client
                .get_async(url)
                .await;

            let html = response.unwrap().text().await.unwrap();
            let html = Html::parse_document(&html);
            let link_selector = Selector::parse(r#"link[rel="shortcut icon"]"#).unwrap();
            let link = html.select(&link_selector).next().unwrap();
            let icon_url = link.value().attr("href").unwrap();
// ... 

This produces the following error :

error: future cannot be sent between threads safely
   --> plugins/src/web/mod.rs:145:9
    |
145 |         smol::spawn(async move {
    |         ^^^^^^^^^^^ future created by async block is not `Send`
    |
    = help: within `ego_tree::Node<Node>`, the trait `Sync` is not implemented for `Cell<NonZeroUsize>`
note: future is not `Send` as this value is used across an await
   --> plugins/src/web/mod.rs:183:32
    |
153 |             let link = html.select(&link_selector).next().unwrap();
    |                 ---- has type `ElementRef<'_>` which is not `Send`
...
183 |                     let copy = smol::fs::write(&favicon_path, contents).await;
    |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `link` maybe used later
...
190 |         })
    |         - `link` is later dropped here
note: required by a bound in `smol::spawn`
   --> /home/okno/.cargo/registry/src/github.com-1ecc6299db9ec823/smol-1.2.5/src/spawn.rs:30:67
    |
30  | pub fn spawn<T: Send + 'static>(future: impl Future<Output = T> + Send + 'static) -> Task<T> {
    |                                                                   ^^^^ required by this bound in `smol::spawn`

Maybe I can manually parse the href ? This would be trivial to implement I think.

oknozor commented 2 years ago

There is also select, I'll give a shot.

Edit: It uses !Sync types to

mmstick commented 2 years ago

It should be fine if it is !Sync or !Send as long as its values aren't alive beyond an await point.

oknozor commented 2 years ago

It should be fine if it is !Sync or !Send as long as its values aren't alive beyond an await point.

Oops I have just finished a scraper free implementation. Shall I try again with scraper ?

Anyway, I have kept the google favicon as a fallback because google.com seems to set its favicon via javascript, making it impossible to get the icon from the html source.

mmstick commented 2 years ago

If it works, it's fine. And it has a test. Less dependency.

jacobgkau commented 2 years ago

ahi/alie queries are now showing a blank icon:

Screenshot from 2021-10-15 12-57-56

oknozor commented 2 years ago

This is probably because the icon href start with a double slash on aliexpress :

<link rel="shortcut icon" type="image/x-icon" href="//ae01.alicdn.com/images/eng/wholesale/icon/aliexpress.ico" />

I will fix this tomorrow

oknozor commented 2 years ago

@jacobgkau this should be working for all urls now :crossed_fingers:

jacobgkau commented 2 years ago

While testing this earlier, I thought I saw the AliExpress icon working. However, now it's not working again, still on 7180df4. I'm getting this message in journalctl:

Oct 18 12:16:02 pop-os gnome-shell[17112]: Oct 18 12:16:02.375 ERROR pop_launcher_plugins::web: Got unexpected content-type 'text/html;charset=UTF-8' type for "/home/system76/.cache/pop-launcher/AliExpress.ico" favicon

Also, while troubleshooting this, I found that if I remove ~/.cache/pop-launcher/ and don't restart GNOME Shell, the web plugin stops working until I restart GNOME Shell:

Oct 18 12:17:05 pop-os gnome-shell[17551]: thread 'main' panicked at 'unable to create $HOME/.cache/pop-launcher: Os { code: 2, kind: NotFound, message: "No such file or directory" }', plugins/src/web/mod.rs:65:41
oknozor commented 2 years ago

@jacobgkau I made a mistake, ali express uses a // prefix for the favicon href, its an url without the protocol, not an uri as I assumed first. It shall be fixed now.

I have also added a check to recreate the cache directory at runtime if it gets removed.

mmstick commented 2 years ago

Best to go with a crate that has figured HTML parsing out.

jacobgkau commented 2 years ago

ali express uses a // prefix for the favicon href, its an url without the protocol

No worries, my understanding is that // means "use whatever protocol was used to access the webpage," so a browser will use http:// or https:// depending on whether the user connected to the site via http:// or https://.

Best to go with a crate that has figured HTML parsing out.

I was just minutes away from approving this, all icons are working now, but we can hold off if you'd rather this be done differently.

oknozor commented 2 years ago

@mmstick, actually the trial and error came from url schemes, not html parsing. If you still want to use a dedicated crate to handle this, I will do it.

mmstick commented 2 years ago

If it works, it's fine for now.