will-lumley / FaviconFinder

A small swift library for iOS & macOS to detect favicons used by a website.
MIT License
161 stars 33 forks source link

Recursive loop causes on Linux #102

Open mergesort opened 2 weeks ago

mergesort commented 2 weeks ago

I haven't yet been able to figure out the root of the issue, but I continue to experience crashes on Linux for certain websites. I am somewhat confident that the source of the issue is here, in the fetchFaviconURLs()(https://github.com/will-lumley/FaviconFinder/blob/b31e0c1f2e69f577abd1d90f7da58d6540c2da60/Sources/FaviconFinder/FaviconFinder.swift#L60-L114) function.

What I'm seeing in the logs is something like this.

Using source: [html]
Failed to find Favicon [failedToFindFavicon]. Trying next source type.
Using source: [ico]
[ ERROR ] Failed to save link metadata [error: Error Domain=NSURLErrorDomain Code=-1001 "(null)", request-id: 3DCA6B6E-7C5D-4F39-B4CF-F7A8F8273F0B, url: https://www.caranddriver.com/news/a62595445/2024-tesla-model-3-quieter-more-highway-range-tested]

And then the server crashes shortly after. I think this may be related to a memory issue, possibly because of the way the webpage is constructed, and possibly because of the recursive loop that fetchFaviconURLs() contains.

I can reproducibly make that website (https://www.caranddriver.com/news/a62595445/2024-tesla-model-3-quieter-more-highway-range-tested) crash, as well as https://www.prevention.com/health/a46107400/types-of-magnesium/#. I may be able to scrounge up a few more links that have failed and caused crashes if you need, but the key is to run this in a Linux environment, such as a Docker image or a server.

Please let me know if there's any other way I can help, this is unfortunately remains a pretty serious issue for me and I'm more than happy to do what I can do to help.

mergesort commented 2 weeks ago

I noticed a couple of questionable snippets of code which I think may be related to the bug I'm experiencing, but I may not be familiar enough with the code to understand if these cause the issue or not.


In FaviconURLSession, both linuxDataTask and appleDataTask can recurse infinitely. If a website employs multiple or cyclic meta-refresh redirects, this could lead to unbound recursion.

I think you would need to change the functions to have a shape like this, where there is a count of the current redirects, and a max redirect count, to prevent overflow scenarios.

static func apple/LinuxDataTask(
    with url: URL,
    checkForMetaRefreshRedirect: Bool = false,
    httpHeaders: [String: String?]? = nil,
    redirectCount: Int = 0,
    maxRedirects: Int = 5
) async throws -> Response {

I also noticed this code in ICOFaviconFinder, and I want to double-check my assumptions.

Based on these comments

// We couldn't find any image, so let's try the root domain (just in case it's hiding there)
// ie. If we couldn't find the image at "subdomain.google.com/favicon.ico", let's try "google.com/favicon.ico"

Should the code actually use rootURL like this, rather than faviconURL?

let baseFaviconUrlData = try await FaviconURLSession.dataTask(
    with: rootURL,
    checkForMetaRefreshRedirect: self.configuration.checkForMetaRefreshRedirect
).data

Hope this makes sense, and helps!

will-lumley commented 2 weeks ago

Thanks for flagging this @mergesort - I'll be looking into this immediately and update you when I have a fix.

mergesort commented 1 week ago

I was just wondering @will-lumley, do you happen to have an update on this?

will-lumley commented 1 week ago

@mergesort Planning on releasing the fix later tonight :) (I'm in Sydney if that helps)

mergesort commented 1 week ago

That's amazing timing, thank you so much! (I didn't know that but I hope you like it there. 🙂)

mergesort commented 5 days ago

Hey @will-lumley, any update on the new release?

will-lumley commented 4 days ago

@mergesort - I had unexpected family commitments come out of nowhere this weekend - sorry for not getting back to you sooner. So I booted up my Ubuntu machine and ran FaviconFinder with your URLs, but much to my surprise they worked fine. I tried both of them several times but it worked as expected each time.

Regardless, both of your code suggestions are valid to be input - so I've implemented them (copy/pasted them) into the library under the branch bug/recursive-loop. Can you switch to this branch and tell me if you still experience this issue? If you still are, can you give me the exact specific linux distro/build/etc you're using so I can try and replicate your issue?

Keen to solve this bug :)