lurado / SansFonts

WebKit Content Blocker to block web fonts in Safari (iOS & macOS)

Ideas for making this project sustainable #37

Open · jlnr opened 8 years ago

jlnr commented 8 years ago

This project cannot really work if I have to patch it for every single site that breaks without its custom icon font.

Here are some ideas for improving this process:

• Maintain the list of exceptions in a YAML file that contains the pattern(s), a comment (which JSON does not allow), and maybe a URL for re-evaluating the exception later (see the sketch below).
• A tiny Ruby build tool then checks which rules are outdated.
• And most importantly, the tool should also crawl the top 10,000 domains or so and see whether example.com uses a font called example.ttf (which is then very likely the site's custom icon font) and add an exception for it.
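For illustration, here is a rough sketch of how such a build script could tie these pieces together. The file name (exceptions.yml), the field names, and the output path are assumptions on my part, and the real blockerList.json may well be structured differently:

```ruby
# Hypothetical build script: regenerate blockerList.json from a YAML list
# of exceptions. File names, field names and rule layout are assumptions.
require "yaml"
require "json"

# exceptions.yml might look like:
#   - pattern: "example\\.com"
#     comment: "Breaks without its custom icon font"
#     url: "https://example.com"   # for re-evaluating the exception later
exceptions = YAML.load_file("exceptions.yml")

rules = [
  # Base rule: block all web font resources.
  {
    "trigger" => { "url-filter" => ".*", "resource-type" => ["font"] },
    "action"  => { "type" => "block" }
  }
]

exceptions.each do |e|
  # Whitelist fonts on sites known to break. The comment and url fields
  # stay in the YAML only, since the content blocker JSON has no comments.
  rules << {
    "trigger" => { "url-filter" => e["pattern"], "resource-type" => ["font"] },
    "action"  => { "type" => "ignore-previous-rules" }
  }
end

File.write("blockerList.json", JSON.pretty_generate(rules))
```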

Also, replacing icon fonts with SVG files is a thing now: https://sarasoueidan.com/blog/icon-fonts-to-svg/ 🎉 🎉 🎉

pocketarc commented 7 years ago

I think this would be a terrific idea. The "re-evaluating the exception" part would actually be fantastic to do with Travis in this repo; you could have all URLs re-evaluated on every commit to make sure that nothing we do ever breaks an existing exception.
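As a very rough sketch of such a check (assuming the exceptions.yml layout sketched above), a CI job could fetch each exception's url and flag pages that no longer seem to reference any web font. Grepping the raw HTML is a crude approximation; fonts loaded from CSS or JS would need a real browser, as discussed later in this thread:

```ruby
# Rough CI check, assuming the exceptions.yml layout sketched above.
require "yaml"
require "open-uri"

stale = []
YAML.load_file("exceptions.yml").each do |e|
  next unless e["url"]
  begin
    html = URI.open(e["url"], read_timeout: 15).read
    # Crude check: only greps the raw HTML for font references.
    stale << e["url"] unless html =~ /\.(woff2?|ttf|otf|eot)|@font-face/i
  rescue StandardError => err
    warn "Could not check #{e['url']}: #{err.message}"
  end
end

abort "Possibly stale exceptions:\n#{stale.join("\n")}" unless stale.empty?
```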

Speaking of using YAML, have you considered having the Sans Fonts iOS app (and later the OS X app, once #38 is done) fetch the updated YAML file from the repo every day or so, so that you don't need to worry about releasing new versions just to update compatibility?

jlnr commented 7 years ago

About downloading new definitions: I think this would actually make the update process more fragile. Right now, if I push an update, most people will receive it automatically, and the OS will hopefully reload the JSON from inside the app bundle (I should probably verify this with the next update).

In contrast, to load updates from within the app, people would have to open it. I don't think people spend a lot of time in the Sans Fonts apps, and so they'd be stuck with outdated definitions forever.

To automate this, I should really set up fastlane so I can release new versions with `fastlane mac` or `fastlane ios` from the command line.
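For reference, a minimal Fastfile sketch; the lane layout, scheme names and actions are guesses, not the project's actual setup:

```ruby
# Hypothetical Fastfile, so releases become `fastlane mac release` /
# `fastlane ios release`. Lane layout and scheme names are guesses.
platform :mac do
  lane :release do
    increment_build_number
    gym(scheme: "SansFonts-macOS")  # build and archive the Mac app
    deliver(force: true)            # upload to iTunes Connect
  end
end

platform :ios do
  lane :release do
    increment_build_number
    gym(scheme: "SansFonts-iOS")
    deliver(force: true)
  end
end
```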

pocketarc commented 7 years ago

Well, you're right there! If you can automate pushing a new release that'd take care of the issue, absolutely. I didn't even think about that. Definitely worth doing it that way if it's not more hassle for you.

jlnr commented 7 years ago

I've put a basic mechanism for regenerating blockerList.json in place (just a short page of Ruby).

To completely automate this extension, a little macOS app would probably work best. It would scrape the top 1,000 pages from Hacker News/Alexa/whatever, intercept resource loads, and then add exceptions for icon fonts based on a few heuristics.
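To make the "few heuristics" concrete, here is a toy sketch (the hint patterns and the name/domain comparison are made up) along the lines of the example.com / example.ttf idea from the first comment:

```ruby
# Toy heuristic: guess whether a font loaded by a domain is its own icon font.
# The hint patterns and the name/domain comparison are made up for illustration.
require "uri"

ICON_HINTS = /icon|glyph|symbol|fontawesome/i

def likely_icon_font?(domain, font_url)
  name = File.basename(URI(font_url).path, ".*").downcase
  site = domain.sub(/\Awww\./, "").split(".").first
  name.match?(ICON_HINTS) || name.include?(site)
end

likely_icon_font?("example.com", "https://example.com/fonts/example.ttf") # => true
likely_icon_font?("example.com", "https://example.com/fonts/lato.woff2")  # => false
```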

pocketarc commented 7 years ago

Nice.

If you're just doing scraping, then you don't really need a macOS app, I imagine. Just a command-line tool, no? So we'd run `./scrape` and off it would go, finding any necessary exceptions and either committing them directly or opening them as PRs on GitHub. Heck, I could even build that remotely and have a cron service run it every other day or so, so you'd just get a bunch of PRs and only have to approve them.
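A cron-driven wrapper around that could stay very small. A rough sketch, with made-up script names and branch naming, assuming the scraper and the blocklist generator exist as standalone scripts:

```ruby
# Hypothetical cron wrapper: rerun the (not yet written) scraper, regenerate
# blockerList.json, and push a branch for review if anything changed.
# Script names and branch naming are made up.
branch = "auto-exceptions-#{Time.now.strftime('%Y%m%d')}"

system("ruby scrape.rb") or abort "scraper failed"
system("ruby generate_blocklist.rb") or abort "build failed"

unless `git status --porcelain`.strip.empty?
  system("git checkout -b #{branch}")
  system("git commit -am 'Update auto-generated font exceptions'")
  system("git push origin #{branch}")
  # Opening the actual PR could then be done with the octokit gem or the hub CLI.
end
```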

Having said that, I think the most pressing issue for sustainability might not be blockerList.json but the release process: if a new release were triggered whenever you merge a PR (you talked about automating this in the past), that would be fantastic. Otherwise the project depends on you having enough time to keep pushing releases, which makes it more of a hassle for you than it needs to be.

Also, I did not know you had released the Mac app! That's awesome; goodbye extension! 🎉

jlnr commented 7 years ago

Hah, touché. I've run into certificate trouble while releasing 1.5 just now, so I am indeed the bottleneck. I'll set up fastlane later today, which is the first step towards automating this thing.

As for the scraper: My thinking here was that a "real" WebView would correctly trigger all the delayed loading on modern websites, passing each resource to its delegate (the scraper) as it is requested. That would make it relatively easy to get at the filenames of all referenced web fonts. I am not sure if that works reliably when using a non-visual command-line scraping library. I'd like to avoid parsing CSS files, for example.

Are you thinking of any scraping library in particular?

pocketarc commented 7 years ago

I was just thinking about PhantomJS, which uses the same browser engine (WebKit) and can be controlled with code. It even has documentation on monitoring all network requests, which would let us see which fonts get loaded: http://phantomjs.org/network-monitoring.html
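If we wanted to keep the tooling in Ruby, one option (just an idea, not something I've tried for this project) would be to drive PhantomJS through Capybara's Poltergeist driver, which exposes the page's network traffic; roughly:

```ruby
# Rough, unverified sketch: use PhantomJS via Capybara's Poltergeist driver
# and collect the URLs of all font resources a page requests.
require "capybara"
require "capybara/poltergeist"

Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, js_errors: false)
end
Capybara.run_server = false

session = Capybara::Session.new(:poltergeist)
session.visit("https://example.com")

font_urls = session.driver.network_traffic
                   .map(&:url)
                   .grep(/\.(woff2?|ttf|otf|eot)(\?|$)/i)
puts font_urls
```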

But the WebView idea seems like it'd work as well; I didn't even think about that!

jlnr commented 7 years ago

Ah, I've heard of Phantom but never used it. If you want to give it a try, I'd be happy to review and merge it. I probably won't be able to do much in the next few days or weeks :( But here's my actionable to-do for this issue:

sebastianludwig commented 7 years ago

https://scrapy.org has been mentioned on HN recently when scraping was discussed. Just to throw in another alternative.

pocketarc commented 7 years ago

@jlnr The curse of open source project maintainers: never enough spare time. 😛 I don't know how much time I'll have either, but if I get a few spare hours I'll work on the scraper tool using PhantomJS so it can run on a normal Linux server from a cron job.

@sebastianludwig Scrapy looks good, but it doesn't look like it actually loads a web page like a proper browser (running JS and whatnot), which is one of @jlnr's requirements, so that we can also pick up fonts loaded via the CSS Font Loading API or other JS-based techniques.