gorhill / httpswitchboard

Point & click to forbid/allow any class of requests made by your browser. Use it to block scripts, iframes, ads, facebook, etc.
GNU General Public License v3.0

 hpHosts’ HOSTS not updated #398

Open ghost opened 10 years ago

ghost commented 10 years ago

I just noticed that the above list is reported as up-to-date in the About tab. However, when I click on that list in the Ubiquitous Rules tab, it says:

hpHosts last updated on: 03/02/2014 23:07

while the hpHosts site shows the last update from 16th Aug. 2014.

(I know that you have been rather sceptical about that list because of false positives - and in the meantime it's become much bigger with more than 902,000 hosts! Nevertheless, as long as it is supported by HTTPSB it should be updated, IMHO.)

gorhill commented 10 years ago

I should not have put this huge file in there in the first place, my mistake. Forcing a 14MB download on all people who do not use this one is not right. I can't do this. I can't remove it either because some people might be using it. So for now, the least bad choice is to not remove it, and not update it so that it's not downloaded.

If uBlock's code to update only what is in use makes it to HTTPSB, then I will update it.

ghost commented 10 years ago

I agree, you're absolutely right. (The updated file is even 33MB!)

Just one idea to further reduce download size significantly: Some hosts files are also available as zip archives, namely

http://winhelp2002.mvps.org/hosts.zip
http://hosts-file.net/download/hosts.zip
http://mirror1.malwaredomains.com/files/justdomains.zip

and maybe other ones.

Perhaps it's possible to implement an unzip functionality in µMatrix?

gorhill commented 10 years ago

> Perhaps it's possible to implement an unzip functionality in µMatrix?

Well there is this js library to handle zip, but that would increase the code footprint of extensions.

I am currently working on the code to fetch only what is used, and only if auto-update is enabled, so to me that's the best approach, with the smallest overhead (I expect unzipping in javascript to add noticeable overhead). So in uMatrix that code will be used, and the gigantic hosts file won't be part of the stock lists; users will have to add it manually to their custom lists.
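The selection rule described above (fetch only the lists in use, and only when auto-update is enabled) can be sketched roughly as follows. This is a minimal illustration, not HTTPSB's or uBlock's actual code: the function name `selectListsToFetch`, the `{ url, inUse }` record shape, and the example URLs are all assumptions.

```javascript
// Hypothetical list records of the form { url, inUse }.
// This is NOT the extension's real data model, only an illustration.
function selectListsToFetch(lists, autoUpdateEnabled) {
    // With auto-update disabled, nothing is downloaded at all.
    if (!autoUpdateEnabled) { return []; }
    // Otherwise, fetch only the lists the user has actually enabled.
    return lists.filter(list => list.inUse).map(list => list.url);
}

// Example: a large list that nobody enabled is never downloaded.
const exampleLists = [
    { url: 'https://example.com/small-hosts.txt', inUse: true },
    { url: 'https://example.com/huge-hosts.txt', inUse: false },
];
const toFetch = selectListsToFetch(exampleLists, true);
```

With this rule, a 14MB or 33MB list costs bandwidth only for the users who enabled it.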

ghost commented 10 years ago

Understood. Nevertheless, here are perhaps some alternatives:

https://stackoverflow.com/questions/2095697/unzipping-files

mikhaelkh commented 10 years ago

You can remove heavy subscriptions like this from your extensions and download them only when necessary. Zip files are a worse choice than plain text on GitHub, as you can't see what has changed and the diff size will be much larger. The best way to update is to download only a diff file, but that may be hard to implement. But you like challenges ;)
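To make the diff idea concrete, here is a minimal sketch that treats a hosts file as an unordered set of lines. It is only an illustration under that assumption: the names `diffLines` and `applyDiff` are hypothetical, line order and duplicate lines are not preserved, and none of the list maintainers mentioned in this thread are known to publish diffs in this form.

```javascript
// Compute which lines were added and removed between two versions of a list.
// Assumption: the list is treated as a set of lines (order is ignored).
function diffLines(oldText, newText) {
    const oldLines = new Set(oldText.split('\n'));
    const newLines = new Set(newText.split('\n'));
    return {
        added: [...newLines].filter(line => !oldLines.has(line)),
        removed: [...oldLines].filter(line => !newLines.has(line)),
    };
}

// Apply such a diff to the old version: drop removed lines, append added ones.
function applyDiff(oldText, diff) {
    const removed = new Set(diff.removed);
    const kept = oldText.split('\n').filter(line => !removed.has(line));
    return kept.concat(diff.added).join('\n');
}
```

When only a few hosts change between releases, the `added` and `removed` arrays are a small fraction of the full file, which is where the bandwidth saving would come from.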

ghost commented 10 years ago

> I am currently working on the code to fetch only what is used, and only if auto-update is enabled,

Another idea to significantly save bandwidth is checking the timestamp. This is supported by mvps.org, hosts-file.net, someonewhocares.org, pgl.yoyo.org and malwaredomains.com.

Perhaps it's possible to mimic the behavior of

wget -N

in javascript? Wouldn't this allow for abandoning fixed update intervals?
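A rough sketch of what a `wget -N`-style check could look like in JavaScript: compare the server's `Last-Modified` header with the time of the cached copy, and download only when the remote file is newer. The helper names `remoteIsNewer` and `fetchIfNewer` are hypothetical, and the sketch uses the modern `fetch` API for brevity, whereas an extension of that era would more likely use `XMLHttpRequest`. It also assumes the server actually sends `Last-Modified`.

```javascript
// Decide, in the spirit of `wget -N`, whether the remote copy is newer.
// localTimeMs: modification time of the cached copy in ms since epoch,
//              or null when there is no cached copy.
// lastModifiedHeader: the server's Last-Modified value (may be null).
function remoteIsNewer(localTimeMs, lastModifiedHeader) {
    if (localTimeMs === null) { return true; }
    const remoteTimeMs = Date.parse(lastModifiedHeader);
    // No usable timestamp from the server: download to be safe.
    if (Number.isNaN(remoteTimeMs)) { return true; }
    return remoteTimeMs > localTimeMs;
}

// Issue a cheap HEAD request first, then a full GET only when needed.
// Resolves to the list text, or to null when the cached copy is current.
async function fetchIfNewer(url, localTimeMs) {
    const head = await fetch(url, { method: 'HEAD' });
    if (!remoteIsNewer(localTimeMs, head.headers.get('Last-Modified'))) {
        return null;
    }
    const response = await fetch(url);
    return response.text();
}
```

A check like this only saves bandwidth when the server's timestamp changes solely on real content changes.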

gorhill commented 10 years ago

If I want a common code base for that part of uBlock and HTTPSB, that won't work, as the EasyList lists are updated every two hours (or more, I am not sure), even when there is no change in the filters.

ghost commented 10 years ago

Are you sure? Both EasyList and EasyPrivacy say:

Expires: 4 days (update frequency)

while, e.g., Fanboy's Enhanced Tracking List says:

This list expires after 2 days

and EasyList Germany:

Expires: 1 days (update frequency)

So it seems that the update intervals are not consistent. Isn't that one more reason to think about something like wget -N ?

EDIT: I'm not sure if EasyList supports timestamps ... I have to check that.
EDIT2: I just checked with EasyList - timestamp IS supported.

gorhill commented 10 years ago

It's what it says in the file, but the lists are really updated much more often, even when there is no change. I am guessing there is a script generating them every two hours or less. Probably because when there is a filter-related fix, they can tell the user to just force-update their list. Open EasyList in a web page, repeat one or two hours later in another web page, and see the "Last modified" value at the top of the file changing.

ghost commented 10 years ago

Okay, I will download Easylist with wget -N again in 2 or 3 hours. Let's see what will happen ;-)

ghost commented 10 years ago

You're right - I got a new version ...

cbuijs commented 10 years ago

The adblock lists have the "expires" value to instruct the user (adblocker) of the list to consider the list stale after the specified period and refresh it. It is not an "updated" value/indicator. That would be the "version" specified in the list as well, which is updated every time the list is updated; this can be anytime, and is probably around every hour or so for most EasyList lists.
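As an illustration of how a client might honor the "expires" value, the sketch below extracts the number of days from the two header wordings quoted earlier in this thread and checks whether a cached copy has gone stale. The function names `parseExpiresDays` and `isStale` are hypothetical, and the regular expression covers only those two wordings, not every format a list might use.

```javascript
// Extract the expiry period in days from a list's header text.
// Handles "Expires: 4 days (update frequency)" and
// "This list expires after 2 days"; returns null when neither is found.
function parseExpiresDays(listText) {
    const match = /expires(?::| after)\s*(\d+)\s*days?/i.exec(listText);
    return match ? parseInt(match[1], 10) : null;
}

// A cached copy is stale once the expiry period has fully elapsed.
// All times are in milliseconds since the epoch.
function isStale(fetchedAtMs, expiresDays, nowMs) {
    const dayMs = 24 * 60 * 60 * 1000;
    return nowMs - fetchedAtMs >= expiresDays * dayMs;
}
```

Under this scheme the client refreshes on its own schedule, regardless of how often the server regenerates the file.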