lightswitch05 / hosts

Hostfile blocklist for ads and tracking, updated regularly
https://www.github.developerdan.com/hosts/
Apache License 2.0

How are you automatically updating the repo? #283

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hello, I am curious how you're auto-updating the repo.

I know I can set up auto-updates using a VPS.

lightswitch05 commented 3 years ago

Closed source software I wrote

ghost commented 3 years ago

Closed source software I wrote

Does it need a VPS?

I can write software that will update it from a VPS, but I want to deploy a cloud function that will do that instead.

lightswitch05 commented 3 years ago

It does not use any 'cloud' services

ghost commented 3 years ago

It does not use any 'cloud' services

I can write a script that will auto-update; that's not the issue. The issue is that I don't want my massive computer on 24/7.

Want to work on an open-source cloud function that will do this?

lightswitch05 commented 3 years ago

I either do not understand what you are asking- or you do not understand what a hosts file is.

You are welcome to use the hosts file in this project- it's a text file- it is not an executable file and will not automatically update.

ghost commented 3 years ago

Hello there, I understand what you're talking about, and I've done something similar myself: https://github.com/complexorganizations/unbound-manager/blob/main/main.go It downloads all of the hostfiles from GitHub, validates them, and then provides a single file to users instead of 30 distinct host files. The problem I'm experiencing is keeping it up to date.

So, I was curious as to how you updated every 24 hours.

lightswitch05 commented 3 years ago

Ok great thank you for explaining in more detail what you are looking for.

These tools run through the entire list- and then loop back through again to discover new domains. It runs 24/7, and is way more complicated than just downloading other people's hosts files and then repackaging it as my own work.

ghost commented 3 years ago

Ok great thank you for explaining in more detail what you are looking for.

  • My hosts file does not use other hosts files- I'm not downloading and recombining files
  • I add domains to my hosts file that I find on my own- although I do also add some through requests- but that is not a standard procedure for me
  • I 'auto-expand' subdomains of blocked domains using custom software I wrote- which runs continuously. This is the 'auto updates'- it does not source from other hosts projects

    • I query VirusTotal APIs to find subdomains of blocked domains and add them to the list
    • I query CommonCrawl API to find subdomains
    • I query certificate transparency logs to find subdomains
    • I continually resolve DNS queries for each item on the list to ensure it's resolvable and remove dead entries. I keep resolving dead entries and add them back if they start resolving again

These tools run through the entire list- and then loop back through again to discover new domains. It runs 24/7, and is way more complicated than just downloading other people's hosts files and then repackaging it as my own work.
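
A minimal sketch of the filtering step that 'auto-expanding' subdomains implies, assuming candidate hostnames have already been collected from some discovery source. This is not the closed-source tool described above; the function names and the `.example` domains are hypothetical, and the discovery queries themselves (VirusTotal, CommonCrawl, certificate transparency) are not shown.

```python
from typing import Iterable, Set


def normalize(host: str) -> str:
    """Lower-case a hostname and drop any trailing root dot."""
    return host.lower().rstrip(".")


def is_subdomain_of(host: str, parent: str) -> bool:
    """True only for strict subdomains, so 'notads.example' does not match 'ads.example'."""
    host, parent = normalize(host), normalize(parent)
    return host != parent and host.endswith("." + parent)


def expand_blocklist(blocked: Set[str], discovered: Iterable[str]) -> Set[str]:
    """Return the blocklist plus every discovered name that sits under a blocked domain."""
    parents = {normalize(domain) for domain in blocked}
    expanded = set(parents)
    for host in discovered:
        if any(is_subdomain_of(host, parent) for parent in parents):
            expanded.add(normalize(host))
    return expanded
```

The check compares against `"." + parent` rather than the bare suffix, so an unrelated domain that merely ends with the same letters is not pulled in.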

Most of the time I agree with you, and I sincerely want to do this. The problem is that most people who maintain a list of names will not bother validating them, which leaves expired domains in the list. Also, a person will need to use approximately 5 lists of domains, and I've realized that the lists contain many of the same domains, so I'll have to eliminate those duplicates.
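
The duplicate problem mentioned here can be sketched as a merge over parsed lists. This is an illustrative sketch only, assuming each source is either a hosts-format file (`0.0.0.0 domain`) or a plain one-domain-per-line list; `parse_hosts` and `merge_lists` are hypothetical names, and no validation of the domains is attempted.

```python
from typing import Iterable, List, Set


def parse_hosts(text: str) -> Set[str]:
    """Extract domains from hosts-format or plain domain-per-line text."""
    domains = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        # Hosts format puts the address first; a plain list has only the domain.
        names = parts[1:] if len(parts) >= 2 else parts
        for name in names:
            domains.add(name.lower().rstrip("."))
    return domains


def merge_lists(texts: Iterable[str]) -> List[str]:
    """Combine several lists into one sorted list with duplicates removed."""
    merged = set()
    for text in texts:
        merged |= parse_hosts(text)
    return sorted(merged)
```

Using a set means the cost of removing duplicates stays roughly linear in the total number of lines, even with millions of entries.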

dnmTX commented 3 years ago

@lightswitch05 how are your Java skills (need help with some UserScript(s) 🤔)?

lightswitch05 commented 3 years ago

@Prajwal-Koirala you might be interested in the PiHole project, which makes it easy to have lots of lists and doesn't require any extra management.

@dnmTX I am an experienced JavaScript and Java software engineer (I think you might actually be talking about JavaScript?), but my free time is basically nonexistent.

dnmTX commented 3 years ago

@dnmTX I am an experienced JavaScript and Java software engineer (I think you might actually be talking about JavaScript?), but my free time is basically nonexistent.

😢 not even 10 min? Eh well.......

ghost commented 3 years ago

@Prajwal-Koirala you might be interested in the PiHole project, which makes it easy to have lots of lists and doesn't require any extra management.

@dnmTX I am an experienced JavaScript and Java software engineer (I think you might actually be talking about JavaScript?), but my free time is basically nonexistent.

I have used pihole but it's not for me.

lightswitch05 commented 3 years ago

@dnmTX send me an email if you have a simple 10 minute coding question

ghost commented 3 years ago

@lightswitch05

I have a question: earlier you said you find all the subdomains for a domain and then block them too.

How useful is this in practice?

lightswitch05 commented 3 years ago

I think it's useful- but it creates larger files.

ghost commented 3 years ago

The file I have is roughly 500k domains. I am not concerned with the size; most people implementing anything like this would be on a VPS or a server, so bandwidth will not really be an issue.

ghost commented 3 years ago

Hey, I checked most of your lists and discovered a number of subdomains that the firms no longer use; what is your policy on this?

Small sample

2021/06/08 01:51:04 Invalidity: 1294940508.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294940526.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294940693.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294940911.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294940950.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294940975.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294941714.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294941784.us.mixmarket.biz
2021/06/08 01:51:05 Invalidity: 1294941814.us.mixmarket.biz
2021/06/08 01:51:06 Invalidity: 1294942188.us.mixmarket.biz
2021/06/08 01:51:06 Invalidity: 1294942206.us.mixmarket.biz
2021/06/08 01:51:06 Invalidity: 1294942349.us.mixmarket.biz
2021/06/08 01:51:06 Invalidity: 1294942506.us.mixmarket.biz
2021/06/08 01:51:06 Invalidity: 1294943454.us.mixmarket.biz
2021/06/08 01:51:06 Invalidity: 1294943461.us.mixmarket.biz
2021/06/08 01:51:06 Invalidity: 1294943568.us.mixmarket.biz
2021/06/08 01:51:07 Invalidity: 1294944074.us.mixmarket.biz
2021/06/08 01:51:07 Invalidity: 1294944207.us.mixmarket.biz
2021/06/08 01:51:07 Invalidity: 1294944689.us.mixmarket.biz
2021/06/08 01:51:07 Invalidity: 1294944705.us.mixmarket.biz
2021/06/08 01:51:07 Invalidity: 1294945149.us.mixmarket.biz
2021/06/08 01:51:07 Invalidity: 1294945186.us.mixmarket.biz
2021/06/08 01:51:07 Invalidity: 1294945303.us.mixmarket.biz
2021/06/08 01:51:08 Invalidity: 1294945378.us.mixmarket.biz
2021/06/08 01:51:08 Invalidity: 1294945777.us.mixmarket.biz
2021/06/08 01:51:08 Invalidity: 1294945976.us.mixmarket.biz
2021/06/08 01:51:08 Invalidity: 1294945993.us.mixmarket.biz
2021/06/08 01:51:08 Invalidity: 1294946172.us.mixmarket.biz
2021/06/08 01:51:08 Invalidity: 1294946339.us.mixmarket.biz
2021/06/08 01:51:08 Invalidity: 1294946541.us.mixmarket.biz
2021/06/08 01:51:09 Invalidity: 1294946950.us.mixmarket.biz
2021/06/08 01:51:09 Invalidity: 1294947135.us.mixmarket.biz
2021/06/08 01:51:09 Invalidity: 1294947238.us.mixmarket.biz
2021/06/08 01:51:09 Invalidity: 1294947254.us.mixmarket.biz
2021/06/08 01:51:09 Invalidity: 1294947323.us.mixmarket.biz
2021/06/08 01:51:09 Invalidity: 1294947347.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294947440.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294947498.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294947847.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294947939.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294948031.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294948140.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294948341.us.mixmarket.biz
2021/06/08 01:51:10 Invalidity: 1294948631.us.mixmarket.biz
2021/06/08 01:51:11 Invalidity: 1294948869.us.mixmarket.biz
2021/06/08 01:51:11 Invalidity: 1294949382.us.mixmarket.biz
2021/06/08 01:51:11 Invalidity: 1294949409.us.mixmarket.biz
2021/06/08 01:51:11 Invalidity: 1294949465.us.mixmarket.biz
2021/06/08 01:51:11 Invalidity: 1294950129.us.mixmarket.biz
2021/06/08 01:51:11 Invalidity: 1294950134.us.mixmarket.biz
2021/06/08 01:51:11 Invalidity: 1294951358.us.mixmarket.biz
2021/06/08 01:51:12 Invalidity: 1294952248.us.mixmarket.biz
2021/06/08 01:51:12 Invalidity: 1294952597.us.mixmarket.biz
2021/06/08 01:51:12 Invalidity: 1294953322.us.mixmarket.biz
2021/06/08 01:51:12 Invalidity: 1294953726.us.mixmarket.biz
2021/06/08 01:51:12 Invalidity: 1294953866.us.mixmarket.biz
2021/06/08 01:51:12 Invalidity: 1294954016.us.mixmarket.biz
2021/06/08 01:51:12 Invalidity: 1294954615.us.mixmarket.biz
2021/06/08 01:51:13 Invalidity: 1294954646.us.mixmarket.biz
2021/06/08 01:51:13 Invalidity: 1294955110.us.mixmarket.biz
2021/06/08 01:51:13 Invalidity: 1294955473.us.mixmarket.biz
2021/06/08 01:51:13 Invalidity: 1294956155.us.mixmarket.biz
2021/06/08 01:51:13 Invalidity: 1294956431.us.mixmarket.biz
2021/06/08 01:51:13 Invalidity: 1294957279.us.mixmarket.biz
2021/06/08 01:51:13 Invalidity: 1294957823.us.mixmarket.biz
2021/06/08 01:51:14 Invalidity: 1294957988.us.mixmarket.biz
2021/06/08 01:51:14 Invalidity: 1294958433.us.mixmarket.biz
2021/06/08 01:51:14 Invalidity: 1294961688.us.mixmarket.biz
2021/06/08 01:51:14 Invalidity: 1294961924.us.mixmarket.biz
2021/06/08 01:51:14 Invalidity: 1294962221.us.mixmarket.biz
2021/06/08 01:51:14 Invalidity: 1294963265.us.mixmarket.biz
2021/06/08 01:51:14 Invalidity: 1294964411.us.mixmarket.biz
2021/06/08 01:51:15 Invalidity: 1294965388.us.mixmarket.biz
2021/06/08 01:51:15 Invalidity: 1294967393.us.mixmarket.biz
2021/06/08 01:51:15 Invalidity: 1294967862.us.mixmarket.biz
2021/06/08 01:51:15 Invalidity: 1294969570.us.mixmarket.biz
2021/06/08 01:51:15 Invalidity: 1294969739.us.mixmarket.biz
2021/06/08 01:51:15 Invalidity: 1294972644.us.mixmarket.biz
2021/06/08 01:51:15 Invalidity: 1294973169.us.mixmarket.biz
2021/06/08 01:51:16 Invalidity: 1294975842.us.mixmarket.biz
2021/06/08 01:51:16 Invalidity: 1294980662.us.mixmarket.biz
2021/06/08 01:51:16 Invalidity: 1294981526.us.mixmarket.biz
2021/06/08 01:51:16 Invalidity: 1294983459.us.mixmarket.biz
2021/06/08 01:51:16 Invalidity: 1294983653.us.mixmarket.biz
2021/06/08 01:51:16 Invalidity: 1294984707.us.mixmarket.biz
2021/06/08 01:51:16 Invalidity: 1294984903.us.mixmarket.biz

lightswitch05 commented 3 years ago

They will get removed automatically in another week or so. I'm careful not to remove items too fast from the list in case they come back.
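
A rough sketch of a delayed-removal policy like the one described here. The seven-day grace period is an assumption taken from "another week or so", and `resolves` and `update_entry` are hypothetical names; the real tooling is closed source and may work differently.

```python
import socket
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: entries are kept for a week after they last resolved.
GRACE_PERIOD = timedelta(days=7)


def resolves(domain: str) -> bool:
    """Return True if the system resolver finds any address for the domain."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def update_entry(last_resolved: datetime, resolves_now: bool, now: datetime) -> Tuple[datetime, bool]:
    """Return (new last-resolved time, whether to keep publishing the entry)."""
    if resolves_now:
        return now, True
    # Still dead: keep it published until the grace period runs out.
    return last_resolved, (now - last_resolved) < GRACE_PERIOD
```

Because the last-resolved time is only refreshed on success, a domain that starts resolving again is published again on the next pass, which matches the "add them back" behaviour mentioned earlier in the thread.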

ghost commented 3 years ago

@lightswitch05 Can you please serve the list using GitHub user content instead of your own subdomain?

lightswitch05 commented 3 years ago

www.github.developerdan.com is hosted by github. You can see that the domain is just a CNAME to lightswitch05.github.io - which is Github's hosting domain.

ghost commented 3 years ago

Yes, I understand and was aware of this, but when a user reads www.github.developerdan.com, it does not look as trustworthy as githubusercontent.com. I understand it's the same thing, but when a person sees that, they usually think to themselves, "WTF is this?"

Also, thank you for your lists; I'm compiling all of the popular lists and now have over 5 million domains, but yours is the only one with no errors.

dnmTX commented 3 years ago

but when a user reads www.github.developerdan.com, it is not as trustworthy as githubusercontent.com, I understand it's the same thing, but when a person sees that, they usually think to themselves, "WTF is this?"

You trippin man, go see a doctor or something. There is nothing wrong with the hosted domain. If for example i have a choice to download the same list out of:

  • osint.digitalside.it/Threat-Intel/lists/latestdomains.txt
  • urlhaus.abuse.ch/downloads/hostfile/
  • github.developerdan.com/hosts/lists/ads-and-tracking-extended.txt

which one do you think i'll choose???

ghost commented 3 years ago

It would be so easy to social engineer you.

Consider this scenario: an attacker clones this repo and modifies a few settings, and the users use their repo. If a user goes to bank.com and the attacker redirects them to hackers.com, every user who thinks they're going to bank.com is screwed.

It matters a lot where and how you obtain your lists.

We're talking about DNS here, so if there's a redirection attack, most users won't even notice.

ghost commented 3 years ago

@lightswitch05

Please consider these attack vectors.

dnmTX commented 3 years ago

PLEASE....GO AWAY !!!!

ghost commented 3 years ago

@dnmTX

You are not a contributor or a maintainer, SMD.

dnmTX commented 3 years ago

PLEASE....GO AWAY !!!!

And NOW i INSIST !!!!!!!!!!!!!!!!!!!!!!!

lightswitch05 commented 3 years ago

Alright @dnmTX and @Prajwal-Koirala - I think it's time for you to both take a deep breath and review my code of conduct: https://vimeo.com/338708688 - this might seem silly but I'm serious about it.

@Prajwal-Koirala I have to disagree with things being on GitHub being any more trustworthy than anything else. For example, I would not recommend downloading anything from this project https://github.com/jstrosch/malware-samples/tree/master/maldocs/banload/2020/March

lightswitch05 commented 3 years ago

To elaborate a bit more on some of these points:

It would be so easy to social engineer you.

I don't accept PRs into this project. People can submit domains for me to add - but it runs through my management tools and the final output will always have 0.0.0.0 as the IP address. That isn't something I could change or accidentally mess up.
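
For anyone consuming a hosts file from a source they are unsure about, the same property can be checked on the receiving end. A minimal sketch, assuming standard hosts-file syntax with the address as the first field; `find_redirects` is a hypothetical helper and not part of this project.

```python
from typing import List, Tuple


def find_redirects(hosts_text: str, sink: str = "0.0.0.0") -> List[Tuple[int, str]]:
    """Return (line number, line) for every entry whose address is not the sink."""
    suspicious = []
    for number, raw in enumerate(hosts_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        address = line.split()[0]
        if address != sink:
            suspicious.append((number, raw.strip()))
    return suspicious
```

An empty result means every entry only blocks; any returned line maps a name to a real address and deserves a closer look before the file is installed.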

Consider this scenario: an attacker clones this repo and modifies a few settings, and the users use their repo

This actually sounds like a good reason for me to use my own domain vs. github's domain. My domain is my domain. githubusercontent.com could be anyone.

lightswitch05 commented 3 years ago

Yes, and someone already owns gíthubusercontent.com. What is your point? Look-alike and typo-based domain names aren't something that you or I can solve.

lightswitch05 commented 3 years ago

Well, for one, I can add both of those fake domains to my list, so if someone typos it, then it won't work. I have some other typo-based domains on my list as well. Obviously that won't help anyone if they are not using my list yet. Some security issues are easy to fix - some less so. Typo and look-alike domains have been an issue for a long time and it just doesn't have a simple solution.
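
One narrow slice of the look-alike problem can be flagged mechanically: names that contain non-ASCII characters or punycode labels, such as the accented example earlier in the thread. This is only an illustrative sketch with a hypothetical function name. It does nothing for plain ASCII typos, and it also flags legitimate internationalized domains, so it is a prompt for manual review rather than a verdict.

```python
def has_non_ascii_or_punycode(domain: str) -> bool:
    """True if any label contains non-ASCII characters or is punycode-encoded."""
    labels = domain.lower().rstrip(".").split(".")
    return any(
        label.startswith("xn--") or not label.isascii()
        for label in labels
    )
```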

I appreciate your concern for security, I too am interested in security. I just don't see how not using my own domain really helps in that regard. Also, I like my domain and I do this for free, so I'm going to put it on my domain if I want to 😆.

lightswitch05 commented 3 years ago

Oh also, if you really want to, you can use the 'raw' URL to my list. Be warned that I've restructured this project more than once and broke these raw links for anyone using them. I strongly recommend not using them for that reason, but it's still an option:

ghost commented 3 years ago

Oh also, if you really want to, you can use the 'raw' URL to my list. Be warned that I've restructured this project more than once and broke these raw links for anyone using them. I strongly recommend not using them for that reason, but it's still an option:

Already using this and already been broken once, if it's a 404 it will let me know.

Thank you, for everything.