slashtechno / cloudflare-gateway-adblocking

Serverless ad blocking via Cloudflare Zero Trust gateway
https://pypi.org/project/cloudflare-gateway-adblocking/
MIT License

Omit duplicate hosts #9

Closed manhduonghn closed 1 year ago

manhduonghn commented 1 year ago

Could you add a feature to remove duplicate lines in hosts.txt before uploading lists to Cloudflare Gateway?

slashtechno commented 1 year ago

Sure, it should be a simple addition to remove duplicate hosts before uploading.

manhduonghn commented 1 year ago

Ok, waiting for that

manhduonghn commented 1 year ago

Could you also support plain-text domain lists as a source, and sort the domains when uploading?

manhduonghn commented 1 year ago

While waiting, I'm using this to remove duplicate lines: curl -s "$url" | awk '!seen[$0]++' >> "$outfile"
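
For comparison, the same order-preserving deduplication can be sketched in Python. This is only an illustration of the technique behind the awk one-liner, not the project's actual code; the function name `dedupe_lines` and the sample entries are made up for the example.

```python
def dedupe_lines(lines):
    """Return lines with duplicates removed, keeping the first occurrence of each."""
    # dict keys keep insertion order (Python 3.7+), which mirrors awk '!seen[$0]++'.
    return list(dict.fromkeys(lines))


# Hypothetical sample data in hosts format.
sample = [
    "0.0.0.0 ads.example.com",
    "0.0.0.0 tracker.example.com",
    "0.0.0.0 ads.example.com",
]
print(dedupe_lines(sample))
```

Unlike `sort -u`, this keeps the original order of the entries, so the output stays comparable to the source lists.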

slashtechno commented 1 year ago

Could you also support plain-text domain lists as a source, and sort the domains when uploading?

I can modify the regex to make the 127.0.0.1/0.0.0.0 optional.

As for sorting, I'm unsure why the order matters.
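
A rough sketch of what making the 127.0.0.1/0.0.0.0 prefix optional could look like is shown below. The pattern and the helper `extract_domain` are assumptions written for illustration; they are not the regex the project actually uses, and special entries such as `0.0.0.0 0.0.0.0` are not handled.

```python
import re

# Hypothetical pattern, NOT the project's actual regex:
# an optional "127.0.0.1" or "0.0.0.0" prefix, a dotted domain, an optional trailing comment.
HOST_LINE = re.compile(
    r"^\s*(?:(?:127\.0\.0\.1|0\.0\.0\.0)\s+)?"
    r"(?P<domain>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
    r"\s*(?:#.*)?$"
)


def extract_domain(line):
    """Return the domain from a hosts-style or plain-domain line, or None if there is none."""
    match = HOST_LINE.match(line)
    return match.group("domain") if match else None


print(extract_domain("0.0.0.0 ads.example.com"))
print(extract_domain("ads.example.com"))
```

With the prefix optional, the same parser accepts both hosts files and plain domain lists, which is what the request for a "txt domains source" amounts to.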

manhduonghn commented 1 year ago

It looks nicer.

slashtechno commented 1 year ago

Oh, if you're talking about the uploads not completing in order, it's because it uploads asynchronously.
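
To illustrate why asynchronous uploads can finish out of order, here is a small self-contained sketch using `asyncio`. The `fake_upload` coroutine and the list names are stand-ins invented for the example; nothing here calls the Cloudflare API or reflects how the project schedules its requests.

```python
import asyncio


async def fake_upload(name, delay):
    # Stand-in for a network upload; the delay simulates varying response times.
    await asyncio.sleep(delay)
    return name


async def main():
    # Hypothetical lists with different simulated upload durations.
    jobs = [("list-1", 0.03), ("list-2", 0.01), ("list-3", 0.02)]
    tasks = [asyncio.create_task(fake_upload(name, delay)) for name, delay in jobs]

    # as_completed yields results as they finish, which need not match submission order.
    finished = []
    for future in asyncio.as_completed(tasks):
        finished.append(await future)

    # gather returns results in submission order, regardless of which task finished first.
    ordered = await asyncio.gather(*tasks)
    return finished, ordered


finished, ordered = asyncio.run(main())
print("completion order:", finished)
print("submission order:", ordered)
```

The out-of-order completion is cosmetic: every list is still uploaded, only the log lines appear in a different sequence.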

manhduonghn commented 1 year ago

I made this script to work around that problem:

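#!/bin/bash
# Uncomment to load CF_ACCOUNT_ID and CF_TOKEN from a local .env file.
#source .env

# Blocklist sources in hosts format
urls=(
    https://raw.githubusercontent.com/bigdargon/hostsVN/master/option/hosts-VN
    https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
    https://raw.githubusercontent.com/luxysiv/hosts/main/hosts.txt
)
outfile="hosts.txt"
tempfile="temp.txt"

# Start from an empty temp file so entries from an earlier run cannot leak in.
: > "$tempfile"
for url in "${urls[@]}"
do
    # -f makes curl fail on HTTP errors instead of saving the error page.
    curl -fsSL "$url" >> "$tempfile"
    # Make sure the next download starts on a new line.
    echo >> "$tempfile"
done

# Keep only "0.0.0.0 ..." entries and drop duplicates, preserving first-seen order.
grep "^0\.0\.0\.0" "$tempfile" | awk '!seen[$0]++' > "$outfile"
rm "$tempfile"

pip install cloudflare-gateway-adblocking
# Delete the existing lists, then upload the merged one.
cloudflare-gateway-adblocking --account-id "$CF_ACCOUNT_ID" --token "$CF_TOKEN" delete
cloudflare-gateway-adblocking --account-id "$CF_ACCOUNT_ID" --token "$CF_TOKEN" upload --blocklists "$outfile"
rm "$outfile"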

It works both in GitHub Actions and in Termux.

slashtechno commented 1 year ago

Thanks, but I'm planning on implementing this in Python, not Bash

manhduonghn commented 1 year ago

But do you check for duplicates in the uploaded CSV or in the hosts.txt sources?

slashtechno commented 1 year ago

Fixed this in commit 7e98125597cda0a312398aa4cbd1fe12b534d1d5

manhduonghn commented 1 year ago

I will test it

manhduonghn commented 1 year ago

The timeout is too short.

slashtechno commented 1 year ago

I think 10 seconds is fine, no? What do you propose I change it to?

manhduonghn commented 1 year ago

I downloaded your code and pushed it to my private repository, because I keep my .env there. Thanks so much for your amazing code. I also wrote download.py to download the required sources, since you don't like Bash.

manhduonghn commented 1 year ago

It's fine. I moved it to GitHub Actions because the network speed in my country is sometimes poor. Thanks again.

slashtechno commented 1 year ago

I'll add a flag to change the timeout

manhduonghn commented 1 year ago

Yeah, it's useful when running on Termux.

slashtechno commented 1 year ago

Just implemented this in v0.1.3

manhduonghn commented 1 year ago

How do I use the timeout? Is it --timeout 10min?

slashtechno commented 1 year ago

I should have clarified in the help text, sorry. The timeout is in seconds. --timeout 600 would set it for 10 minutes.

manhduonghn commented 1 year ago

Thanks

manhduonghn commented 1 year ago

[Screenshot from Termux, 2023-08-21]

slashtechno commented 1 year ago

Try putting --timeout before upload
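
The ordering requirement is typical of CLIs where a global option is defined on the top-level parser and not on the subcommand. The sketch below reproduces that behavior with `argparse`; whether the project is built on `argparse` is an assumption, and the parser name `example-cli` is hypothetical. The 10-second default and the `upload`, `delete`, and `--blocklists` names are taken from the thread.

```python
import argparse


def build_parser():
    # Hypothetical parser layout; the real tool's internals may differ.
    parser = argparse.ArgumentParser(prog="example-cli")
    # Global option: defined on the top-level parser, so it must precede the subcommand.
    parser.add_argument("--timeout", type=int, default=10, help="timeout in seconds")
    subcommands = parser.add_subparsers(dest="command", required=True)
    upload = subcommands.add_parser("upload")
    upload.add_argument("--blocklists")
    subcommands.add_parser("delete")
    return parser


# Works: the global option comes before the subcommand.
args = build_parser().parse_args(
    ["--timeout", "600", "upload", "--blocklists", "hosts.txt"]
)
print(args.timeout, args.command, args.blocklists)
```

In a layout like this, `upload --timeout 600` is rejected as an unrecognized argument, while `--timeout 600 upload` sets a 10-minute timeout.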

manhduonghn commented 1 year ago

Thanks and can you delete my pull requests ?

slashtechno commented 1 year ago

I just replied to your pull request. If you don't mind me asking, why do you want me to delete it? Are you fine with closing it instead?

manhduonghn commented 1 year ago

I'm OK with that. And the timeout works perfectly. [Screenshot from Termux, 2023-08-21] Thanks again.

slashtechno commented 1 year ago

Should I close #10 then?

manhduonghn commented 1 year ago

If I create a whitelists folder and put hosts.txt in it, do I still use the upload command?

slashtechno commented 1 year ago

If you want to allow certain domains, you can either specify a directory or file with --whitelists. If you're talking about specifying a directory of blocklists, you can use --blocklists and pass a path to a directory instead of a file if you want.
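
As a rough illustration of an option that accepts either a file or a directory, the helper below expands a path into the list of files to read. The name `collect_list_files` and its behavior (non-recursive, sorted) are assumptions made for the example, not a description of how `--blocklists` or `--whitelists` is implemented.

```python
from pathlib import Path


def collect_list_files(path):
    """Return the list files for a path that is either a single file or a directory."""
    target = Path(path)
    if target.is_dir():
        # Non-recursive: only regular files directly inside the directory, in sorted order.
        return sorted(child for child in target.iterdir() if child.is_file())
    if target.is_file():
        return [target]
    raise FileNotFoundError(f"No such file or directory: {path}")
```

Under that assumption, a `whitelists` folder containing `hosts.txt` and a direct path to `hosts.txt` would both resolve to the same single file.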