jamesmacwhite / squidguard-adblock

Converts Adblock Plus lists into expression files that are compatible with squidGuard and ufdbGuard
MIT License

ufdbGuard is unable to optimize easylist because of too many lines #9

Open f-galland opened 2 years ago

f-galland commented 2 years ago

Hi there!

I've successfully set up ufdbguard with easylists using your script. Nice.

One thing, though: ufdbGuard complains that the list is too long to optimise, and that it will evaluate the expressions one by one:

2022-04-03 17:58:39 [18335] ufdbCategory adblock
2022-04-03 17:58:39 [18335] ufdbCategoryExpressionList adblock/easylist
2022-04-03 17:58:39 [18335] loading regular expressions from "/usr/local/ufdbguard/blacklists/adblock/easylist"
2022-04-03 17:58:45 [18335] WARNING: the expressionlist has 25101 expressions and may use many resources  *****
2022-04-03 17:58:45 [18335]       Note that large numbers of expressions may impact performance considerably
2022-04-03 17:58:50 [18335] ERROR: UFDBoptimizeExprList: unable to optimise 25101 expressions of /usr/local/ufdbguard/blacklists/adblock/easylist  (error 12)  *****
2022-04-03 17:58:50 [18335] Since the 25101 expressions could not be optimised into one expression, they will be evaluated one by one which impacts performance  *****
2022-04-03 17:58:50 [18335] ufdbCategoryExpressionList adblock/easyprivacy
2022-04-03 17:58:50 [18335] loading regular expressions from "/usr/local/ufdbguard/blacklists/adblock/easyprivacy"
2022-04-03 17:59:06 [18335] WARNING: the expressionlist has 45652 expressions and may use many resources  *****
2022-04-03 17:59:06 [18335]       Note that large numbers of expressions may impact performance considerably
2022-04-03 17:59:11 [18335] ERROR: UFDBoptimizeExprList: unable to optimise 45652 expressions of /usr/local/ufdbguard/blacklists/adblock/easyprivacy  (error 12)  *****
2022-04-03 17:59:11 [18335] Since the 45652 expressions could not be optimised into one expression, they will be evaluated one by one which impacts performance  *****

There is also heavy RAM usage (2GB+) on the server.

I was wondering if splitting the output file into separate smaller files could be a simple solution for this?

Thanks a lot for the script!


EDIT:

I tried something like the following:

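cd /usr/local/ufdbguard/blacklists/adblock || exit 1
# Split each list into 400-line chunks with 5-digit numeric suffixes
split -a5 -l400 -d easyprivacy easyprivacy_
split -a5 -l400 -d easylist easylist_
echo -ne "\n\n\n\n\n\n\n"
# Print one expressionlist line per chunk (the glob skips the original unsplit files)
for i in easylist_* easyprivacy_*; do echo "expressionlist adblock/$i"; done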

This splits the easylist and easyprivacy files into smaller 400-line files (the maximum recommended by the docs is 500) and then prints the expressionlist lines needed for the ufdbGuard.conf file.
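The same steps could also be wrapped in a small function so they can be re-run whenever the lists are regenerated. This is only a rough sketch: the function name gen_expressionlists and the CHUNK_LINES variable are made up for illustration, and it assumes a split that supports -d (GNU coreutils does). It deletes chunks left over from an earlier run first, so a list that got shorter does not leave stale files behind.

```shell
#!/bin/sh
# Sketch only: names below are illustrative, not part of the script or ufdbGuard.
CHUNK_LINES=400   # assumption: stay under the 500-line limit mentioned in the docs

gen_expressionlists() {
    dir=$1   # directory holding the converted lists
    shift    # remaining arguments: list file names inside that directory
    for list in "$@"; do
        # Remove chunks from a previous run so stale files are not listed
        rm -f "$dir/${list}_"[0-9][0-9][0-9][0-9][0-9]
        # Split into numbered chunks: <list>_00000, <list>_00001, ...
        split -a 5 -l "$CHUNK_LINES" -d "$dir/$list" "$dir/${list}_"
        for chunk in "$dir/${list}_"[0-9][0-9][0-9][0-9][0-9]; do
            [ -e "$chunk" ] || continue   # glob matched nothing (empty list)
            echo "    expressionlist adblock/$(basename "$chunk")"
        done
    done
}
```

Calling it as gen_expressionlists /usr/local/ufdbguard/blacklists/adblock easylist easyprivacy prints the indented expressionlist lines, ready to paste into the category block.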

Then I modified my ufdbGuard.conf like so:

category adblock {
    expressionlist adblock/easylist_00000
    expressionlist adblock/easylist_00001
    expressionlist adblock/easylist_00002
    expressionlist adblock/easylist_00003
    expressionlist adblock/easylist_00004
    expressionlist adblock/easylist_00005
    expressionlist adblock/easylist_00006
    expressionlist adblock/easylist_00007
    expressionlist adblock/easylist_00008
    expressionlist adblock/easylist_00009
    expressionlist adblock/easylist_00010
    expressionlist adblock/easylist_00011
    expressionlist adblock/easylist_00012
    expressionlist adblock/easylist_00013
    expressionlist adblock/easylist_00014
    expressionlist adblock/easylist_00015
    expressionlist adblock/easylist_00016
    expressionlist adblock/easylist_00017
    expressionlist adblock/easylist_00018
    expressionlist adblock/easylist_00019
    expressionlist adblock/easylist_00020
    expressionlist adblock/easylist_00021
    expressionlist adblock/easylist_00022
    expressionlist adblock/easylist_00023
    expressionlist adblock/easylist_00024
    expressionlist adblock/easylist_00025
    expressionlist adblock/easylist_00026
    expressionlist adblock/easylist_00027
    expressionlist adblock/easylist_00028
    expressionlist adblock/easylist_00029
    expressionlist adblock/easylist_00030
    expressionlist adblock/easylist_00031
    expressionlist adblock/easylist_00032
    expressionlist adblock/easylist_00033
    expressionlist adblock/easylist_00034
    expressionlist adblock/easylist_00035
    expressionlist adblock/easylist_00036
    expressionlist adblock/easylist_00037
    expressionlist adblock/easylist_00038
    expressionlist adblock/easylist_00039
    expressionlist adblock/easylist_00040
    expressionlist adblock/easylist_00041
    expressionlist adblock/easylist_00042
    expressionlist adblock/easylist_00043
    expressionlist adblock/easylist_00044
    expressionlist adblock/easylist_00045
    expressionlist adblock/easylist_00046
    expressionlist adblock/easylist_00047
    expressionlist adblock/easylist_00048
    expressionlist adblock/easylist_00049
    expressionlist adblock/easylist_00050
    expressionlist adblock/easylist_00051
    expressionlist adblock/easylist_00052
    expressionlist adblock/easylist_00053
    expressionlist adblock/easylist_00054
    expressionlist adblock/easylist_00055
    expressionlist adblock/easylist_00056
    expressionlist adblock/easylist_00057
    expressionlist adblock/easylist_00058
    expressionlist adblock/easylist_00059
    expressionlist adblock/easylist_00060
    expressionlist adblock/easylist_00061
    expressionlist adblock/easylist_00062
    expressionlist adblock/easyprivacy_00000
    expressionlist adblock/easyprivacy_00001
    expressionlist adblock/easyprivacy_00002
    expressionlist adblock/easyprivacy_00003
    expressionlist adblock/easyprivacy_00004
    expressionlist adblock/easyprivacy_00005
    expressionlist adblock/easyprivacy_00006
    expressionlist adblock/easyprivacy_00007
    expressionlist adblock/easyprivacy_00008
    expressionlist adblock/easyprivacy_00009
    expressionlist adblock/easyprivacy_00010
    expressionlist adblock/easyprivacy_00011
    expressionlist adblock/easyprivacy_00012
    expressionlist adblock/easyprivacy_00013
    expressionlist adblock/easyprivacy_00014
    expressionlist adblock/easyprivacy_00015
    expressionlist adblock/easyprivacy_00016
    expressionlist adblock/easyprivacy_00017
    expressionlist adblock/easyprivacy_00018
    expressionlist adblock/easyprivacy_00019
    expressionlist adblock/easyprivacy_00020
    expressionlist adblock/easyprivacy_00021
    expressionlist adblock/easyprivacy_00022
    expressionlist adblock/easyprivacy_00023
    expressionlist adblock/easyprivacy_00024
    expressionlist adblock/easyprivacy_00025
    expressionlist adblock/easyprivacy_00026
    expressionlist adblock/easyprivacy_00027
    expressionlist adblock/easyprivacy_00028
    expressionlist adblock/easyprivacy_00029
    expressionlist adblock/easyprivacy_00030
    expressionlist adblock/easyprivacy_00031
    expressionlist adblock/easyprivacy_00032
    expressionlist adblock/easyprivacy_00033
    expressionlist adblock/easyprivacy_00034
    expressionlist adblock/easyprivacy_00035
    expressionlist adblock/easyprivacy_00036
    expressionlist adblock/easyprivacy_00037
    expressionlist adblock/easyprivacy_00038
    expressionlist adblock/easyprivacy_00039
    expressionlist adblock/easyprivacy_00040
    expressionlist adblock/easyprivacy_00041
    expressionlist adblock/easyprivacy_00042
    expressionlist adblock/easyprivacy_00043
    expressionlist adblock/easyprivacy_00044
    expressionlist adblock/easyprivacy_00045
    expressionlist adblock/easyprivacy_00046
    expressionlist adblock/easyprivacy_00047
    expressionlist adblock/easyprivacy_00048
    expressionlist adblock/easyprivacy_00049
    expressionlist adblock/easyprivacy_00050
    expressionlist adblock/easyprivacy_00051
    redirect http://MY_IP:8081/cgi-bin/URLblocked.cgi?admin=%A&mode=default&color=red&size=normal&clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&targetgroup=%t&url=%u
}

And now I'm not getting any warning whatsoever and the memory usage dropped down to 500MB, which is still a lot, but much better than 2GB.

Curious how such a dirty solution worked.

Anyway, I hope it helps someone

Cheers

f-galland commented 2 years ago

Another suggestion:

If you are using SSL bump to transparently proxy HTTPS with Squid, your category adblock block should include the following:

option block-bumped-connect on
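If you want to check quickly whether that line is already in your config, something like this might do. It is a rough sketch: the function name has_block_bumped_connect is made up, and it only looks for an uncommented option line anywhere in the file; it does not verify that the line sits inside the category adblock block.

```shell
#!/bin/sh
# Sketch only: succeeds if the file contains an active
# "option block-bumped-connect on" line. Lines starting with # do not match.
has_block_bumped_connect() {
    grep -Eq '^[[:space:]]*option[[:space:]]+block-bumped-connect[[:space:]]+on([[:space:]]|$)' "$1"
}
```

Pass it the path to your ufdbGuard.conf; the exit status is 0 when the option is found and non-zero otherwise.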