kawaiipantsu / spamassassin-rules

Custom SpamAssassin rules that I and others have made and contributed, to mitigate spam and phishing mails, now also with cool PhishTank rules
MIT License

php 7.4 #4

Closed alsyundawy closed 1 year ago

alsyundawy commented 1 year ago

spamassassin-rules-master]# ./update-phishtank-rules
[06:33:54 01-07-2023] ==[ UPDATE-PHISHTANK-RULES ]===============================
[06:33:54 01-07-2023] > Checking for phishtank DB locally
[06:33:54 01-07-2023] > Did not find any local DB, downloading now
PHP Warning: file_get_contents(http://data.phishtank.com/data/online-valid.json): failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests in /root/spamassassin-rules-master/update-phishtank-rules on line 184
[06:33:55 01-07-2023]   - Loading
[06:33:55 01-07-2023]   - Done
PHP Warning: count(): Parameter must be an array or an object that implements Countable in /root/spamassassin-rules-master/update-phishtank-rules on line 74
[06:33:55 01-07-2023] > Found 0 PhishTank entries in DB
[06:33:55 01-07-2023] > Cleaning up phishtank validated rule directory
[06:33:55 01-07-2023] > Cleaning up phishtank online rule directory
[06:33:55 01-07-2023] > Beginning to build Phishtank rules
PHP Warning: Invalid argument supplied for foreach() in /root/spamassassin-rules-master/update-phishtank-rules on line 86
[root@mirror spamassassin-rules-master]# php -v
PHP 7.4.33 (cli) (built: May 11 2023 10:50:02) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.33, Copyright (c), by Zend Technologies

kawaiipantsu commented 1 year ago

Hey @alsyundawy

Yeah, this is actually PhishTank limiting incoming requests even more than before. I found a workaround; let me update the script and make sure it works again as intended.

This is not a "bug" per se. Sadly, this is PhishTank limiting access to their API: HTTP/1.1 429 Too Many Requests

alsyundawy commented 1 year ago

I can download it manually using

wget -c https://cdn.phishtank.com/datadumps/verified_online.php_serialized

But what about these warnings?

PHP Warning: count(): Parameter must be an array or an object that implements Countable in /root/spamassassin-rules-master/update-phishtank-rules on line 74

PHP Warning: Invalid argument supplied for foreach() in /root/spamassassin-rules-master/update-phishtank-rules on line 86

kawaiipantsu commented 1 year ago

Those warnings only appear because the first step is not working, i.e. the script can't download the file, so the rest of the script fails. There is no error checking for whether the file was downloaded or not.
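As a rough illustration of the missing check, here is a minimal shell sketch of a sanity test that could run before the update script. The function name `db_looks_valid` is hypothetical and not part of this repository, and it assumes the PhishTank dump is a JSON array (so its first non-whitespace character is `[`). It is a cheap plausibility check, not a full JSON validation.

```shell
#!/bin/bash
# Hypothetical helper, not part of update-phishtank-rules.
# Returns 0 when the file exists, is non-empty and starts with '[',
# i.e. it at least looks like a JSON array (an assumption about the dump).
db_looks_valid() {
    local file="$1"
    [ -s "$file" ] || return 1      # missing or empty, e.g. after a 429
    local first
    first=$(tr -d ' \t\r\n' < "$file" | head -c 1)
    [ "$first" = "[" ]
}

# Usage sketch:
#   db_looks_valid phishtank-latest-db.json && ./update-phishtank-rules
```

With a check like this in front, a failed download stops the run early instead of surfacing later as the count() and foreach() warnings.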

The reason the script gets blocked is that I have been nice and sent them a custom User-Agent. I still feel that is the proper way to do it. Hiding behind fake User-Agent strings and the like is not what I want, but it might be needed.

User-Agent: PhishTank SpamAssassin Rule Updater (github.com/kawaiipantsu/spamassassin-rules)

gotspatel commented 1 year ago

my 2 cents

function downloadPhishTankDB() {
    // Archive check - archive the previous DB if needed
    if ( PHISHTANK_DB_ARCHIVE && is_file(ptDB_latest) ) {
        if ( is_writable(__DIR__."/") ) rename(ptDB_latest, ptDB_archive);
        // If compression is wanted, do it (only if the archive file exists)
        if ( PHISHTANK_ARCHIVE_GZ && is_file(ptDB_archive) ) {
            $data = file_get_contents(ptDB_archive);
            $gzdata = gzencode($data, 9);
            file_put_contents(ptDB_archive.".gz", $gzdata);
            unlink(ptDB_archive);
        }
    }
    // Old file check - unlink if needed
    if ( !PHISHTANK_DB_ARCHIVE && is_file(ptDB_latest) ) {
        if ( is_writable(__DIR__."/") ) unlink(ptDB_latest);
    }
    $url = trim(PHISHTANK_DB_URL);
    $options = array(
        'http' => array(
            'method' => "GET",
            'header' => "Accept-language: en\r\n" .
                        "User-Agent: PhishTank SpamAssassin Rule Updater (github.com/kawaiipantsu/spamassassin-rules)\r\n" // Friendly UA
        )
    );
    $context = stream_context_create($options);
    $isDownloaded = false; // Tracks whether either download method succeeded

    // First attempt: plain PHP stream download
    if ( is_writable(__DIR__."/") ) {
        $content = file_get_contents($url, false, $context);

        if ( $content !== false ) {
            file_put_contents(ptDB_latest, $content);
            $isDownloaded = true;
            cLog(">  ");
            cLog("> JSON Downloaded by Standard Method");
            cLog(">  ");
        }
    }

    // If the download was unsuccessful, fall back to wget -c (resumes partial downloads)
    if ( !$isDownloaded ) {
        // Escape both arguments so the URL and path cannot break the shell command
        $wgetCommand = "wget -c " . escapeshellarg($url) . " -P " . escapeshellarg(__DIR__."/");
        exec($wgetCommand);

        // Verify that the file was downloaded successfully
        $downloadedFile = __DIR__."/".basename($url);
        if ( file_exists($downloadedFile) ) {
            // Rename the downloaded file to the name the script expects
            rename($downloadedFile, ptDB_latest);
            $isDownloaded = true;
            cLog(">  ");
            cLog("> JSON Downloaded by Wget");
            cLog(">  ");
        }
    }

    return $isDownloaded;
}

kawaiipantsu commented 1 year ago

As said earlier, this is not an issue with the script. PhishTank limits/throttles downloads for users who are not logged in. The script will fail if it's not able to download anything from the website.

Possible fix to bypass throttling - But not guaranteed. https://github.com/kawaiipantsu/spamassassin-rules/commit/944302091a6ddbd117bb0184f69f833d3b18a51c

I have pushed a new version that will try to get around PhishTank's limits, but it's better to play nice than to try to bypass their limitations. Please don't download new PhishTank rules too often, or your IP will be blocked/limited and then throttled, as PhishTank writes on their web site.

alsyundawy commented 1 year ago

Regarding the downloadPhishTankDB() function that gotspatel posted above: does this replace the existing code in the script?

kawaiipantsu commented 1 year ago

Please note that if you use the code from gotspatel, you might end up getting your IP blocked even faster, since the original error (problem) is the following: HTTP/1.1 429 Too Many Requests

This is PhishTank telling you: "Stop trying to download the file, you have tried too many times from your IP address." What gotspatel's code does is immediately try to re-download the file with wget from the command line, meaning PhishTank will see you try again even though they told you to stop from that IP.

I have updated my code to reflect their API requirements, which should help you a bit. But don't download that database more than once a day. It makes no sense to do so either.
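To make the point about 429 concrete, here is a small shell sketch of how a polite client might decide what to do with each HTTP status. The function name `next_action_for_status` and the action labels are hypothetical and do not come from the repository. The only behavior it encodes is the one described above: a 429 means stop, not "try again with another tool".

```shell
#!/bin/bash
# Hypothetical helper: map an HTTP status code to the next step.
next_action_for_status() {
    case "$1" in
        200) echo "use-file" ;;
        429) echo "stop-and-wait" ;;   # rate limited: do NOT retry right away
        *)   echo "fail" ;;
    esac
}

# Usage sketch (curl prints the final status code via -w '%{http_code}'):
#   status=$(curl -sL -o db.tmp -w '%{http_code}' "$url")
#   if [ "$(next_action_for_status "$status")" = "use-file" ]; then
#       mv db.tmp phishtank-latest-db.json
#   fi
```

Treating 429 as its own case keeps a fallback path from hammering the server a second time from the same IP.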

alsyundawy commented 1 year ago

Stop trying to download the file?

wget -c http://data.phishtank.com/data/online-valid.json
--2023-07-04 17:59:24-- http://data.phishtank.com/data/online-valid.json
Resolving data.phishtank.com (data.phishtank.com)... 104.16.101.75, 104.17.177.85, 2606:4700::6811:b155, ...
Connecting to data.phishtank.com (data.phishtank.com)|104.16.101.75|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://data.phishtank.com/data/online-valid.json [following]
--2023-07-04 17:59:24-- https://data.phishtank.com/data/online-valid.json
Connecting to data.phishtank.com (data.phishtank.com)|104.16.101.75|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn.phishtank.com/datadumps/verified_online.json?Expires=1688468374&Signature=DWNHauPt2x4bm0nHXmkGZAHSs6hpyFo0pN6dNb7mf2rRCVq3TJNMM~D-VcTRncHUYyqvLdW6SuKPVdtniXOViyFHM-YbATYFvNGVkt1aDYk14HnIV5KX58jBtVQPg~TMk8KSiOyIlkt-sZ1G2b9CNulfnWX44fZjifsemGURAZeQrkOFmYRUOOGRhrbbyp4c9~DEiDSLVNvyFEyEt0E-mQ47-bSt9Lgx8ANHQ1NQT74XcmqUGZK~htPJAEgPv5d6Lxtblmyt~BkP75mLPajMciMhK0PhwZIL-VdmMo4n1xT0w08gngT4EbtMMseeM0jZtbIzXXjXe5ufOTUTwZlvtA__&Key-Pair-Id=APKAILB45UG3RB4CSOJA [following]
--2023-07-04 17:59:24-- https://cdn.phishtank.com/datadumps/verified_online.json?Expires=1688468374&Signature=DWNHauPt2x4bm0nHXmkGZAHSs6hpyFo0pN6dNb7mf2rRCVq3TJNMM~D-VcTRncHUYyqvLdW6SuKPVdtniXOViyFHM-YbATYFvNGVkt1aDYk14HnIV5KX58jBtVQPg~TMk8KSiOyIlkt-sZ1G2b9CNulfnWX44fZjifsemGURAZeQrkOFmYRUOOGRhrbbyp4c9~DEiDSLVNvyFEyEt0E-mQ47-bSt9Lgx8ANHQ1NQT74XcmqUGZK~htPJAEgPv5d6Lxtblmyt~BkP75mLPajMciMhK0PhwZIL-VdmMo4n1xT0w08gngT4EbtMMseeM0jZtbIzXXjXe5ufOTUTwZlvtA__&Key-Pair-Id=APKAILB45UG3RB4CSOJA
Resolving cdn.phishtank.com (cdn.phishtank.com)... 104.17.177.85, 104.16.101.75, 2606:4700::6810:654b, ...
Connecting to cdn.phishtank.com (cdn.phishtank.com)|104.17.177.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 52713088 (50M) [application/json]
Saving to: 'online-valid.json'

online-valid.json 100%[==============================================================================================>] 50.27M 42.4MB/s in 1.2s

2023-07-04 17:59:26 (42.4 MB/s) - 'online-valid.json' saved [52713088/52713088]

The download with wget succeeded on the same server, from the same public IPv4 address. OK, I will try again. Here you go:

wget -c http://data.phishtank.com/data/online-valid.json
URL transformed to HTTPS due to an HSTS policy
--2023-07-04 18:01:51-- https://data.phishtank.com/data/online-valid.json
Resolving data.phishtank.com (data.phishtank.com)... 104.17.177.85, 104.16.101.75, 2606:4700::6810:654b, ...
Connecting to data.phishtank.com (data.phishtank.com)|104.17.177.85|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn.phishtank.com/datadumps/verified_online.json?Expires=1688468521&Signature=ZXr-Ouv0fqpe~ECG5pX8Qk7UxSX0la5nd7a1tcC5qVGYmeApbnYUNRAfvJz4hfkX5kkh-vgwMtltiiE4kyCxr21hN2-5jql48SQ1aHAiaJp0sOBMuW-2XadE8iHj6vihB3NeKBtvbiSoDWSuPj17mrPjmy4wJo1w-VglU2EsK42TcLsgXBxP4PdoLLbsu9l1RsQ9s9F1k83espST~o-m0HIjsiZ-c-oG5SZFen3n22kboWqxM-TjO6oJLaTNwdNv3lyGfDag0BwZWZ-79jq-3X9jCLKwpUQGhimkPoaITaaiOv6rPwpgm1EksmVsbgQzWJDvJHsHcx4odIz-G8YK2A__&Key-Pair-Id=APKAILB45UG3RB4CSOJA [following]
--2023-07-04 18:01:51-- https://cdn.phishtank.com/datadumps/verified_online.json?Expires=1688468521&Signature=ZXr-Ouv0fqpe~ECG5pX8Qk7UxSX0la5nd7a1tcC5qVGYmeApbnYUNRAfvJz4hfkX5kkh-vgwMtltiiE4kyCxr21hN2-5jql48SQ1aHAiaJp0sOBMuW-2XadE8iHj6vihB3NeKBtvbiSoDWSuPj17mrPjmy4wJo1w-VglU2EsK42TcLsgXBxP4PdoLLbsu9l1RsQ9s9F1k83espST~o-m0HIjsiZ-c-oG5SZFen3n22kboWqxM-TjO6oJLaTNwdNv3lyGfDag0BwZWZ-79jq-3X9jCLKwpUQGhimkPoaITaaiOv6rPwpgm1EksmVsbgQzWJDvJHsHcx4odIz-G8YK2A__&Key-Pair-Id=APKAILB45UG3RB4CSOJA
Resolving cdn.phishtank.com (cdn.phishtank.com)... 104.16.101.75, 104.17.177.85, 2606:4700::6811:b155, ...
Connecting to cdn.phishtank.com (cdn.phishtank.com)|104.16.101.75|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 52713088 (50M) [application/json]
Saving to: 'online-valid.json'

online-valid.json 100%[====================================================================================================>] 50.27M 30.1MB/s in 1.7s

2023-07-04 18:01:54 (30.1 MB/s) - 'online-valid.json' saved [52713088/52713088]

kawaiipantsu commented 1 year ago

Please refer to their web-site. https://phishtank.org/developer_info.php

We require that you use a descriptive User Agent string in your application to identify the application. If your User Agent is blank or generic, you may recieve an increased number of rate limited requests or be redirected to additional security checks.

If you do intend to fetch these files automatically, please register for an application key and see below for instructions on how to use it to request files. Without this key, you will be limited to a few downloads per day.
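As a sketch of what those two requirements could look like in a wrapper script: the helper below builds the download URL with or without an application key and pairs it with a descriptive User-Agent. The function name `phishtank_db_url` and the variable `PHISHTANK_APP_KEY` are hypothetical, and the keyed URL layout (`/data/<key>/online-valid.json`) is an assumption that should be checked against the developer page linked above.

```shell
#!/bin/bash
# Descriptive User-Agent, taken from earlier in this thread.
PHISHTANK_UA="PhishTank SpamAssassin Rule Updater (github.com/kawaiipantsu/spamassassin-rules)"

# Hypothetical helper: print the dump URL, with the application key if given.
# The keyed path layout is an assumption; verify it on developer_info.php.
phishtank_db_url() {
    local key="$1"
    if [ -n "$key" ]; then
        echo "http://data.phishtank.com/data/${key}/online-valid.json"
    else
        echo "http://data.phishtank.com/data/online-valid.json"
    fi
}

# Usage sketch:
#   wget --user-agent="$PHISHTANK_UA" -O phishtank-latest-db.json \
#        "$(phishtank_db_url "$PHISHTANK_APP_KEY")"
```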

alsyundawy commented 1 year ago

OK, I will mirror their files to my own server, e.g. domain.tld/online-valid.json. Can we then focus on my earlier question:

1. PHP Warning: count(): Parameter must be an array or an object that implements Countable
2. PHP Warning: Invalid argument supplied for foreach()

php -v
PHP 7.4.3-4ubuntu2.19 (cli) (built: Jun 27 2023 15:49:59) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.3-4ubuntu2.19, Copyright (c), by Zend Technologies

kawaiipantsu commented 1 year ago

The script produces no errors if it can download the file and the file is valid JSON. I might implement error handling for when the limit is reached in the future, but for now I don't see a reason to.

kawaiipantsu commented 1 year ago

But you can simply download the file yourself, put it in the same path as the update script and name it phishtank-latest-db.json. The script will then use that file instead of downloading it.

So a simple bash file like this would do the trick for you.

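#!/bin/bash
# Fetch the DB with wget first; stop here if that fails, so nothing retries.
wget -c -O phishtank-latest-db.json http://data.phishtank.com/data/online-valid.json || exit 1
# The update script then picks up the local copy instead of downloading.
./update-phishtank-rules

kawaiipantsu commented 1 year ago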

I have pushed a simple example bash script to pull db with wget before running the update script. https://github.com/kawaiipantsu/spamassassin-rules/blob/master/pre-download-phishtank-db

alsyundawy commented 1 year ago

Does update-phishtank-rules still download the file itself after wget has finished downloading?

kawaiipantsu commented 1 year ago

No.

update-phishtank-rules will use the local copy if it exists, is named phishtank-latest-db.json, is not older than 5 days and is readable by the user running the script.

https://github.com/kawaiipantsu/spamassassin-rules/blob/1a506a9eed246cdcd9b42ae362cbb9f891dbccc4/update-phishtank-rules#L50

And it will say so in the output when running the script.

[13:44:03 04-07-2023] ==[ UPDATE-PHISHTANK-RULES ]===============================
[13:44:03 04-07-2023] > Checking for phishtank DB locally
[13:44:03 04-07-2023] > Found DB locallay   <--------- THIS LINE
[13:44:03 04-07-2023]   - Loading
[13:44:04 04-07-2023]   - Done
[13:44:04 04-07-2023] > Found 80483 PhishTank entries in DB
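For anyone who wants the same decision in a wrapper script, the conditions listed above (file present, readable, not older than 5 days) can be approximated in shell. This is only a sketch: `local_db_usable` is a hypothetical name, and the real check lives in the PHP script at the line linked above and may differ in detail.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the local DB copy is readable and was
# modified less than max_days ago (default 5, as described in this thread).
local_db_usable() {
    local file="$1" max_days="${2:-5}"
    [ -r "$file" ] || return 1
    # find prints the path only when the file is newer than the age limit
    [ -n "$(find "$file" -mmin "-$((max_days * 1440))" 2>/dev/null)" ]
}

# Usage sketch:
#   local_db_usable phishtank-latest-db.json || echo "would need a fresh download"
```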
alsyundawy commented 1 year ago

I found the problem with downloading the files. If IPv6 is enabled in your environment, you get HTTP/1.1 429 Too Many Requests, but over IPv4 only the files download without error.
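If that observation holds in your environment, one way to test it without disabling IPv6 system-wide is to force IPv4 for this one download. The sketch below uses wget's `-4` (`--inet4-only`) option; the function name `fetch_db_ipv4` is hypothetical, and whether IPv6 is really the trigger for the 429 is this thread's observation, not something confirmed by PhishTank.

```shell
#!/bin/bash
# Hypothetical wrapper: download a URL to a file, connecting over IPv4 only.
fetch_db_ipv4() {
    local url="$1" out="$2"
    wget -4 -O "$out" "$url"
}

# Usage sketch:
#   fetch_db_ipv4 http://data.phishtank.com/data/online-valid.json phishtank-latest-db.json \
#       && ./update-phishtank-rules
```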