scr34m / php-malware-scanner

Scans PHP files for malwares and known threats
GNU General Public License v3.0

Run in batches? #62

Closed simplenotezy closed 3 years ago

simplenotezy commented 4 years ago

I am looking for a way to run this script in batches, to prevent timeouts when it is run from the browser.

I am not sure if this is natively supported?

A workaround I have thought about is this:

1) Generate a list of all folders in the root directory.
2) Split the list into chunks of folders.
3) Add to "ignorePath" the folders that are not in the first chunk.
4) Continue running through the chunks until finished.
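The four steps above could be sketched roughly as follows. This is written in Python for brevity (the scanner itself is PHP), the helper names `list_folders`, `chunks` and `ignore_lists` are made up for illustration, and how each ignore list is handed to the scanner is left open.

```python
import os

def list_folders(root):
    """Step 1: top-level folders directly under root, sorted for stable batches."""
    return sorted(entry.name for entry in os.scandir(root) if entry.is_dir())

def chunks(items, size):
    """Step 2: split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def ignore_lists(folders, size):
    """Steps 3-4: for each chunk, yield (chunk, folders to ignore in that run)."""
    for chunk in chunks(folders, size):
        keep = set(chunk)
        yield chunk, [f for f in folders if f not in keep]
```

Note that files sitting directly in the root directory belong to no folder, so they would need a run of their own.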

Any other ideas?

scr34m commented 4 years ago

You have to create the list yourself and call the script independently on each folder in a chunk. There is no reason to implement something like this in the scanner.
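A minimal sketch of that suggestion, in Python for brevity: one scanner invocation per folder, with no ignore list at all. It assumes the scanner takes the target directory through a `-d` option and lives at `scan.php`; both should be checked against the scanner's own help output. The function names are hypothetical.

```python
import subprocess

def build_commands(folders, scanner="scan.php"):
    """One invocation per folder; `-d` is assumed to select the directory to scan."""
    return [["php", scanner, "-d", folder] for folder in folders]

def run_chunk(folders, scanner="scan.php"):
    """Run the scanner on each folder of one chunk and collect the exit codes."""
    return [subprocess.run(cmd).returncode for cmd in build_commands(folders, scanner)]
```

Each web request would then call `run_chunk` on a single chunk of folders, so no request has to cover the whole tree.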

simplenotezy commented 4 years ago

Hmm. I tried implementing something like this, and it works in my tests with around 1,000 files. However, I just tested it on a server with about 13k files, and there it times out. I suspect it times out because my "--ignore" list is too big. What I have done now is generate a list of all PHP files in the scanned directory and split that list into "batches" of, say, 500 files each.

So the first batch would scan the first 500 files in the scanned directory and ignore every file after them. The next batch would ignore the first 500 files, scan the next 500, and ignore the rest of the list, and so on, until it reaches the end.

Maybe this is not ideal. It might be wiser to somehow get a list of all folders instead, but that could be difficult too (to avoid scanning the same folders over and over). Still, I believe PHP should be able to handle an array of ~50k files if that were needed.
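A rough back-of-the-envelope count suggests why this could get slow. It assumes (not verified against scan.php) that every run still walks all files and compares each path against the ignore list one entry at a time, so the figures are upper bounds, not measurements. The function name is made up for illustration.

```python
import math

def worst_case_comparisons(total_files, batch_size):
    """Upper bound on path-vs-entry comparisons, assuming a linear ignore check."""
    runs = math.ceil(total_files / batch_size)
    ignored_per_run = total_files - batch_size   # ignore-list size for a full batch
    per_run = total_files * ignored_per_run      # every path checked against every entry
    return runs, ignored_per_run, per_run * runs

print(worst_case_comparisons(13_000, 500))
# (26, 12500, 4225000000)
```

Under the same assumption, 1,000 files come to 2 runs and at most 1,000,000 comparisons in total, which would be consistent with the small test passing and the 13k-file server timing out.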

Without having done any performance tests, I could imagine that perhaps this line could be expensive? https://github.com/scr34m/php-malware-scanner/blob/master/scan.php#L430

simplenotezy commented 4 years ago

Perhaps a check could be done before glob matching to see whether the pattern contains a glob character such as * or ?, and if not, just do a simple string comparison?
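A sketch of that idea in Python (the scanner is PHP, so this only illustrates the logic): ignore entries without glob characters go into a set for exact lookups, and only real patterns go through glob matching. Treating `[` as a glob character alongside `*` and `?` is an assumption, and `split_patterns` and `is_ignored` are hypothetical names.

```python
import fnmatch

GLOB_CHARS = set("*?[")  # assumption: characters that make an entry a glob pattern

def split_patterns(patterns):
    """Separate literal paths (exact match) from glob patterns, once up front."""
    literals = {p for p in patterns if not GLOB_CHARS & set(p)}
    globs = [p for p in patterns if GLOB_CHARS & set(p)]
    return literals, globs

def is_ignored(path, literals, globs):
    """Cheap set lookup first; glob matching only for the entries that need it."""
    if path in literals:
        return True
    return any(fnmatch.fnmatchcase(path, g) for g in globs)
```

With an ignore list made up almost entirely of plain file paths, as in the batching approach, nearly every check would then be a single set lookup.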

scr34m commented 4 years ago

Well yes, doing a simple search first would be useful, but these glob matches don't hurt that much. Can you run a benchmark for me?
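One possible shape for such a benchmark, sketched in Python on synthetic paths. It only shows how the two strategies could be compared; the timings say nothing about PHP, so a real measurement would have to be repeated against scan.php itself. All names and sizes here are illustrative assumptions.

```python
import fnmatch
import timeit

def glob_only(path, patterns):
    """Glob-match the path against every ignore entry."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)

def literal_first(path, literals, globs):
    """Exact set lookup first, glob matching only for real patterns."""
    return path in literals or any(fnmatch.fnmatchcase(path, g) for g in globs)

def benchmark(n_files=500, n_ignored=200, repeat=1):
    """Time both strategies on synthetic paths; returns seconds per strategy."""
    paths = [f"dir{i % 50}/file{i}.php" for i in range(n_files)]
    patterns = paths[:n_ignored] + ["cache/*"]
    literals = {p for p in patterns if not set("*?[") & set(p)}
    globs = [p for p in patterns if set("*?[") & set(p)]
    return {
        "glob_only": timeit.timeit(
            lambda: [glob_only(p, patterns) for p in paths], number=repeat),
        "literal_first": timeit.timeit(
            lambda: [literal_first(p, literals, globs) for p in paths], number=repeat),
    }
```

Calling `benchmark()` returns a dictionary with one timing per strategy; raising `n_files` and `n_ignored` toward the sizes discussed above shows how each strategy scales.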