owncloud / files_antivirus

:space_invader: virus scanner for ownCloud
GNU Affero General Public License v3.0
48 stars 30 forks source link

fix: fortinet scanner file name + major code cleanup + proper enginee… #512

Closed DeepDiver1975 closed 1 year ago

DeepDiver1975 commented 1 year ago

…ring

ownclouders commented 1 year ago

:boom: Acceptance tests pipeline apiAntivirus-master-mariadb10.2-php7.4 failed. The build has been cancelled.

https://drone.owncloud.com/owncloud/files_antivirus/2727/12

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 67 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

jvillafanez commented 1 year ago

Some doubts about the IScanner interface:

An interface such as the following should be enough with the proper documentation about what each method is expected to do.:

interface IScanner {
  public function initScanner(): void;
  public function scan(IScannable $item): Status;
  public function shutdownScanner(): void;
}

An additional clarification to do is that it isn't clear (and it should be documented) whether after the shutdownScanner a call to the initScanner method has to reinitialize the scanner and make it ready to scan a file, or once the scanner is shutdown you have to create a new one if you want to scan a file.

I guess the filename parameter in the initScanner method is for the onAsyncData method, but you have to trust that you're actually scanning that file.

For the scanner implementations, I still need to give some thoughts about the architecture. The ICAPScanner should probably be an abstract class. It seems clearer to have a GenericICAPScanner implementation rather than having the ICAPScanner class to act as generic implementation. By having abstract methods it's clear what the implementations need to implement.

DeepDiver1975 commented 1 year ago

THX for your feedback - all quite valid points. Question is how much more refactor we want to have in a single PR. It is already damn big ....

jvillafanez commented 1 year ago

For the scanner architecture, I'd aim for something like the following diagram: Screenshot from 2022-11-17 10-55-44

Outside of the ICAP scanner, I'm not sure if we're officially supporting any scanner other than clamav, otherwise we can adjust the ClamavScanner class name to another one.


For the ClamavScanner there are a couple of options:

abstract class ClamavScanner implements IScanner {
  abstract public function initScanner();

  abstract protected function prepareToScan(IScannable $item);

  final public function scan(IScannable $item) {
    $this->prepareToScan($item);
    while (($chunk = $item->fread()) !== false) {
      $this->sendData($chunk);
    }
    return $this->finishScan($item);
  }

  abstract protected function sendData($data);

  abstract protected function finishScan(IScannable $item);

  abstract public function shutdownScanner();
}

or

abstract class ClamavScanner implements IScanner {
  private $scannerConfig;

  public function initScanner() {
    $this->scannerConfig = ConfigHelper::getConfig();
  }

  abstract protected function prepareScan(IScannable $item, $scannerConfig);

  final public function scan(IScannable $item) {
    $this->prepareScan($item, $this->scannerConfig);
    $stop = false;
    while (($chunk = $item->fread()) !== false || $stop) {
      $stop = $this->sendData($chunk, $this->scannerConfig);
    }
    return $this->finishScan($item, $this->scannerConfig);
  }

  abstract protected function sendData($data, $scannerConfig);

  abstract protected function finishScan(IScannable $item, $scannerConfig);

  public function shutdownScanner() {
  }
}

The main point is that the scan function will be final and it won't be modified. The abstract methods are expected to be implemented by the subclasses in different ways. For example, the prepareScan method might open a pipe to a file in case of the LocalScanner, but the DaemonScanner might open a socket instead. Buffering can also be implemented by the subclasses, if needed, in the sendData method in order to send a bigger chunk of data. The method signature can also be adjusted according to necessities. The abstract class can provide some configuration to each implementation, likely from the same place, instead of letting each implementation fetch the configuration from different places.


For the ICAPScanner, the idea is the same:

abstract class ICAPScanner implements IScanner {
  public function initScanner() {
  }

  abstract protected function prepareRequest(IScannable $item): ICAPRequestData;

  final public function scan(IScannable $item) {
    $requestData = $this->prepareRequest($item);

    $client = new ICAPClient($requestData->getHost(), $requestData->getPort());
    if ($requestData->getType() === 'reqmod') {
      $response = $client->reqmod($requestData->getService(), $requestData->getBody(), $requestData->getHeaders());
    } else {
      $response = $client->respmod($requestData->getService(), $requestData->getBody(), $requestData->getHeaders());
    }
    return $this->processResponse($response);
  }

  abstract protected function processResponse(array $data): Status;

  public function shutdownScanner() {
  }
}

In this case, the implementations are expected to provide whatever the ICAPClient needs to perform the request in the prepareRequest method and return it as an ICAPRequestData. Once the ICAPClient sends the request, the implementations have a chance to process the response.

There might be some adjustments to be made (ICAPClient might need to handle a file stream somehow) but it should fit the current code.

Responsibilities are quite clear in this case: the ICAPScanner will send the request, while the implementations will just prepare the data and process the result.

Also note that, if we need to provide a generic ICAP scanner, it should have its own implementation at the same level of McAfeeWeb and Fortinet. Trying to implement it in the ICAPScanner will be confusing because the responsibilities won't be clear any longer.


I think this should cover everything we have at the moment and it should allow some improvements without major code changes. So the question is how we can reach this proposed goal.

I'd propose the following steps:

  1. Get rid of the asyncScan and focus only on the scan method. We'd need to touch all scanners for this, but I don't think there is any other option.
  2. Refactor the ICAP scanners as proposed. I hope this is just moving code around with minor touches.
  3. Refactoring the rest of the scanners might be left for another PR as long as they fit the proposed solution and work.

In any case, I think it's critical to have code documentation explaining what are the expectations and who is responsible of what.

DeepDiver1975 commented 1 year ago

Only one comment: The ICAPScanner is not abstract as this is a valid use case. Fortinet and McAfee are only specializations.

jvillafanez commented 1 year ago

Only one comment: The ICAPScanner is not abstract as this is a valid use case. Fortinet and McAfee are only specializations.

I think it should be. We can provide a GenericICAPScanner for that use case. It should be just splitting and moving the code that is already there.

By having those abstract methods is easier to know that the subclasses must implement those methods. If the methods are implemented in the ICAPScanner, it won't be clear whether they should be overwritten in the subclasses or they can be used, even if the methods are protected. The responsibilities are also clearer: subclasses are expected to prepare the data for the request and to process the response, while the ICAPScanner will just send the prepared request. If the ICAPScanner provides a default implementation it seems it's overstepping its responsibilities.

DeepDiver1975 commented 1 year ago

The onAsyncData() and onScanComplete() methods are driven by the stream wrapper design.

A simple scan() will not work in this case .... at least I don't see an easy way around it ...

refs: https://github.com/owncloud/files_antivirus/blob/d9d09414e0822889303a15955a20f44afcc2b55f/lib/AvirWrapper.php#L121-L132

DeepDiver1975 commented 1 year ago

Regarding initScanner():

ScannerFactory::getScanner() - which is better named buildScanner - can take the file as argument into the factory. But this will not help that much.

To make it obvious that one scanner instance is to be used for scanning one file a better name would be IScanTask or something ....

TBH: I don't want to rewrite it all :see_no_evil:

DeepDiver1975 commented 1 year ago

The onAsyncData() and onScanComplete() methods are driven by the stream wrapper design.

scanning right at this point is necessary to stop upload as early as possible - at least this was the idea .... reviewing the code revealed that the file is only scanned at the end of the upload - even if only partially...

DeepDiver1975 commented 1 year ago

@jvillafanez I suggest to merge this as is - it fulfills the need of fixing fortinet scanner integration.

I'll move your suggestions to a new ticket for further discussion and followup task for the future.

DeepDiver1975 commented 1 year ago

damn - forgot to squash .... :see_no_evil:

jnweiger commented 1 year ago

Confirmed fixed in 1.2.0-rc.1