Closed DeepDiver1975 closed 1 year ago
:boom: Acceptance tests pipeline apiAntivirus-master-mariadb10.2-php7.4 failed. The build has been cancelled.
Kudos, SonarCloud Quality Gate passed!
Some doubts about the IScanner
interface:
onAsyncData
? It seems that an external source read the data and feed it to the scanner (based on the AvirWrapper
. This might be dangerous because you have to trust the external source to behave, meaning it should read the whole file, not just a part of it, or pieces of multiple files. The scan
method seems safer because the scanner has full control over the source, so there is little incentive one using the onAsyncData
method.initScanner(string $fileName): void;
seems to indicate that you have to initialize the scanner per each file you want to scan. While I understand you might need to perform some initialization in the scanner, I expect that such process is performed only once. In particular, I shouldn't need to reinitialize the scanner if I want to scan 3 different files.An interface such as the following should be enough with the proper documentation about what each method is expected to do.:
interface IScanner {
public function initScanner(): void;
public function scan(IScannable $item): Status;
public function shutdownScanner(): void;
}
An additional clarification to do is that it isn't clear (and it should be documented) whether after the shutdownScanner
a call to the initScanner
method has to reinitialize the scanner and make it ready to scan a file, or once the scanner is shutdown you have to create a new one if you want to scan a file.
I guess the filename parameter in the initScanner
method is for the onAsyncData
method, but you have to trust that you're actually scanning that file.
For the scanner implementations, I still need to give some thoughts about the architecture. The ICAPScanner
should probably be an abstract class. It seems clearer to have a GenericICAPScanner
implementation rather than having the ICAPScanner
class to act as generic implementation. By having abstract methods it's clear what the implementations need to implement.
THX for your feedback - all quite valid points. Question is how much more refactor we want to have in a single PR. It is already damn big ....
For the scanner architecture, I'd aim for something like the following diagram:
Outside of the ICAP scanner, I'm not sure if we're officially supporting any scanner other than clamav, otherwise we can adjust the ClamavScanner
class name to another one.
For the ClamavScanner
there are a couple of options:
abstract class ClamavScanner implements IScanner {
abstract public function initScanner();
abstract protected function prepareToScan(IScannable $item);
final public function scan(IScannable $item) {
$this->prepareToScan($item);
while (($chunk = $item->fread()) !== false) {
$this->sendData($chunk);
}
return $this->finishScan($item);
}
abstract protected function sendData($data);
abstract protected function finishScan(IScannable $item);
abstract public function shutdownScanner();
}
or
abstract class ClamavScanner implements IScanner {
private $scannerConfig;
public function initScanner() {
$this->scannerConfig = ConfigHelper::getConfig();
}
abstract protected function prepareScan(IScannable $item, $scannerConfig);
final public function scan(IScannable $item) {
$this->prepareScan($item, $this->scannerConfig);
$stop = false;
while (($chunk = $item->fread()) !== false || $stop) {
$stop = $this->sendData($chunk, $this->scannerConfig);
}
return $this->finishScan($item, $this->scannerConfig);
}
abstract protected function sendData($data, $scannerConfig);
abstract protected function finishScan(IScannable $item, $scannerConfig);
public function shutdownScanner() {
}
}
The main point is that the scan
function will be final and it won't be modified.
The abstract methods are expected to be implemented by the subclasses in different ways. For example, the prepareScan
method might open a pipe to a file in case of the LocalScanner
, but the DaemonScanner
might open a socket instead. Buffering can also be implemented by the subclasses, if needed, in the sendData
method in order to send a bigger chunk of data.
The method signature can also be adjusted according to necessities. The abstract class can provide some configuration to each implementation, likely from the same place, instead of letting each implementation fetch the configuration from different places.
For the ICAPScanner
, the idea is the same:
abstract class ICAPScanner implements IScanner {
public function initScanner() {
}
abstract protected function prepareRequest(IScannable $item): ICAPRequestData;
final public function scan(IScannable $item) {
$requestData = $this->prepareRequest($item);
$client = new ICAPClient($requestData->getHost(), $requestData->getPort());
if ($requestData->getType() === 'reqmod') {
$response = $client->reqmod($requestData->getService(), $requestData->getBody(), $requestData->getHeaders());
} else {
$response = $client->respmod($requestData->getService(), $requestData->getBody(), $requestData->getHeaders());
}
return $this->processResponse($response);
}
abstract protected function processResponse(array $data): Status;
public function shutdownScanner() {
}
}
In this case, the implementations are expected to provide whatever the ICAPClient
needs to perform the request in the prepareRequest
method and return it as an ICAPRequestData
. Once the ICAPClient
sends the request, the implementations have a chance to process the response.
There might be some adjustments to be made (ICAPClient
might need to handle a file stream somehow) but it should fit the current code.
Responsibilities are quite clear in this case: the ICAPScanner
will send the request, while the implementations will just prepare the data and process the result.
Also note that, if we need to provide a generic ICAP scanner, it should have its own implementation at the same level of McAfeeWeb and Fortinet. Trying to implement it in the ICAPScanner
will be confusing because the responsibilities won't be clear any longer.
I think this should cover everything we have at the moment and it should allow some improvements without major code changes. So the question is how we can reach this proposed goal.
I'd propose the following steps:
In any case, I think it's critical to have code documentation explaining what are the expectations and who is responsible of what.
Only one comment: The ICAPScanner is not abstract as this is a valid use case. Fortinet and McAfee are only specializations.
Only one comment: The ICAPScanner is not abstract as this is a valid use case. Fortinet and McAfee are only specializations.
I think it should be. We can provide a GenericICAPScanner
for that use case. It should be just splitting and moving the code that is already there.
By having those abstract methods is easier to know that the subclasses must implement those methods. If the methods are implemented in the ICAPScanner
, it won't be clear whether they should be overwritten in the subclasses or they can be used, even if the methods are protected.
The responsibilities are also clearer: subclasses are expected to prepare the data for the request and to process the response, while the ICAPScanner
will just send the prepared request. If the ICAPScanner
provides a default implementation it seems it's overstepping its responsibilities.
The onAsyncData() and onScanComplete() methods are driven by the stream wrapper design.
A simple scan() will not work in this case .... at least I don't see an easy way around it ...
Regarding initScanner():
ScannerFactory::getScanner() - which is better named buildScanner - can take the file as argument into the factory. But this will not help that much.
To make it obvious that one scanner instance is to be used for scanning one file a better name would be IScanTask or something ....
TBH: I don't want to rewrite it all :see_no_evil:
The onAsyncData() and onScanComplete() methods are driven by the stream wrapper design.
scanning right at this point is necessary to stop upload as early as possible - at least this was the idea .... reviewing the code revealed that the file is only scanned at the end of the upload - even if only partially...
@jvillafanez I suggest to merge this as is - it fulfills the need of fixing fortinet scanner integration.
I'll move your suggestions to a new ticket for further discussion and followup task for the future.
damn - forgot to squash .... :see_no_evil:
Confirmed fixed in 1.2.0-rc.1
…ring