Open DeepDiver1975 opened 2 years ago
For the current asyncScan which is performed while an upload happens, we'll have to extend the current plans. The main problem is that we have to scan (or send to scan) partial data that is being written into a stream. Right now, this is done through a stream wrapper.
Conceptually, we'll split the architecture between what the scanners are expected to provide to the outside, and how the scanners are expected to be implemented.
The `IScanner` interface will add the following methods to the ones that are planned:
    interface IScanner {
        public function startAsync($params);
        public function onAsyncData($data);
        public function finishAsync(): Status;
    }
Note that we might need to define what exceptions can be thrown from each of those methods.
We'll likely have additional requirements to be implemented, such as:
- `scan` function can't be called while an async scan is in progress (there might be problems if the data is mixed).

The expected usage would be something like (just little touches here and there, but it should be the same as what is currently in place):
    $scanner->initScanner();
    $scanner->startAsync(['filename' => 'eicarvirus.test']);
    return CallBackWrapper::wrap(
        $stream,
        null,
        // called for every chunk written into the stream
        function ($data) use ($scanner) {
            $scanner->onAsyncData($data);
        },
        // called once the stream is closed
        function () use ($scanner, $path) {
            $status = $scanner->finishAsync();
            if ($status === Status::INFECTED) {
                throw new Exception(....);
            }
        }
    );
Additional metadata might be needed for the async scan.
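The restriction on calling `scan` while an async scan is running could be enforced with a simple state flag. The following is only a minimal sketch under stated assumptions, not the app's code: the `GuardedScanner` class, the `ScanInProgressException` name and the `Status` enum (PHP 8.1+) are hypothetical, and the "scan" is faked with a substring check.

```php
<?php
// Hypothetical result type; the real Status type may look different.
enum Status {
    case CLEAN;
    case INFECTED;
}

// Hypothetical exception for calls made while an async scan is running.
class ScanInProgressException extends \RuntimeException {
}

// Toy scanner: flags any content containing the marker string "EICAR".
class GuardedScanner {
    private bool $runningAsync = false;
    private string $buffer = '';

    public function scan(string $content): Status {
        if ($this->runningAsync) {
            // Mixing sync data into a running async scan would corrupt it.
            throw new ScanInProgressException('async scan in progress');
        }
        return $this->check($content);
    }

    public function startAsync(array $params): void {
        if ($this->runningAsync) {
            throw new ScanInProgressException('async scan already started');
        }
        $this->runningAsync = true;
        $this->buffer = '';
    }

    public function onAsyncData(string $data): void {
        if (!$this->runningAsync) {
            throw new \LogicException('startAsync() was not called');
        }
        $this->buffer .= $data;
    }

    public function finishAsync(): Status {
        if (!$this->runningAsync) {
            throw new \LogicException('startAsync() was not called');
        }
        $this->runningAsync = false;
        return $this->check($this->buffer);
    }

    private function check(string $content): Status {
        return \str_contains($content, 'EICAR') ? Status::INFECTED : Status::CLEAN;
    }
}

// The marker is split across two chunks, as it could be during an upload.
$scanner = new GuardedScanner();
$scanner->startAsync(['filename' => 'eicarvirus.test']);
$scanner->onAsyncData('EIC');
$scanner->onAsyncData('AR');
$status = $scanner->finishAsync();
```

Using distinct exception types for "wrong state" and "scan already running" is one way to pin down which exceptions each method may throw.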
The implementations, however, won't require any change. The main trick is that both the `ClamavScanner` and the `ICAPScanner` abstract classes will implement those methods using the same abstract methods that are used for the regular scan.

In order to do so, we'll first need to implement a `FakeStreamScannable`, mainly to hold the stream metadata while implementing the `IScannable` interface. We might need to make the `IScannable` interface a bit more flexible so it can return more metadata, not just the filename.
    class FakeStreamScannable implements IScannable {
        private $filename;

        public function __construct($filename) {
            $this->filename = $filename;
        }

        public function fread() {
            return false;
        }

        public function getFilename() {
            return $this->filename;
        }
    }
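    abstract class ClamavScanner implements IScanner {
        /** @var bool whether an async scan is currently in progress */
        private $runningAsync = false;
        /** @var FakeStreamScannable|null holds the stream metadata during an async scan */
        private $fakeAsyncScannable;

        final public function startAsync($params) {
            $this->runningAsync = true;
            $this->fakeAsyncScannable = new FakeStreamScannable($params['filename']);
            $this->prepareScan($this->fakeAsyncScannable);
        }

        final public function onAsyncData($data) {
            if ($this->runningAsync) {
                // forward each chunk to the subclass as it arrives
                $this->sendData($data);
            } else {
                throw new \Exception(.....);
            }
        }

        final public function finishAsync(): Status {
            $this->runningAsync = false;
            return $this->finishScan($this->fakeAsyncScannable);
        }
    }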
    abstract class ICAPScanner implements IScanner {
        private $runningAsync = false;
        private $fakeAsyncScannable;
        private $preparedData;
        private $data = '';

        final public function startAsync($params) {
            $this->runningAsync = true;
            $this->data = '';
            $this->fakeAsyncScannable = new FakeStreamScannable($params['filename']);
            $this->preparedData = $this->prepareRequest($this->fakeAsyncScannable);
        }

        final public function onAsyncData($data) {
            if ($this->runningAsync) {
                // buffer the incoming chunks until finishAsync() is called
                $this->data .= $data;
            } else {
                throw new \Exception(.....);
            }
        }

        final public function finishAsync(): Status {
            $this->runningAsync = false;
            $requestData = $this->preparedData;
            $client = new ICAPClient($requestData->getHost(), $requestData->getPort());
            if ($requestData->getType() === 'reqmod') {
                $response = $client->reqmod($requestData->getService(), $requestData->getBody(), $requestData->getHeaders());
            } else {
                $response = $client->respmod($requestData->getService(), $requestData->getBody(), $requestData->getHeaders());
            }
            return $this->processResponse($response);
        }
    }
(Code shown is based on what we currently have. It might be improved)
The important point is that subclasses will only have 2-3 methods to implement, and those will be reused for both the sync and async scans.
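To illustrate how a handful of abstract methods could serve both the sync and the async path, here is a minimal sketch. It is not the real `ClamavScanner`: `SketchScanner`, `InMemoryScanner`, `StringScannable`, the `Status` enum (PHP 8.1+) and the exact signatures of `prepareScan`, `sendData` and `finishScan` are assumptions made for the example, and the detection is a plain substring check.

```php
<?php
// Hypothetical result type; the real Status type may differ.
enum Status {
    case CLEAN;
    case INFECTED;
}

// Simplified stand-in for the IScannable interface.
interface IScannable {
    public function fread();       // next chunk, or false when exhausted
    public function getFilename();
}

// Scannable backed by a string, handed out in 4-byte chunks.
class StringScannable implements IScannable {
    private array $chunks;
    public function __construct(private string $filename, string $content) {
        $this->chunks = $content === '' ? [] : \str_split($content, 4);
    }
    public function fread() {
        return \array_shift($this->chunks) ?? false;
    }
    public function getFilename() {
        return $this->filename;
    }
}

// Metadata-only scannable used for the async path, as in the proposal.
class FakeStreamScannable implements IScannable {
    public function __construct(private string $filename) {
    }
    public function fread() {
        return false;
    }
    public function getFilename() {
        return $this->filename;
    }
}

abstract class SketchScanner {
    private ?IScannable $asyncItem = null;

    abstract protected function prepareScan(IScannable $item): void;
    abstract protected function sendData(string $data): void;
    abstract protected function finishScan(IScannable $item): Status;

    // Sync path: chunks are pulled from the scannable.
    final public function scan(IScannable $item): Status {
        $this->prepareScan($item);
        while (($chunk = $item->fread()) !== false) {
            $this->sendData($chunk);
        }
        return $this->finishScan($item);
    }

    // Async path: chunks are pushed in from outside (e.g. stream callbacks).
    final public function startAsync(array $params): void {
        $this->asyncItem = new FakeStreamScannable($params['filename']);
        $this->prepareScan($this->asyncItem);
    }

    final public function onAsyncData(string $data): void {
        if ($this->asyncItem === null) {
            throw new \LogicException('startAsync() was not called');
        }
        $this->sendData($data);
    }

    final public function finishAsync(): Status {
        if ($this->asyncItem === null) {
            throw new \LogicException('startAsync() was not called');
        }
        $item = $this->asyncItem;
        $this->asyncItem = null;
        return $this->finishScan($item);
    }
}

// The only code a concrete scanner has to provide in this sketch.
class InMemoryScanner extends SketchScanner {
    private string $received = '';
    protected function prepareScan(IScannable $item): void {
        $this->received = '';
    }
    protected function sendData(string $data): void {
        $this->received .= $data;
    }
    protected function finishScan(IScannable $item): Status {
        return \str_contains($this->received, 'EICAR') ? Status::INFECTED : Status::CLEAN;
    }
}

$scanner = new InMemoryScanner();
$syncStatus = $scanner->scan(new StringScannable('notes.txt', 'harmless text'));

$scanner->startAsync(['filename' => 'eicarvirus.test']);
$scanner->onAsyncData('xxEI');
$scanner->onAsyncData('CARx');
$asyncStatus = $scanner->finishAsync();
```

Because `scan`, `startAsync`, `onAsyncData` and `finishAsync` are final in the base class, a subclass cannot make the two paths diverge by accident.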
There might be some adjustments to be done with the overall plan. The `ClamavScanner` and `ICAPScanner` require subclasses to deal with `IScannable` objects. While this is fine, we have to create a `FakeStreamScannable` class to be used internally for the async scan.

Taking into account that the abstract methods are unlikely to use the actual data from the `IScannable` object but just the metadata (currently only the filename), they could use an array or a hypothetical `Metadata` object returned by a hypothetical `$scannableObj->getMetadata(): Metadata` method. This way we wouldn't need to create the `FakeStreamScannable` class.
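The `getMetadata()` idea could look roughly like the sketch below. Everything in it is hypothetical: the `Metadata` class and its fields, the `StringScannable` class, and the `describeScan` function, which only stands in for an abstract method such as `prepareScan` that needs metadata but not the data itself.

```php
<?php
// Hypothetical immutable value object carrying scan metadata.
final class Metadata {
    public function __construct(
        private string $filename,
        private array $extra = []
    ) {
    }

    public function getFilename(): string {
        return $this->filename;
    }

    // Returns an optional extra field, or $default when it is not set.
    public function get(string $key, $default = null) {
        return $this->extra[$key] ?? $default;
    }
}

// Hypothetical variant of IScannable exposing metadata as one object.
interface IScannable {
    public function fread();
    public function getMetadata(): Metadata;
}

class StringScannable implements IScannable {
    private bool $consumed = false;

    public function __construct(private string $filename, private string $content) {
    }

    // Returns the whole content once, then false.
    public function fread() {
        if ($this->consumed) {
            return false;
        }
        $this->consumed = true;
        return $this->content;
    }

    public function getMetadata(): Metadata {
        return new Metadata($this->filename, ['size' => \strlen($this->content)]);
    }
}

// Stand-in for an abstract method: it depends on Metadata only.
function describeScan(Metadata $metadata): string {
    return \sprintf(
        'scanning %s (%s bytes)',
        $metadata->getFilename(),
        $metadata->get('size', 'unknown')
    );
}

// Sync path: the metadata comes from the scannable object.
$syncLine = describeScan((new StringScannable('notes.txt', 'hello'))->getMetadata());

// Async path: the metadata is built from the startAsync() params,
// so no fake scannable object is needed.
$params = ['filename' => 'eicarvirus.test'];
$asyncLine = describeScan(new Metadata($params['filename']));
```

With this shape, the async methods would pass a `Metadata` instance straight to the abstract methods instead of wrapping the filename in a scannable that never returns data.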
Outside of the ICAP scanner, I'm not sure whether we officially support any scanner other than ClamAV; if we do, we can rename the `ClamavScanner` class to something more generic.

For the `ClamavScanner` there are a couple of options:

or
The main point is that the `scan` function will be final and it won't be modified. The abstract methods are expected to be implemented by the subclasses in different ways. For example, the `prepareScan` method might open a pipe to a file in case of the `LocalScanner`, but the `DaemonScanner` might open a socket instead. Buffering can also be implemented by the subclasses, if needed, in the `sendData` method in order to send a bigger chunk of data. The method signature can also be adjusted as needed. The abstract class can provide some configuration to each implementation, likely from the same place, instead of letting each implementation fetch the configuration from different places.

For the `ICAPScanner`, the idea is the same:

In this case, the implementations are expected to provide whatever the `ICAPClient` needs to perform the request in the `prepareRequest` method and return it as an `ICAPRequestData`. Once the `ICAPClient` sends the request, the implementations have a chance to process the response.

There might be some adjustments to be made (`ICAPClient` might need to handle a file stream somehow) but it should fit the current code.

Responsibilities are quite clear in this case: the `ICAPScanner` will send the request, while the implementations will just prepare the data and process the result.

Also note that, if we need to provide a generic ICAP scanner, it should have its own implementation at the same level as McAfeeWeb and Fortinet. Trying to implement it in the `ICAPScanner` will be confusing because the responsibilities won't be clear any longer.

I think this should cover everything we have at the moment and it should allow some improvements without major code changes. So the question is how we can reach this proposed goal.
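As an illustration of the ICAP split of responsibilities described above (the base class sends the request, the implementations prepare the data and interpret the result), here is a minimal sketch. It makes several assumptions: `StubICAPClient` replaces the real `ICAPClient` and never touches the network, the response array shape and the `X-Example-*` header names are invented, and `ExampleVendorScanner` and the `Status` enum (PHP 8.1+) are hypothetical.

```php
<?php
// Hypothetical result type; the real Status type may differ.
enum Status {
    case CLEAN;
    case INFECTED;
}

// Simplified stand-in for ICAPRequestData (host and port omitted).
final class ICAPRequestData {
    public function __construct(
        private string $type,
        private string $service,
        private array $body,
        private array $headers = []
    ) {
    }
    public function getType(): string { return $this->type; }
    public function getService(): string { return $this->service; }
    public function getBody(): array { return $this->body; }
    public function getHeaders(): array { return $this->headers; }
}

// Stub client: returns a canned response instead of talking to a server.
class StubICAPClient {
    public function reqmod(string $service, array $body, array $headers): array {
        return $this->fakeResponse($body);
    }
    public function respmod(string $service, array $body, array $headers): array {
        return $this->fakeResponse($body);
    }
    private function fakeResponse(array $body): array {
        $infected = \str_contains($body['payload'] ?? '', 'EICAR');
        return ['headers' => ['X-Example-Verdict' => $infected ? 'infected' : 'clean']];
    }
}

abstract class ICAPScannerSketch {
    public function __construct(private StubICAPClient $client) {
    }

    // Implementations only prepare the request and interpret the response.
    abstract protected function prepareRequest(string $filename, string $data): ICAPRequestData;
    abstract protected function processResponse(array $response): Status;

    // The base class alone is responsible for sending the request.
    final public function scanData(string $filename, string $data): Status {
        $request = $this->prepareRequest($filename, $data);
        if ($request->getType() === 'reqmod') {
            $response = $this->client->reqmod($request->getService(), $request->getBody(), $request->getHeaders());
        } else {
            $response = $this->client->respmod($request->getService(), $request->getBody(), $request->getHeaders());
        }
        return $this->processResponse($response);
    }
}

// Hypothetical vendor implementation, at the level where McAfeeWeb
// and Fortinet (or a generic ICAP scanner) would live.
class ExampleVendorScanner extends ICAPScannerSketch {
    protected function prepareRequest(string $filename, string $data): ICAPRequestData {
        return new ICAPRequestData(
            'respmod',
            'example-service',
            ['payload' => $data],
            ['X-Example-Filename' => $filename]
        );
    }

    protected function processResponse(array $response): Status {
        $verdict = $response['headers']['X-Example-Verdict'] ?? '';
        return $verdict === 'infected' ? Status::INFECTED : Status::CLEAN;
    }
}

$scanner = new ExampleVendorScanner(new StubICAPClient());
$clean = $scanner->scanData('notes.txt', 'harmless text');
$infected = $scanner->scanData('eicarvirus.test', 'xxEICARxx');
```

Injecting the client through the constructor is a choice made for this sketch so the flow can be exercised without a server; the proposal itself creates the `ICAPClient` inside `finishAsync()`.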
I'd propose the following steps:
In any case, I think it's critical to have code documentation explaining what the expectations are and who is responsible for what.
_Originally posted by @jvillafanez in https://github.com/owncloud/files_antivirus/issues/512#issuecomment-1318474846_