target / strelka

Real-time, container-based file scanning at enterprise scale

[REQUEST] Allow Strelka Clients to do Global Hash-Based De-duplication #399

Status: Open (opened by ryanohoro 1 year ago)

ryanohoro commented 1 year ago

Is your feature request related to a problem? Please describe.

Strelka clients cannot query which files are already present in the Gatekeeper cache. As a result, clients consume excessive bandwidth by sending duplicate files to the Frontend, which only then hashes them and checks the Gatekeeper cache.

Describe the solution you'd like

Enhance the Strelka client gRPC protocol so that clients can send a file hash to the Frontend and receive a response containing the Gatekeeper cache status and, optionally, the age of the cache entry. Clients that receive a cache-hit response SHOULD NOT send the cached file to the Frontend UNLESS the client request is configured to ignore Gatekeeper caching.
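A minimal client-side sketch of the proposed flow, in Python. It assumes the Frontend keys the Gatekeeper cache by the file's SHA-256 digest. `CacheStatusRequest`, `CacheStatusResponse`, `check_cache`, and `should_upload` are hypothetical names, not existing Strelka gRPC messages or methods; a real implementation would define them in the Strelka protobuf and call them through a generated gRPC stub.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Protocol, Set


@dataclass
class CacheStatusRequest:
    # Hypothetical message: the client sends only the file's digest, not its bytes.
    sha256: str


@dataclass
class CacheStatusResponse:
    # Hypothetical message: Gatekeeper cache status plus the optional entry age.
    cached: bool
    age_seconds: Optional[int] = None


class Frontend(Protocol):
    # Hypothetical RPC surface; the Strelka Frontend does not expose this today.
    def check_cache(self, request: CacheStatusRequest) -> CacheStatusResponse: ...


def should_upload(data: bytes, frontend: Frontend, ignore_gatekeeper: bool = False) -> bool:
    """Return True if the client should send the file bytes to the Frontend."""
    if ignore_gatekeeper:
        # The request bypasses the Gatekeeper, so the cache status is irrelevant.
        return True
    digest = hashlib.sha256(data).hexdigest()
    response = frontend.check_cache(CacheStatusRequest(sha256=digest))
    # Cache hit: skip the upload. Cache miss: send the file as usual.
    return not response.cached


class FakeFrontend:
    """In-memory stand-in for the Frontend, for illustration only."""

    def __init__(self, cached_hashes: Set[str]):
        self.cached_hashes = set(cached_hashes)

    def check_cache(self, request: CacheStatusRequest) -> CacheStatusResponse:
        hit = request.sha256 in self.cached_hashes
        return CacheStatusResponse(cached=hit, age_seconds=0 if hit else None)
```

Only the 64-character hex digest crosses the network on a hit, which is where the bandwidth saving comes from; a miss still costs one extra round trip before the upload.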

Describe alternatives you've considered

It may also be desirable to implement local hash-based de-duplication in the clients, particularly in environments that are more sensitive to connection volume than to bandwidth, since a global hash query still costs one request per file. However, global de-duplication is easier to implement and more useful at large scale.
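A minimal sketch of the local alternative, assuming each client keeps an in-memory, time-bounded record of SHA-256 digests it has already sent. `LocalDeduper`, `ttl_seconds`, and `max_entries` are hypothetical names and tuning knobs, not part of any Strelka client.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Callable


class LocalDeduper:
    """Hypothetical client-side record of recently submitted SHA-256 digests."""

    def __init__(self, ttl_seconds: float, max_entries: int = 100_000,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        # digest -> time the file was last sent, oldest entries first
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def should_send(self, data: bytes) -> bool:
        digest = hashlib.sha256(data).hexdigest()
        now = self.clock()
        sent_at = self._seen.get(digest)
        if sent_at is not None and now - sent_at < self.ttl_seconds:
            # Sent recently: skip the file without contacting the Frontend at all.
            return False
        # New or expired: record the send time and move it to the newest position.
        self._seen[digest] = now
        self._seen.move_to_end(digest)
        # Evict the oldest entries so memory use stays bounded.
        while len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)
        return True
```

A local hit avoids the network entirely, which addresses connection volume, but each client only knows about files it has sent itself; duplicates across clients still reach the Frontend, which is why the global approach scales better.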

Additional context

derfel1989 commented 1 year ago

@ryanohoro Security Onion already provides what you want.

You could reverse-engineer it and bring this feature to your environment.