google / magika

Detect file content types with deep learning
https://google.github.io/magika/
Apache License 2.0

[Misdetection] OpenSSH private key file misdetected as <PEM certificate> #550

Open ricardo-reis-1970 opened 3 months ago

ricardo-reis-1970 commented 3 months ago

What should the file have been detected as? What has the file been misdetected as?

OpenSSH private key files are being mistaken for PEM certificate files.

Please link or attach the misdetected file below (Do NOT upload PII!)

The file could not be uploaded as the format is not supported. Also, the file is a private key, so I'm inclined to not share it.

Additional context

Private keys look a lot like PEM certificates, except that their structure is:

-----BEGIN PRIVATE KEY-----
[...Base64 gibberish...]
-----END PRIVATE KEY-----

whereas certificates' format is:

-----BEGIN CERTIFICATE-----
[...Base64 gibberish...]
-----END CERTIFICATE-----

I would have thought that this was an easy distinction; in fact, the Linux "file" command manages to tell them apart.
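The distinction described above can be sketched with a simple check of the BEGIN line. This is a minimal illustration, not how Magika or "file" works internally; the function names pem_label and describe and the returned description strings are hypothetical, and the check assumes the label sits on its own line in the usual five-dash form.

```python
import re

# Matches a PEM opening boundary such as "-----BEGIN CERTIFICATE-----".
# Assumption: labels consist of uppercase letters, digits and spaces.
_BEGIN_RE = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----[ \t]*$", re.MULTILINE)


def pem_label(text):
    """Return the label of the first BEGIN line, or None if there is none."""
    match = _BEGIN_RE.search(text)
    return match.group(1) if match else None


def describe(text):
    """Map a PEM label to a coarse, illustrative description (hypothetical names)."""
    label = pem_label(text)
    if label is None:
        return "not PEM"
    if label == "CERTIFICATE":
        return "PEM certificate"
    if label.endswith("PRIVATE KEY"):
        # Covers "PRIVATE KEY", "RSA PRIVATE KEY", "OPENSSH PRIVATE KEY", etc.
        return "PEM private key"
    return "PEM (other: %s)" % label
```

For the two structures shown above, describe would return "PEM private key" for the first and "PEM certificate" for the second, since only the label on the boundary line differs.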

I get this misdetection both from the command line and from Python code. I'm using the standard_v1 model. Should I be doing something different?

reyammer commented 3 months ago

Thanks for opening an issue. The way I see it, PEM is the container file format (with the classic pattern ---- BEGIN HEADER NAME ---- ---- END HEADER NAME ----), and it can contain a number of different kinds of cryptographic material (a certificate, a public key, a private key, etc.). [1]
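To make the "container" view concrete, here is a rough sketch that walks a text file and yields every PEM block together with its label. The name iter_pem_blocks is hypothetical, and the sketch assumes plain blocks with no header lines inside the body (legacy encrypted keys carry extra headers that this does not handle).

```python
import base64
import re

# A block is a BEGIN boundary, a base64 body, and an END boundary
# carrying the same label (enforced by the \1 backreference).
_BLOCK_RE = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\r?\n"
    r"(.*?)"
    r"-----END \1-----",
    re.DOTALL,
)


def iter_pem_blocks(text):
    """Yield (label, payload_bytes) for each well-formed PEM block in text."""
    for match in _BLOCK_RE.finditer(text):
        label, body = match.group(1), match.group(2)
        # Remove line breaks and other whitespace before decoding the payload.
        payload = base64.b64decode("".join(body.split()))
        yield label, payload
```

A single file can hold several blocks (for example a certificate followed by its private key), so the same "pem" container can yield different labels, which is the information a finer-grained detector would need.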

In that case, I would say that Magika's "pem" output is OK, but the "PEM certificate" description is misleading and we should update it to be more generic. @corkamig thoughts?

Another question is whether we should try to be finer-grained and distinguish between PEM certificate and PEM "other crypto material". We are working on the next version, which introduces support for many more content types; maybe we can take a stab at this for the next iteration...

[1] https://en.wikipedia.org/wiki/Privacy-Enhanced_Mail