neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).
MIT License

DetermineFileFormat always returns null #65

Closed adam-bentley closed 8 months ago

adam-bentley commented 8 months ago

Hi,

I'm uploading a collection of base64-encoded files to a web API. I want to validate these files to ensure they're acceptable file types. However, when I try to validate them, DetermineFileFormat returns null. I have tried this with both JPGs and PDFs.

foreach (var attachment in request.Attachments)
{
    string base64 = Convert.ToBase64String(attachment.Content);
    using Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(base64));
    stream.Position = 0; // Tried omitting this line
    FileFormat? fileFormat = fileFormatInspector.DetermineFileFormat(stream);
}

I've tried registering the inspector in two ways. First this:

var inspector = new FileFormatInspector();
services.AddSingleton<IFileFormatInspector>(inspector);

and this:

var recognised = FileFormatLocator.GetFormats().OfType<Image>();
var inspector = new FileFormatInspector(recognised);
services.AddSingleton<IFileFormatInspector>(inspector);

Am I doing something wrong? Is this possible?

Thanks, Adam

neilharvey commented 8 months ago

It should work - you mentioned that the uploads are base-64 encoded files. Is this what attachment.Content contains? If so, wouldn't calling string base64 = Convert.ToBase64String(attachment.Content) be double-encoding the content?

In that case, does changing the following line to using Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(attachment.Content)); work?

adam-bentley commented 8 months ago

I took an image and used this site to base64-encode it: https://base64.guru/converter/encode/file

attachment.Content is a byte[], but I've also tested it as a string.

foreach (var attachment in request.Attachments)
{
    //string base64 = Convert.ToBase64String(attachment.Content);
    using Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(attachment.Content));
    stream.Position = 0;
    FileFormat? fileFormat = fileFormatInspector.DetermineFileFormat(stream);
}

neilharvey commented 8 months ago

What does the attachment.Content byte array contain? The raw image bytes or the base-64 encoded image which is then being sent as a byte array?

If it's the former, then you don't need to do anything with base-64 at all, just read attachment.Content into a MemoryStream and then use that.

If it's the latter, then you'll need to read attachment.Content into a string, then convert that base-64 encoded string back into the original image data and then copy that into a MemoryStream.

Something like this:

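foreach (var attachment in request.Attachments)
{
    // The byte array holds base-64 text, so read it back into a string first.
    string base64 = Encoding.UTF8.GetString(attachment.Content);
    // Decode the base-64 text into the original file bytes.
    byte[] originalBytes = Convert.FromBase64String(base64);
    // Inspect the decoded bytes, not the encoded text.
    using MemoryStream stream = new MemoryStream(originalBytes);
    FileFormat? fileFormat = fileFormatInspector.DetermineFileFormat(stream);
}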

originalBytes should contain the original image. To confirm, you could save it to the file system, or set a breakpoint and compare the first few bytes against your image file.
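As a rough sketch of that byte comparison, the snippet below prints the leading bytes as hex and checks them against the well-known JPEG and PDF prefixes. The helper names StartsWith and HeaderHex are hypothetical, the sample array stands in for a real originalBytes, and the prefixes are the commonly documented magic numbers rather than values taken from the FileSignatures source.

```csharp
using System;
using System.Linq;

// Commonly documented leading bytes (assumed here, not read from FileSignatures).
byte[] jpegMagic = { 0xFF, 0xD8, 0xFF };
byte[] pdfMagic = { 0x25, 0x50, 0x44, 0x46 }; // ASCII "%PDF"

// Hypothetical helper: true when data begins with the given prefix.
static bool StartsWith(byte[] data, byte[] prefix) =>
    data.Length >= prefix.Length && data.Take(prefix.Length).SequenceEqual(prefix);

// Hypothetical helper: hex dump of the first few bytes.
static string HeaderHex(byte[] data, int count = 8) =>
    BitConverter.ToString(data, 0, Math.Min(count, data.Length));

// Stand-in for originalBytes: the start of a typical JFIF-style JPEG.
byte[] originalBytes = { 0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46 };

Console.WriteLine(HeaderHex(originalBytes));            // FF-D8-FF-E0-00-10-4A-46
Console.WriteLine(StartsWith(originalBytes, jpegMagic)); // True
Console.WriteLine(StartsWith(originalBytes, pdfMagic));  // False
```

If the dump shows printable ASCII values such as 2F-39-6A-2F instead, the array most likely still holds base-64 text rather than the decoded file.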

Is that any help?

adam-bentley commented 8 months ago

You were correct, I was able to solve it with:

if (request.Attachments != null)
{
    foreach (var attachment in request.Attachments)
    {
        using Stream stream = new MemoryStream(attachment.Content);
        stream.Position = 0;
        FileFormat? fileFormat = fileFormatInspector.DetermineFileFormat(stream);
    }
}

Thanks for your help.
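For anyone landing here with the same symptom, a minimal sketch of why the first snippet likely returned null: base-64 encoding the raw bytes and then taking the UTF-8 bytes of that text produces a stream that starts with ASCII characters, so the file's signature is no longer at the front. The four-byte sample below is a hypothetical stand-in for the start of a JPEG, not data from this issue.

```csharp
using System;
using System.Text;

// Hypothetical sample: the first four bytes of a typical JPEG file.
byte[] raw = { 0xFF, 0xD8, 0xFF, 0xE0 };

// What the original snippet did: encode to base-64, then stream the text's bytes.
string base64 = Convert.ToBase64String(raw);
byte[] encodedTextBytes = Encoding.UTF8.GetBytes(base64);

// Decoding the text restores the original header.
byte[] decoded = Convert.FromBase64String(base64);

Console.WriteLine(base64);                                   // /9j/4A==
Console.WriteLine(BitConverter.ToString(encodedTextBytes));  // 2F-39-6A-2F-34-41-3D-3D
Console.WriteLine(BitConverter.ToString(decoded));           // FF-D8-FF-E0
```

The encoded stream begins with 2F 39 6A 2F (the characters "/9j/"), not FF D8 FF, which is consistent with the inspector finding no match until the raw bytes were passed in directly.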