neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).
MIT License
250 stars 41 forks source link

FileFormatInspector overload with File Extension #48

Open amitpercept opened 2 years ago

amitpercept commented 2 years ago

I think it would be good that FileFormatInspector should have one more overload with File Extension as parameter wherein we can pass in the allowed extensions (".jpg, .txt, .xls, .doc, .pdf") in the application and it will check based on those extensions. The problem with File Format is that we have to use several if conditions to check for the allowed extensions (which are sometimes stored either in database or in configuration). Something like this:

https://github.com/dotnet/AspNetCore.Docs/blob/main/aspnetcore/mvc/models/file-uploads/samples/3.x/SampleApp/Utilities/FileHelpers.cs

neilharvey commented 2 years ago

Would something like this work for you?

var allowedExtensions = new string[] { "jpg", "xls", "doc", "pdf" };

var allowedFormats = FileFormatLocator.GetFormats()
    .Where(f => allowedExtensions.Contains(f.Extension))
    .ToArray();

var inspector = new FileFormatInspector(allowedFormats);

Longer gist for a sample console application: https://gist.github.com/neilharvey/a08f02bcd5062a47dc4921423d2ff12f

Whilst I could push this code into an additional constructor, I'm wary to do so because users might think that we are considering the extension of the files to determine the format which we don't do because the file could have the wrong extension (either accidentally or deliberately).

amitpercept commented 2 years ago

This is how we handled the problem right now:

        var format = _fileFormatInspector.DetermineFileFormat(stream);

        if (string.IsNullOrEmpty(format.Extension) || !permittedExtensions.Contains("." + format.Extension))
        {
            IsValid = false;
            ValidMessage = "File's Signature Does Not Match the File Extension";
            return;
        }

Here I am assuming that FileFormatInspector is returning the correct extension. Also, in our application, we have several pages (around 9) for uploads. Few allows only images, few allows only pdf, and few allows all (word, excel, pdf, image). We have taken different permittedExtensions in appsettings.json for each page.

To be very frank, what I think that the basic concept of File Signatures is that someone cannot upload malicious files by changing the extension of the file (with the allowed ones). But, in View (html etc.) the file input checks only extension and as a developer my focus will always be on that file extension (not on format). That's why I suggested this. You may take opinion from others as well.

neilharvey commented 2 years ago

Pushing the extension check into our library doesn't remove the if statement, the code would just become something like:

if(!_fileFormatInspector.IsValidFormat(stream, permittedExtensions))
{
    IsValid = false;
    ValidMessage = "File's Signature Does Not Match the File Extension";
    return;
}

How I would do this is to push the validation into a FluentValidation validator, something like this (you could pull the allowed extensions from your appsettings.json file):

    public class UploadModelValidator : AbstractValidator<UploadModel>
    {
        public UploadModelValidator()
        {
            RuleFor(x => x.Files)
                .HasFormat("jpg", "xls", "doc", "pdf");
        }
    }

HasFormat is an extension method on IRuleBuilder:

public static class ValidatorExtensions
{
        public static IRuleBuilderOptions<T, IFormFile> HasFormat<T>(this IRuleBuilder<T, IFormFile> ruleBuilder, params string[] permittedExtensions)
        {
            return ruleBuilder.SetValidator(new FileFormatValidator<T, IFormFile>(permittedExtensions);
        }
}

FileFormatValidator implementation would look something like the following:

public class FileFormatValidator<T, TProperty> : PropertyValidator<T, TProperty>
{
    private readonly IEnumerable<string> _permittedExtensions;

    protected FileFormatValidator(IEnumerable<string> permittedExtensions)
    {
        _permittedExtensions = permittedExtensions;
    }

    public override string Name => "FileFormatValidator";

    protected override string GetDefaultMessageTemplate(string errorCode)
        => "{PropertyName} must be one of the following formats: {FileFormats}.";

    public override bool IsValid(ValidationContext<T> context, TProperty value)
    {
        if (value is IFormFile file)
        {
            using var stream = file.OpenReadStream();
            var inspector = new FileFormatInspector();
            var format = inspector.DetermineFileFormat(stream);
            if (format == null || !_permittedExtensions.Contains(format.Extension))
            {
                var extensions = string.Join(", ", _permittedExtensions);
                context.MessageFormatter.AppendArgument("FileFormats", extensions);
                return false;
            }
        }

        return true;
    }
}

That would allow you to push all your extension checks into your validation pipeline and then not care about them on the individual pages.