tendenci / tendenci

Tendenci - The Open Source Association Management System (AMS)
https://www.tendenci.com

MIME-Type error when uploading older MSWord .doc document #847

Closed rob-hills closed 4 years ago

rob-hills commented 4 years ago

We're running Tendenci 12.1 on our site.

One of our administrators today tried to upload an older MSWord .doc document about 1.5MB in size. When we try this, we get a MIME-type error:

MIME type 'application/CDFV2' is not valid. Allowed types are: image/gif, image/jpeg, image/png, image/tiff, image/x-ms-bmp, video/x-ms-wmv, video/quicktime, video/mpeg, video/mp4, text/plain, application/msword, text/csv, application/ms-excel, application/ms-powerpoint, application/vnd.ms-powerpoint, application/vnd.openxmlformats-officedocument.presentationml.slideshow, text/vcard, application/pdf, application/zip.

When I check this file using the mimetype command on my computer (Ubuntu 18.04 LTS), it returns the expected "application/msword" MIME type, which appears in the list of valid types in the error message.

I had a look at the code that checks the uploaded file (tendenci/tendenci/apps/files/validators.py - FileValidator class) and established that it uses the magic.from_buffer() method to check the mimetype. I also Googled the error.

From https://github.com/kaleidos/django-validated-file/issues/9, it appears that the magic buffer interface may not be able to determine the correct MIME type for an older Word document unless the whole document is passed to the buffer, rather than just the first 1024 bytes that the FileValidator class passes.

The Kaleidos issue above also suggested reversing the order of the checks (file size first, then MIME type), so that a file-size problem with an uploaded .doc is not fed back to the user as a confusing MIME-type error.

It seems to me the appropriate fix for this would be to reverse the order of mimetype checking and filesize checking and then if the filename extension is ".doc", pass the whole document to the magic buffer interface.
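A minimal sketch of the proposed ordering, assuming python-magic's magic.from_buffer() as the detector. validate_upload, SNIFF_BYTES and the detect parameter are hypothetical names, not Tendenci's actual FileValidator API; the detector is passed in so the logic can be exercised without libmagic installed.

```python
from typing import Callable, Iterable

# Hypothetical sketch, not Tendenci's FileValidator. In real use `detect`
# would be e.g. lambda buf: magic.from_buffer(buf, mime=True) (python-magic).
SNIFF_BYTES = 1024  # prefix length the current validator reads


def validate_upload(
    filename: str,
    data: bytes,
    max_size: int,
    allowed_types: Iterable[str],
    detect: Callable[[bytes], str],
) -> str:
    """Validate an upload and return its detected MIME type.

    Raises ValueError on a size or MIME-type failure.
    """
    # 1. Check the size first, so an oversized file gets a size error
    #    rather than a misleading MIME-type error.
    if len(data) > max_size:
        raise ValueError(
            f"File size {len(data)} bytes exceeds the limit of {max_size} bytes."
        )

    # 2. For legacy .doc files, hand the whole (already size-limited) file
    #    to the detector; otherwise keep the original 1024-byte prefix.
    if filename.lower().endswith(".doc"):
        sample = data
    else:
        sample = data[:SNIFF_BYTES]

    mime = detect(sample)
    allowed = set(allowed_types)
    if mime not in allowed:
        raise ValueError(
            f"MIME type '{mime}' is not valid. Allowed types are: "
            + ", ".join(sorted(allowed))
            + "."
        )
    return mime
```

Putting the size check first also bounds how much data the whole-file read for a .doc can ever touch.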

I can have a go at submitting a pull request for this if I get some time over the next day or two.

eschipul commented 4 years ago

@trawick Can you confirm that you have scanned the Word “.doc” file and that it is not infected with a virus or malware and does not carry a duplicate MIME type? I suggest that as the starting point.

While this example exploit link is PHP-based (and I block PHP entirely in our production environments), the basic technique still applies to file-upload vulnerabilities: https://www.hackingarticles.in/5-ways-file-upload-vulnerability-exploitation/

A quick Google search for “libmagic .doc hack exploit” gives lots of options. Don’t forget that many hacks like that are shared via text-only YouTube videos, which you may need to find linked in a forum.

I’d also take this as an opportunity to run “apt update && apt upgrade”, as I seem to recall a recent update.

Upgrade to the latest Python on the box and in the virtualenv as well. Check file-level permissions to be sure that the execute (x) bit is not set.
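The permission check can be scripted with Python's standard os and stat modules. A small sketch, assuming a POSIX file system; has_execute_bit and clear_execute_bits are hypothetical helper names.

```python
import os
import stat

# Execute permission for owner, group and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def has_execute_bit(path: str) -> bool:
    """Return True if any execute permission bit is set on `path`."""
    return bool(os.stat(path).st_mode & EXEC_BITS)


def clear_execute_bits(path: str) -> None:
    """Remove all execute bits, leaving the read/write bits untouched."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~EXEC_BITS)
```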

Are you going through an nginx reverse proxy?

I’m just reluctant to suggest a pull request until we have ruled out the most likely issues. Reading the MIME type out of the default order just makes me suspicious of a payload.

Real-world solution in the short term: save as PDF and upload that. Or save the Word document as .rtf, paste it into a clean Word document, and see if it still happens.


rob-hills commented 4 years ago

Hi Ed,

Thanks for your reply.

The file I wanted to upload is a letterhead template (not a Word template file, just an ordinary document) for use by the various volunteers in our organisation. As many use Word, but not necessarily the latest version, we have a business case for having this file on our website for them to download. The volunteer who supplied the file is pretty tech-savvy, so I think it is unlikely to be infected by a virus. As I use Ubuntu, I am not overly concerned about Word viruses I have to confess.

But for the record, I uploaded the document to an online virus scanner (using ClamAV) and it was reported as being virus free.

To explore further, I created my own test file using LibreOffice Writer and saved it in MSWord .doc format. The document was simply the word "test" and nothing else. First, I tried to upload the file to our Tendenci site using Tendenci's file upload page. I got exactly the same error message: MIME type 'application/CDFV2' is not valid.

Next, I uploaded the same file to our site's webserver via an SSH session and opened a Django shell to test it out. First I imported magic:

import magic

Then I tried loading my file in a similar manner to the code in the Tendenci FileValidator class:

>>> magic.from_buffer(open('/home/robh/test/000-test-word-upload.doc', mode='rb').read(1024), mime=True)
'application/CDFV2'

As you can see, it returned the same (incorrect) MIME type that I saw when trying to upload to Tendenci.

Then I tried several more times with increasing read-buffer sizes:

>>> magic.from_buffer(open('/home/robh/test/000-test-word-upload.doc', mode='rb').read(2096), mime=True)
'application/CDFV2'
>>> magic.from_buffer(open('/home/robh/test/000-test-word-upload.doc', mode='rb').read(4192), mime=True)
'application/CDFV2'
>>> magic.from_buffer(open('/home/robh/test/000-test-word-upload.doc', mode='rb').read(8284), mime=True)
'application/CDFV2'

Each time I got the same result until finally:

>>> magic.from_buffer(open('/home/robh/test/000-test-word-upload.doc', mode='rb').read(16000), mime=True)
'application/msword'
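The manual search above can be scripted. A rough sketch, assuming the same python-magic call as in the session; smallest_identifying_prefix is a hypothetical helper, and it slices an in-memory copy of the file rather than reopening the file for each size.

```python
from typing import Callable, Iterable, Optional


def smallest_identifying_prefix(
    data: bytes,
    detect: Callable[[bytes], str],
    expected: str,
    sizes: Iterable[int],
) -> Optional[int]:
    """Return the first prefix length in `sizes` for which detect() gives `expected`.

    Falls back to the whole file; returns None if even that is misidentified.
    """
    for size in sizes:
        if detect(data[:size]) == expected:
            return size
    if detect(data) == expected:
        return len(data)
    return None


# Real use (python-magic):
#   data = open(path, "rb").read()
#   smallest_identifying_prefix(
#       data, lambda b: magic.from_buffer(b, mime=True),
#       "application/msword", [1024, 2096, 4192, 8284, 16000])
```

The threshold found this way is specific to the file and the libmagic version, which is why passing the whole document is the safer fix.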

So, as noted in my original post, it has been suggested in another project affected by this bug that, for MSWord documents, the whole document needs to be processed in order to get the correct MIME type.
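Some background that may explain this (my reading, not something established in this thread): legacy .doc, .xls and .ppt files are all stored in Microsoft's OLE2 Compound File Binary format, which libmagic reports generically as CDFV2. Every such file begins with the same 8-byte signature, so a short prefix only reveals "compound file"; telling Word from Excel means parsing the container's internal directory, which can lie further into the file. The sketch below shows the signature test only; is_ole2_container is a hypothetical helper, not a replacement for libmagic.

```python
# 8-byte signature at the start of every OLE2 / Compound File Binary file.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole2_container(head: bytes) -> bool:
    """True if `head` starts with the OLE2 compound-file signature.

    This only identifies the container (libmagic's "CDFV2"); it cannot
    tell a Word .doc from an Excel .xls or a PowerPoint .ppt.
    """
    return head[:8] == OLE2_SIGNATURE
```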

Finally, I will note that the magic.from_file() method returns the correct MIME type for my test file:

>>> magic.from_file('/home/robh/test/000-test-word-upload.doc', mime=True)
'application/msword'
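Since from_file() works, one option is to use it whenever the upload is already on disk. A sketch, assuming Django's default upload handling: uploads larger than FILE_UPLOAD_MAX_MEMORY_SIZE (2.5 MB by default) become a TemporaryUploadedFile with a temporary_file_path() method, while smaller ones (such as this 1.5MB file) stay in memory. detect_upload_mime is a hypothetical helper, and the two detectors are passed in.

```python
from typing import Callable


def detect_upload_mime(
    upload,
    detect_path: Callable[[str], str],
    detect_buffer: Callable[[bytes], str],
) -> str:
    """Detect an upload's MIME type from its full content.

    `upload` is assumed to behave like a Django UploadedFile. In real use:
      detect_path   = lambda p: magic.from_file(p, mime=True)
      detect_buffer = lambda b: magic.from_buffer(b, mime=True)
    """
    if hasattr(upload, "temporary_file_path"):
        # Large upload already written to disk: let libmagic read the file.
        return detect_path(upload.temporary_file_path())
    # Small in-memory upload: read all of it, then rewind so the file can
    # still be saved afterwards.
    upload.seek(0)
    data = upload.read()
    upload.seek(0)
    return detect_buffer(data)
```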

eschipul commented 4 years ago

@rob-hills -- ha ha!--> "As I use Ubuntu, I am not overly concerned about Word viruses I have to confess." - I know, right??? But I digress....

This seems like an edge case, and I don't understand why they don't just upload it to S3 or Dropbox and link to it.

I can only speak to our environment, but the trace would go through the reverse proxies, my own security rules, then the standard rules, then OSSEC as well as a few other sets of rules, then be passed back to the VPC and divided among the microservice containers with their own IDS/IPS/WAF rules, all of which would have to be monitored and maintained.

I do see your point: if the lib says it allows it, why not? I also see the flip side: given that this is an unusual use case with a simple alternative (hosting the file elsewhere), it may not be worth the research. Further, I just created a "test.doc" file on the demo site and uploaded it through /files/ with no problem. (Yes, the MIME type would be "text", but you get my point that it isn't the file extension.)

Occam's Razor says: let's just assume something is off, such as the file-system byte check throwing an error because of discrepancies between the originating file systems. I'm just saying I think we should pick our fights, and this one doesn't smell right given such easy alternatives (link to S3, etc.).

I'm going to close as I don't see "value over risk", or "time invested to research over value achieved", being worth it.

Feel free to object, of course, and I can scan the sites that host with us to see how many ".doc" files are actually in use. Or we could move forward with more interesting UI/UX, CI/CD or testing-suite work, right?

rob-hills commented 4 years ago

Hi Ed,

With respect, I beg to differ. I have fixed the bug and submitted a pull request. Out of the box, Tendenci apparently supports older Word and Excel documents (.doc and .xls are among the default allowed file extensions, and application/msword and application/ms-excel are among the default permitted MIME types). Except that it doesn't support these documents, because you get baffling error messages when you try to upload them.

I work with several volunteer organisations and all of them use older Word and Excel documents all the time. This is usually because they haven't been upgrading their M$ Word and Excel programs for a long time, partly because they can't afford to and partly because they can't be bothered relearning how to use these tools every time M$ makes widespread, confusing changes to their UIs to make it look like you are getting something new for your upgrade $$.

Here in Australia, many volunteers still share old versions of M$ documents, and most prefer to use their trusted club/association server to preserve these files rather than foreign-based cloud servers like DropBox.

For all these reasons, I have put the effort into fixing this bug and I offer the fix to the Tendenci community. We will implement these changes in our site regardless. You are obviously free to accept or reject the pull request, but I hope that by submitting it I can at least publicise the fix for others who have the same issue.

Cheers,

Rob Hills, Waikiki, Western Australia

jennyq commented 4 years ago

Hi Rob,

The pull request you submitted does fix the issue (thank you!). But I agree with Ed on the security considerations. Anybody still using old MS Word (MS Word 97-2003?) is putting their systems at risk. Seriously, you should upgrade it or use some alternatives if you still have it.

We're going to remove .doc and .xls from the allowed upload file extensions. Old .doc files can be converted to .docx before being uploaded.
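For the conversion step, LibreOffice (already used earlier in this thread) can convert documents headlessly from the command line. A sketch wrapping that call from Python; build_convert_command and convert_doc_to_docx are hypothetical helpers, and the executable may be installed as soffice or libreoffice depending on the platform.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(doc_path: str, outdir: str) -> List[str]:
    """Build a LibreOffice command line that converts a .doc file to .docx."""
    return [
        "soffice",
        "--headless",            # run without opening a window
        "--convert-to", "docx",  # target format
        "--outdir", outdir,      # where the converted file is written
        doc_path,
    ]


def convert_doc_to_docx(doc_path: str, outdir: str) -> Path:
    """Run the conversion and return the expected path of the .docx output."""
    subprocess.run(build_convert_command(doc_path, outdir), check=True)
    return Path(outdir) / (Path(doc_path).stem + ".docx")
```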

Thanks, Jenny

trawick commented 4 years ago

> Anybody still using old MS Word (MS Word 97-2003?) is putting their systems at risk. Seriously, you should upgrade it or use some alternatives if you still have it.

Re your justification: Files created with those levels of MS Word will exist for a long time to come and can be opened with newer Office or with some other software, and they wouldn't ordinarily be converted to a newer format unless modifications are needed or Word itself drops support.

(I don't have any particular requirement on this software at present so I don't care how it is resolved. Since I got pulled into this ticket I thought I'd share a comment on the conversation ;) )

jennyq commented 4 years ago

@trawick , thanks for sharing!

> Files created with those levels of MS Word will exist for a long time to come and can be opened with newer Office or with some other software

True. But the risk is real if you use any software that is no longer supported (and it is especially high for MS Office due to its popularity). If you search the Microsoft product support lifecycle, extended support for MS Office 97 and MS Office 2003 ended long ago (in 2002 and 2014, respectively): https://support.microsoft.com/en-us/lifecycle/search?alpha=Microsoft%20Office%2097%20Professional%20Edition