neilharvey / FileSignatures

A small library for detecting the type of a file based on its header signature (also known as a magic number).
MIT License

For *.pptx DetermineFileFormat always returns null #61

Closed: v-pimenau closed this issue 9 months ago

v-pimenau commented 10 months ago

When I try to get the format of a pptx file, DetermineFileFormat always returns null.

neilharvey commented 10 months ago

It should work - there is a minimal sample in the tests which passes.

The PowerPoint format is detected by searching the pptx archive for a file named presentation.xml, which should always be present. Perhaps some writers create non-standard files (I've seen this before with Word documents). Are you able to provide a minimal file which cannot be detected?
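As a rough illustration of the check described above, here is a minimal Python sketch. It assumes only what the comment states: a pptx is a zip archive that should contain an entry named presentation.xml. The function name `looks_like_pptx` is hypothetical, and this is not the FileSignatures implementation (which is .NET); it just shows the idea.

```python
import io
import zipfile

# Every zip archive (and therefore every pptx) starts with this local file header signature.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def looks_like_pptx(data: bytes) -> bool:
    """Hypothetical sketch: True if data is a zip archive containing a presentation.xml entry."""
    if not data.startswith(ZIP_LOCAL_HEADER):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Compare only the last path segment, e.g. "ppt/presentation.xml" -> "presentation.xml".
            return any(name.split("/")[-1] == "presentation.xml" for name in archive.namelist())
    except zipfile.BadZipFile:
        return False
```

An empty byte string fails the very first header check, which is worth keeping in mind for the rest of this thread.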

v-pimenau commented 10 months ago

Hi, sorry for the delay. It happens with every newly created *.pptx file.

neilharvey commented 10 months ago

I tried creating a fresh pptx via PowerPoint (Office 365 Version 2310) and it seems to be working as expected. Would you be able to upload a blank PPTX that you've created so I can have a look at it?

If that's not possible, then you can investigate yourself as follows:

  1. Unzip the .pptx file into a directory (it's a zip archive under the hood).
  2. Open [Content_Types].xml from the root directory. This file contains definitions of the content types of all the different parts in the package.
  3. Within that file look for a section similar to this: <Override PartName="/ppt/presentation.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"/>
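The manual steps above can also be scripted. The sketch below is an assumption-laden illustration, not part of FileSignatures: it reads [Content_Types].xml straight out of the archive (no unzipping to disk) and returns each `Override` element's PartName and ContentType, using the standard OPC content-types XML namespace. The function name `override_parts` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in Open Packaging Conventions files.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def override_parts(pptx_bytes: bytes) -> dict:
    """Hypothetical helper: map each Override PartName to its ContentType."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        root = ET.fromstring(archive.read("[Content_Types].xml"))
    return {el.get("PartName"): el.get("ContentType") for el in root.iter(f"{CT_NS}Override")}
```

For a standard file, the result should include "/ppt/presentation.xml" mapped to the presentationml main content type quoted in step 3.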

This defines the main presentation XML file and is what we are searching for to identify the PPTX. We look for 'presentation.xml' by default, with a slight fuzz factor so that small variations are also matched.
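The comment doesn't say how the "slight fuzz factor" is implemented, so the following is only a stand-in: it uses Python's `difflib` similarity ratio with a hypothetical cutoff of 0.8 to show how small variations of the file name could still match. Both the technique and the threshold are assumptions, not the library's actual logic.

```python
import difflib
import posixpath


def find_presentation_part(part_names, target="presentation.xml", cutoff=0.8):
    """Hypothetical fuzzy lookup: return the part whose file name is closest to target, or None."""
    # Map file names ("presentation.xml") back to their full part names ("/ppt/presentation.xml").
    by_basename = {posixpath.basename(p): p for p in part_names}
    matches = difflib.get_close_matches(target, by_basename.keys(), n=1, cutoff=cutoff)
    return by_basename[matches[0]] if matches else None
```

A name like "/ppt/presentation2.xml" would still match here, while an unrelated part such as "/word/document.xml" falls well below the cutoff.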

What do your PPTX files contain as the PartName for the presentation file?

v-pimenau commented 9 months ago

Hi, sorry for the delay. This issue appears when I create a new empty pptx document.

v-pimenau commented 9 months ago

5.pptx

neilharvey commented 9 months ago

Hey, thanks for sending the sample pptx - but when I try to download it, it appears to be zero bytes in size.
Could you try reuploading it / another sample?

v-pimenau commented 9 months ago

Hey. It should be zero bytes in size, because this issue appears when the pptx file is empty.

neilharvey commented 9 months ago

Ah, this library works by reading the header bytes of a file to determine the format - so if the file has a zero length there isn't anything we can do, sorry. I had assumed you meant a blank document - which would work because it would contain the minimal zip/xml entries for a valid PowerPoint file.
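To see why a zero-length file can't be identified, consider a minimal header-matching sketch. The signature table below is a small illustrative subset of well-known magic numbers, not the library's own table, and `detect_by_header` is a hypothetical name. With no bytes to compare, no signature can match, so the only sensible answer is "unknown" (analogous to DetermineFileFormat returning null).

```python
from typing import Optional

# Illustrative subset of well-known magic numbers (not FileSignatures' table).
SIGNATURES = {
    b"PK\x03\x04": "zip",  # also the container format for docx/xlsx/pptx
    b"%PDF": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
}


def detect_by_header(data: bytes) -> Optional[str]:
    """Hypothetical sketch: return the first format whose magic number prefixes data."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    # Reached for unknown headers and for empty input, where there is nothing to compare.
    return None
```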

v-pimenau commented 9 months ago

Ok, got it. Thank you for your support. I will close this issue.