neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).
MIT License

For *.pptx DetermineFileFormat always returns null #61

Closed (v-pimenau closed 11 months ago)

v-pimenau commented 1 year ago

When I try to get the format of a pptx file, DetermineFileFormat always returns null.

neilharvey commented 1 year ago

It should work - there is a minimal sample in the tests which passes.

The PowerPoint format works by searching the pptx archive for a file named presentation.xml, which should be present. Perhaps there are writers which create non-standard versions (which I've seen before with Word documents). Are you able to provide a minimal file which cannot be detected?

v-pimenau commented 1 year ago

Hi, sorry for the delay. It happens for every newly created *.pptx file.

neilharvey commented 1 year ago

I tried creating a fresh pptx via PowerPoint (Office 365 Version 2310) and it seems to be working as expected. Would you be able to upload a blank PPTX that you've created so I can have a look at it?

If that's not possible, then you can investigate yourself as follows:

  1. Unzip the .pptx file into a directory (it's a zip archive behind the scenes)
  2. Open [Content_Types].xml from the root directory. This file will contain definitions of all the different
  3. Within that file look for a section similar to this: <Override PartName="/ppt/presentation.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"/>

This defines the main presentation XML file and is what we are searching for to identify the PPTX. We look for 'presentation.xml' by default, with a slight fuzz factor so that small variations are also matched.
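The manual steps above can also be scripted. The following is a minimal Python sketch, not part of FileSignatures and not how the library itself performs the check: it opens the .pptx as a zip archive, reads [Content_Types].xml, and lists the PartName of every Override entry whose ContentType is the main presentation type. The function name `find_presentation_parts` is hypothetical and used only for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Content type quoted in step 3 above.
PRESENTATION_TYPE = (
    "application/vnd.openxmlformats-officedocument."
    "presentationml.presentation.main+xml"
)


def find_presentation_parts(source):
    """Return the PartName of each Override declaring the main presentation part.

    `source` is a path or a binary file-like object pointing at a .pptx file.
    (Hypothetical helper, for illustration only.)
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("[Content_Types].xml")

    root = ET.fromstring(xml_bytes)
    parts = []
    for element in root.iter():
        # Compare on the local tag name so the XML namespace does not matter.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Override" and element.get("ContentType") == PRESENTATION_TYPE:
            parts.append(element.get("PartName"))
    return parts
```

Calling `find_presentation_parts("sample.pptx")` on a standard file should list `/ppt/presentation.xml`; a different PartName would point to a non-standard writer. A zero-byte file is not a zip archive at all, so `zipfile.ZipFile` raises `zipfile.BadZipFile` before any XML is read.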

What do your PPTX files contain as the PartName for the presentation file?

v-pimenau commented 12 months ago

Hi. Sorry for the delay. This issue appears when I create a new empty pptx document.

v-pimenau commented 12 months ago

5.pptx

neilharvey commented 12 months ago

Hey, thanks for sending the sample pptx - but when I try to download it, it appears to be zero bytes in size.
Could you try reuploading it / another sample?

v-pimenau commented 11 months ago

Hey. It should be zero bytes in size, because this issue appears when the pptx file is empty.

neilharvey commented 11 months ago

Ah, this library works by reading the header bytes of a file to determine the format - so if the file has a zero length there isn't anything we can do, sorry. I had assumed you meant a blank document - which would work because it would contain the minimal zip/xml entries for a valid PowerPoint file.
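To illustrate why a zero-length file cannot be identified, here is a small Python sketch of header-based detection in general. It is an assumption-laden illustration, not the FileSignatures implementation: the names `classify_header` and `classify_file` are hypothetical, and only the zip container signature is checked. A caller can use a guard like this to tell empty files apart from unrecognised ones before attempting detection.

```python
# Signatures a zip container can start with: a local file header,
# or the end-of-central-directory record of an archive with no entries.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06")


def classify_header(header: bytes) -> str:
    """Classify the leading bytes of a file (hypothetical helper)."""
    if len(header) == 0:
        # No bytes at all: there is no signature to compare against.
        return "empty"
    if header.startswith(ZIP_SIGNATURES):
        # Zip-based formats such as .pptx need a further look inside the archive.
        return "zip"
    return "other"


def classify_file(path: str) -> str:
    """Read the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(4))
```

A blank presentation saved by PowerPoint would come back as `"zip"`, whereas the zero-byte 5.pptx comes back as `"empty"`, so no signature match is possible for it.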

v-pimenau commented 11 months ago

Ok, got it. Thank you for your support. I will close this issue.