openpreserve / odf-validator

Open source Open Document Format (ODF) validation
http://odf.openpreservation.org/
BSD 3-Clause "New" or "Revised" License

MIM-3 error in Excel file saved as .ods #154

Open RvanVeenendaal opened 2 months ago

RvanVeenendaal commented 2 months ago

For the attached Excel 365 file "TEST_Remco.ods" with "TEST" in cell A1 and "Remco" in cell B1, saved as .ods, the ODF Validator produces MIM-3 and XML-4 errors.

MIM-3: looking at the unzipped files, I can't find anything wrong with the mimetype file that would produce the MIM-3 error. It would help me to know what to look for in the future.

XML-4: the manifest:version attribute does however seem to be missing from what Excel saved. If all .ods files saved with Excel 365 produce this error, should we document this somewhere as a known issue? Otherwise each user might start looking for the same solution, over and over again.

TEST_Remco.ods

APP-1: [INFO] Validating spreadsheets\TEST_Remco.ods.
APP-4: [INFO] Validation report for spreadsheets\TEST_Remco.ods.
MIM-3: mimetype [ERROR] The "mimetype" file SHALL NOT use an 'extra field' in its header.
DOC-3: mimetype [INFO] OpenDocument MIMETYPE application/vnd.oasis.opendocument.spreadsheet detected
XML-4: META-INF\manifest.xml [ERROR] Not a valid XML document. Validation exception at line 2 and column 88: element "manifest:manifest" missing required attribute "manifest:version".
PKG-7: Thumbnails\thumbnail.png [WARNING] An OpenDocument Package SHOULD contain a preview image Thumbnails/thumbnail.png.
NOT VALID, 2 errors, 1 warnings and 1 info messages.
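For anyone wanting to check the XML-4 condition by hand, here is a minimal Python sketch that reads META-INF/manifest.xml from the package and looks for the manifest:version attribute on the root element. The function name manifest_version is hypothetical and this is not how the ODF Validator itself performs the check; it only uses the standard library and the ODF manifest namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "manifest:" prefix in ODF package manifests.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def manifest_version(path):
    """Return the manifest:version attribute of the root element, or None.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    # ElementTree addresses namespaced attributes as "{namespace}localname".
    return root.get(f"{{{MANIFEST_NS}}}version")
```

Calling manifest_version("TEST_Remco.ods") on a file that triggers the XML-4 message quoted above would be expected to return None.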

carlwilson commented 2 months ago

Hi @RvanVeenendaal. I've taken a look at the file in question. I'll deal with the MIM-3 error first. This is caused by extra headers in the zip file for the mimetype file entry. The specification explicitly disallows these for that file entry and they do appear to be present. Here's my debug capture of the entry in question: [image] Looking closely, this appears to be an extended timestamp of some kind; see the 0x5455 entry listed here: https://libzip.org/specifications/extrafld.txt.

My guess is that it's the weird file last modified date that I can see using zipinfo:

Central directory entry #1:
---------------------------

  mimetype

  offset of local header from start of archive:   0
                                                  (0000000000000000h) bytes
  file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
  version of encoding software:                   4.5
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   1.0
  compression method:                             none (stored)
  file security status:                           not encrypted
  extended local header:                          no
  file last modified on (DOS date/time):          1980 Jan 1 00:00:00
  32-bit CRC value (hex):                         8a396c85
  compressed size:                                46 bytes
  uncompressed size:                              46 bytes
  length of filename:                             8 characters
  length of extra field:                          0 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (00 hex):                none

  There is no file comment.
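
The zipinfo listing above describes the central directory entry, which is a separate record from the local file header that sits in front of the file data, and each has its own extra field. As a rough sketch of what to look for, the Python below reads the local header of the mimetype entry directly and lists any extra-field records by header ID. The function names local_extra_field and parse_extra_fields are hypothetical, and this is not the validator's own code; it relies only on the standard zipfile and struct modules and the fixed 30-byte local header layout of the ZIP format.

```python
import struct
import zipfile

# Fixed part of a ZIP local file header: signature, five 16-bit fields,
# three 32-bit fields, then filename length and extra-field length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def local_extra_field(path, name="mimetype"):
    """Return the raw extra field from the local header of `name`."""
    with zipfile.ZipFile(path) as zf:
        offset = zf.getinfo(name).header_offset
    with open(path, "rb") as fh:
        fh.seek(offset)
        fields = LOCAL_HEADER.unpack(fh.read(LOCAL_HEADER.size))
        signature, name_len, extra_len = fields[0], fields[-2], fields[-1]
        if signature != b"PK\x03\x04":
            raise ValueError("no local file header at recorded offset")
        fh.seek(name_len, 1)  # skip over the filename
        return fh.read(extra_len)


def parse_extra_fields(extra):
    """Split a raw extra field into (header_id, data) records."""
    records = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        records.append((header_id, extra[pos + 4:pos + 4 + size]))
        pos += 4 + size
    return records


if __name__ == "__main__":
    import sys
    for header_id, data in parse_extra_fields(local_extra_field(sys.argv[1])):
        print(f"0x{header_id:04x}: {len(data)} bytes")
```

An empty result means the local header carries no extra field; a record with ID 0x5455 would correspond to the extended timestamp entry mentioned above.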

The version issue is a pain, as various bits of software appear to play "fast and loose" here, but there's already an ongoing discussion at #150.