qrilka / xlsx

Simple and incomplete Excel file parser/writer
MIT License
128 stars 62 forks source link

Sheets not found due to sheets being referenced by absolute path. #171

Closed luke-clifton closed 6 months ago

luke-clifton commented 6 months ago

I have an xlsx file that won't parse because:

*** Exception: MissingFile "/xl/worksheets/sheet.xml"
CallStack (from HasCallStack):
  error, called at src/Codec/Xlsx/Parser.hs:94:22 in xlsx-1.1.1-054dc7745149d1a0a8c7f44a692517d728629241f887be768ba785b6fb33905d:Codec.Xlsx.Parser

That file is in the archive, but it seems that findEntryByPath returns Nothing with the leading slash.

https://github.com/qrilka/xlsx/blob/426eb0ad3aa6686cfdff309d87a2428df0011f83/src/Codec/Xlsx/Parser.hs#L358

qrilka commented 6 months ago

Hi @luke-clifton could you attach a test file maybe? I'd try to find some time over the weekend

luke-clifton commented 6 months ago

If I make the below change, it can parse my file. (I've added the dropWhile (== '/') in Codec.Xlsx.Parser

extractSheet ar sst contentTypes caches wf = do
  let filePath = dropWhile (== '/') $ wfPath wf
  file <- note (MissingFile filePath) $ Zip.fromEntry <$> Zip.findEntryByPath filePath ar

The file is the Profile.xlsx found in the Garmin FIT SDK

I've attached it here as well.

Profile.xlsx

qrilka commented 6 months ago

@luke-clifton I've checked the spec and according to the section "7.3.4 Mapping logical item names to ZIP item names" of Part 2 “Open Packaging Conventions” at https://ecma-international.org/publications-and-standards/standards/ecma-376/ we need to

Remove the leading forward slash (“/”) from the logical item name or, in the case of interleaved parts, from each of the logical item names within the complete sequence.

I guess the part of interleaved parts is not applicable to xlsx (at least I don't remember such a thing) Would you like to create a PR stripping any leading "/" in https://github.com/qrilka/xlsx/blob/426eb0ad3aa6686cfdff309d87a2428df0011f83/src/Codec/Xlsx/Parser.hs#L707 and with a comment why we do this?

qrilka commented 6 months ago

Fixed in #172