jsonkenl / xlsxir

Xlsx parser for the Elixir language.
MIT License
212 stars 83 forks source link

Bad match on sheet r:id value "R1" #106

Open BrendonPierson opened 3 years ago

BrendonPierson commented 3 years ago

While trying to parse a workbook I received the following error:

** (MatchError) no match of right hand side value: "R1"
    (xlsxir 1.6.4) lib/xlsxir/parse_workbook.ex:27: anonymous fn/2 in Xlsxir.ParseWorkbook.sax_event_handler/2
    (elixir 1.11.4) lib/enum.ex:2193: Enum."-reduce/3-lists^foldl/2-0-"/3
    (xlsxir 1.6.4) lib/xlsxir/parse_workbook.ex:17: Xlsxir.ParseWorkbook.sax_event_handler/2
    (erlsom 1.5.0) /Users/bp/carebridge/deus_ex_machina/deps/erlsom/src/erlsom_sax_utf8.erl:1408: :erlsom_sax_utf8.wrapCallback/2
    (erlsom 1.5.0) /Users/bp/carebridge/deus_ex_machina/deps/erlsom/src/erlsom_sax_utf8.erl:926: :erlsom_sax_utf8.parseContentLT/2
    (erlsom 1.5.0) /Users/bp/carebridge/deus_ex_machina/deps/erlsom/src/erlsom_sax_utf8.erl:196: :erlsom_sax_utf8.parse/2
    (xlsxir 1.6.4) lib/xlsxir/sax_parser.ex:61: Xlsxir.SaxParser.parse/3
    (xlsxir 1.6.4) lib/xlsxir/xlsx_file.ex:236: Xlsxir.XlsxFile.parse_workbook_to_ets/1
    (xlsxir 1.6.4) lib/xlsxir.ex:233: Xlsxir.multi_extract/5

Digging through the actual xml I noticed the sheets attributer:id="R1"

<?xml version="1.0" encoding="utf-8"?>
<workbook xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <sheets>
        <sheet name="LATEST_XWALK" sheetId="1" r:id="R1" />
    </sheets>
</workbook>

Then it appears the parser is expecting the r:id to always start with "rId" Changing this function to accommodate "R1" did the trick

  def sax_event_handler({:startElement, _, 'sheet', _, xml_attrs}, state) do
    sheet =
      Enum.reduce(xml_attrs, %{name: nil, sheet_id: nil, rid: nil}, fn attr, sheet ->
        case attr do
          {:attribute, 'name', _, _, name} ->
            %{sheet | name: name |> to_string}

          {:attribute, 'sheetId', _, _, sheet_id} ->
            {sheet_id, _} = sheet_id |> to_string |> Integer.parse()
            %{sheet | sheet_id: sheet_id}

          {:attribute, 'id', _, _, rid} ->
            case rid |> to_string do
              "rId" <> rid ->
                {rid, _} = Integer.parse(rid)
                %{sheet | rid: rid}

              "R" <> rid ->
                {rid, _} = Integer.parse(rid)
                %{sheet | rid: rid}
              end

          _ ->
            sheet
        end
      end)

    %__MODULE__{state | sheets: [sheet | state.sheets]}
  end

I'm happy to clean up the above and submit a PR but I'm not 100% sure the r:id="R1" is valid? I received this file from a client, when creating a new file I see the expected r:id="rId1".