observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License
966 stars 83 forks source link

Issues parsing OpenXML SDK generated xlsx files #266

Open visnup opened 2 years ago

visnup commented 2 years ago

Upstream issue with ExcelJS: https://github.com/exceljs/exceljs/issues/1437 Originally reported to us by @aaronkyle.

I'll try opening a PR to fix in ExcelJS…

Gayathri-Senthilkumar commented 2 years ago

@visnup, I have run into the same issue in opening and parsing the .xlsx file and I would like to know the status of this open issue.

visnup commented 2 years ago

@Gayathri-Senthilkumar I stalled out when I ran into hurdles trying to come up with a minimal method to deal with the namespaces OpenXML uses. Would love help trying to fix it upstream, if someone has time.

Gayathri-Senthilkumar commented 2 years ago

@visnup, We would love to work with you on fixing this issue. It would be difficult for us right now to allocate dedicated bandwidth to work on this continuously, but we are planning to work and fix this issue with your guidance whenever we find time/additional bandwidth. Please share your input to resolve this issue.

Gayathri-Senthilkumar commented 2 years ago

@visnup Please let us know if there are any updates on this issue.

visnup commented 2 years ago

@Gayathri-Senthilkumar sadly I haven't been able to find time to work on this. As a workaround, you could use SheetJS to parse and access the file. It's a bit more verbose, but should be able to handle these files for now.

Gayathri-Senthilkumar commented 2 years ago

@visnup I can understand, but we are using ExcelJS Library in our application for parsing the .xlsx file and also performing a few other functionalities. Moving to a different library (SheetJS) would cause more effort as we need to explore the SheetJS library to check the feasibility of all the functions that we use in our application and then migrate. So we would like to know if this bug fix is in the backlog and any plan to fix it sometime later. If so could you please share a rough timeline for this fix.

JonasLukasczyk commented 1 year ago

My colleagues (@hlweil, @Freymaurer) and I ran into the same issue. Instead of switching to SheetJS we monkey-patched ExcelJS. Specifically, we overrode the parse function of the BaseXform class located in exceljs/lib/xlsx/xform/base-xform.js:

async parse(saxParser) {
  for await (const events of saxParser) {
    for (const {eventType, value} of events) {
      if(value.name && value.name.startsWith('x:')) value.name = value.name.slice(2);

      if (eventType === 'opentag') {
        this.parseOpen(value);
      } else if (eventType === 'text') {
        this.parseText(value);
      } else if (eventType === 'closetag') {
        if (!this.parseClose(value.name)) {
          return this.model;
        }
      }
    }
  }
  return this.model;
}

The new parse function is identical to the original except for the additional if statement at the beginning of the second for loop. This if statement checks if a tag starts with a global namespace (here x:) and just removes it before the parsing continues.