javascriptdata / danfojs

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
https://danfo.jsdata.org/
MIT License
4.8k stars 209 forks

Column empty and not being processed when top row's cell is empty #643

Open jaycoolslm opened 4 months ago

jaycoolslm commented 4 months ago

Ref https://github.com/javascriptdata/danfojs/issues/593

Describe the bug A column is not being pulled through into the DataFrame. When reading an Excel file in which the topmost data row is empty for a given column, that column does not come through.

To Reproduce Steps to reproduce the behavior:

  1. Create an Excel file that has headers and in which the topmost data row for column X is empty
  2. Read it with dfd.readExcel
  3. console.log(df.columns)
  4. See that column X is missing

Expected behavior The column should be there


jaycoolslm commented 4 months ago

This is likely to be an issue with https://docs.sheetjs.com/docs/api/parse-options/
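
For illustration, a minimal sketch of how the symptom could arise. SheetJS's `sheet_to_json` leaves out keys for empty cells unless `defval` is set, so the first row object can lack a key that later rows have. The sketch assumes (not verified against the danfo.js source) that column names are inferred from the first row only; the sample rows and the `columnsFromFirstRow` helper are made up for the example.

```javascript
// Rows shaped like sheet_to_json output when `defval` is not set:
// the empty cell in the first data row is simply absent from the object.
const rows = [
  { id: 1 },                // the "note" cell was empty, so there is no "note" key
  { id: 2, note: "hello" },
];

// Hypothetical helper: derive column names from the first row only.
// This is an assumption about how a consumer might infer its columns.
function columnsFromFirstRow(data) {
  return data.length > 0 ? Object.keys(data[0]) : [];
}

console.log(columnsFromFirstRow(rows)); // [ 'id' ]
```

Under that assumption the "note" column never appears, even though the second row has a value for it.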

jaycoolslm commented 4 months ago

This can be resolved at the SheetJS level using this solution:

https://stackoverflow.com/a/66859139

Danfo team: I would suggest extending the current ExcelInputOptionsBrowser object with a field holding parsing options, which can then be spread into the arguments of this call:

utils.sheet_to_json(worksheet)
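
A rough sketch of what that pass-through might look like. This is not danfo.js's actual implementation: `parsingOptions` is a hypothetical field name, `rowsFromWorksheet` is a made-up function, and `fakeSheetToJson` with its simplified worksheet shape is only a stand-in for `XLSX.utils.sheet_to_json` so the example runs without SheetJS installed.

```javascript
// Stand-in for XLSX.utils.sheet_to_json so the sketch runs without SheetJS.
// It models only one behaviour: empty cells are skipped unless `defval` is given.
function fakeSheetToJson(worksheet, opts = {}) {
  const hasDefault = Object.prototype.hasOwnProperty.call(opts, "defval");
  return worksheet.rows.map((cells) => {
    const record = {};
    worksheet.headers.forEach((header, i) => {
      const value = cells[i];
      if (value !== undefined && value !== null) {
        record[header] = value;
      } else if (hasDefault) {
        record[header] = opts.defval;
      }
    });
    return record;
  });
}

// Hypothetical reader: `parsingOptions` is an assumed new field, not an
// existing danfo.js option. It is spread into the sheet_to_json call.
function rowsFromWorksheet(worksheet, options = {}, sheetToJson = fakeSheetToJson) {
  const { parsingOptions = {} } = options;
  return sheetToJson(worksheet, { ...parsingOptions });
}

// Simplified worksheet: the first data row has an empty "note" cell.
const worksheet = { headers: ["id", "note"], rows: [[1, undefined], [2, "hello"]] };

const withoutDefault = rowsFromWorksheet(worksheet);
const withDefault = rowsFromWorksheet(worksheet, { parsingOptions: { defval: "" } });

console.log(Object.keys(withoutDefault[0])); // [ 'id' ]
console.log(Object.keys(withDefault[0]));    // [ 'id', 'note' ]
```

Keeping the extra options in one nested object would leave the existing reader options untouched and let callers use any `sheet_to_json` option without danfo.js having to list each one.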

I will raise a PR for this if I have time

PS: my current working implementation:

// Parse the Excel file with SheetJS (XLSX); `file` is assumed to be a File or Blob
const arrBuf = await file.arrayBuffer();
const arrBufInt8 = new Uint8Array(arrBuf);
const workbook = XLSX.read(arrBufInt8, { type: 'array' });
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
// defval fills empty cells so that every row object carries every column key
const data = XLSX.utils.sheet_to_json(worksheet, { defval: '' });
// Instantiate a new DataFrame from the parsed rows
const df = new dfd.DataFrame(data);
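
If the rows have already been parsed without `defval`, a similar effect can be approximated afterwards by giving every row the same keys. The sketch below is only an illustration; `fillMissingKeys` is a hypothetical helper, not part of danfo.js or SheetJS. Note that it cannot recover a column that is empty in every row, and that key order follows first appearance in the data, not necessarily the sheet's column order.

```javascript
// Hypothetical helper: give every row the same set of keys,
// filling any key a row lacks with `fill`.
function fillMissingKeys(rows, fill = "") {
  // Collect keys in order of first appearance across all rows.
  const keys = [];
  const seen = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!seen.has(key)) {
        seen.add(key);
        keys.push(key);
      }
    }
  }
  // Rebuild each row with the full key set.
  return rows.map((row) =>
    Object.fromEntries(
      keys.map((key) => [
        key,
        Object.prototype.hasOwnProperty.call(row, key) ? row[key] : fill,
      ])
    )
  );
}

const parsed = [{ id: 1 }, { id: 2, note: "hello" }];
console.log(fillMissingKeys(parsed));
// [ { id: 1, note: '' }, { id: 2, note: 'hello' } ]
```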