javascriptdata / danfojs

Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.
https://danfo.jsdata.org/
MIT License
4.79k stars, 209 forks

Values mapped to incorrect columns #556

Open febilyt opened 1 year ago

febilyt commented 1 year ago

Describe the bug

When creating a DataFrame from an array of objects, if the keys of different objects are not in the same order, the values are mapped to the wrong columns.

To Reproduce

Sample code:

const dfd = require("danfojs-node");

let data = [
  {
      Id: 1,
      Name: 'Apple'
  },
  {
      Name: 'Orange',
      Id: 2
  },
];
let df = new dfd.DataFrame(data);
df.print();

Output:

╔════════════╤═══════════════════╤═══════════════════╗
║            │ Id                │ Name              ║
╟────────────┼───────────────────┼───────────────────╢
║ 0          │ 1                 │ Apple             ║
╟────────────┼───────────────────┼───────────────────╢
║ 1          │ Orange            │ 2                 ║
╚════════════╧═══════════════════╧═══════════════════╝

Expected behavior

Values should be mapped to the correct column according to each object's keys.
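The output above is consistent with the constructor taking the column names from the first object and then reading each object's values by position rather than by key. That is an inference from the observed behavior, not something confirmed here from the danfojs source. The plain JavaScript sketch below (no danfojs involved) contrasts the two strategies; `positionalRows` and `keyedRows` are hypothetical helpers, not danfojs APIs.

```javascript
// Hypothetical helpers for illustration only; these are not danfojs APIs.

// Positional mapping: column names come from the first record, but each
// record's values are read in that record's own key order.
function positionalRows(records) {
  const columns = Object.keys(records[0]);
  const rows = records.map((record) => Object.values(record));
  return { columns, rows };
}

// Key-based mapping: each cell is looked up by column name, so the key
// order inside individual records no longer matters.
function keyedRows(records) {
  const columns = Object.keys(records[0]);
  const rows = records.map((record) => columns.map((name) => record[name]));
  return { columns, rows };
}

const data = [
  { Id: 1, Name: "Apple" },
  { Name: "Orange", Id: 2 },
];

console.log(positionalRows(data).rows); // → [[1, "Apple"], ["Orange", 2]] (second row misaligned)
console.log(keyedRows(data).rows);      // → [[1, "Apple"], [2, "Orange"]]
```

The second row of the positional result reproduces the `Orange | 2` row in the printed table, while the key-based lookup gives the expected result.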

S-L-Moore commented 1 year ago

N.B. If fields are not defined in the first object:

  let data = [
    { Id: 1, Name: "Apple" },
    { Name: "Orange", Id: 2 },
    { Name: "Grape", Id: 3, Type: "Red" },
  ];

  let df = new dfd.DataFrame(data);
  console.log(dfd.toJSON(df));

They will be dropped:

[
  { Id: 1, Name: "Apple" },
  { Id: "Orange", Name: 2 },
  { Id: "Grape", Name: 3 },
];

Including all fields in the first object

  let data = [
    { Id: 1, Name: "Apple", Type: "Green" },
    { Name: "Orange", Id: 2 },
    { Name: "Grape", Id: 3, Type: "Red" },
  ];

  let df = new dfd.DataFrame(data);
  console.log(dfd.toJSON(df));

Does result in all keys being present in the output, but the values are still mapped incorrectly:

[
  { Id: 1, Name: "Apple", Type: "Green" },
  { Id: "Orange", Name: 2, Type: undefined },
  { Id: "Grape", Name: 3, Type: "Red" },
];

Both of the above fail when .print() is called on the DataFrame, with the error "Table must have a consistent number of cells."
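Both symptoms fit the same positional hypothesis (an inference, not confirmed from the danfojs source). If the column list comes only from the first object, `Type` is lost when it first appears in a later object. If each row is built from that object's own values, rows end up with different numbers of cells than there are columns, which would explain the "consistent number of cells" error. A quick plain-JavaScript check of that reasoning, using the two data sets above; `firstKeys` and `rowLengths` are hypothetical helpers:

```javascript
// The two data sets from the examples above.
const missingInFirst = [
  { Id: 1, Name: "Apple" },
  { Name: "Orange", Id: 2 },
  { Name: "Grape", Id: 3, Type: "Red" },
];

const allInFirst = [
  { Id: 1, Name: "Apple", Type: "Green" },
  { Name: "Orange", Id: 2 },
  { Name: "Grape", Id: 3, Type: "Red" },
];

// Column names as they would be taken from the first record only.
const firstKeys = (records) => Object.keys(records[0]);

// Number of cells each row would have if values were read positionally.
const rowLengths = (records) => records.map((r) => Object.values(r).length);

console.log(firstKeys(missingInFirst));  // → ["Id", "Name"] ("Type" is missing)
console.log(rowLengths(missingInFirst)); // → [2, 2, 3] (3 cells vs 2 columns)
console.log(firstKeys(allInFirst));      // → ["Id", "Name", "Type"]
console.log(rowLengths(allInFirst));     // → [3, 2, 3] (2 cells vs 3 columns)
```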

erasromani commented 1 year ago

I am facing the same issue as @febilyt. Are there any interim workarounds other than fixing the key order of the objects in the data array up front?

S-L-Moore commented 1 year ago

I wondered whether I could create an empty DataFrame and then add rows and columns dynamically, but I ran into a few issues with addColumn/append:

  //let dft = new dfd.DataFrame();
  //dft.addColumn("ID", [0]);     // Error: column length mismatch
  //dft.append([[0, 0, 0]], [0]); // Error: values must match #columns
  //dft.append([[1, 2, 3]], [0], { inplace: true });
  //                              // Fails if the row doesn't exist (~overwrite)

I cobbled together a quick workaround based on collecting the unique keys from all of the objects:

  // Get the unique column names from the data (a Set keeps first-seen order)
  const column_names = new Set<string>();
  data.forEach((o) => {
    Object.keys(o).forEach((key) => column_names.add(key));
  });

  // Initialize empty DataFrame
  let df = new dfd.DataFrame([[...column_names].map((x) => undefined)], {
    columns: [...column_names],
  }).drop({ index: [0] });

  // Append each row
  data.forEach((o, i) => {
    df = df.append([[...column_names].map((name) => o[name])], [i]);
  });

I've not used danfojs yet, so I'm not quite sure how well the undefined values will be handled; however, the above does load the data as expected:

0 : {Id: 1, Name: 'Apple', Type: 'Green'}
1 : {Id: 2, Name: 'Orange', Type: undefined}
2 : {Id: 3, Name: 'Grape', Type: 'Red'}
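A variant of this workaround, sketched under the same assumptions, aligns every record by key into a single 2-D array and passes it to the constructor once, with the same `columns` option used above, instead of appending one record at a time. `toRowsAndColumns` is a hypothetical helper, and `undefined` as the default fill value simply mirrors the output above; how danfojs treats `undefined` versus `null` or `NaN` as a missing value is not established in this thread.

```javascript
// Hypothetical helper (not a danfojs API): turns an array of objects into a
// column list plus a 2-D array of rows aligned to that list by key.
function toRowsAndColumns(records, fill = undefined) {
  // Union of keys across all records; a Set iterates in insertion order,
  // so columns appear in the order they are first seen.
  const columns = [...new Set(records.flatMap((record) => Object.keys(record)))];

  // Look each cell up by name; use `fill` when a record lacks that key.
  const rows = records.map((record) =>
    columns.map((name) =>
      Object.prototype.hasOwnProperty.call(record, name) ? record[name] : fill
    )
  );
  return { columns, rows };
}

const data = [
  { Id: 1, Name: "Apple", Type: "Green" },
  { Name: "Orange", Id: 2 },
  { Name: "Grape", Id: 3, Type: "Red" },
];

const { columns, rows } = toRowsAndColumns(data);
console.log(columns); // → ["Id", "Name", "Type"]
console.log(rows);    // → [[1, "Apple", "Green"], [2, "Orange", undefined], [3, "Grape", "Red"]]

// With danfojs-node installed (call shape taken from the workaround above):
// const dfd = require("danfojs-node");
// const df = new dfd.DataFrame(rows, { columns });
```

Building the rows once avoids creating an intermediate DataFrame per record. Passing `null` as the second argument is an untested alternative if `undefined` causes problems downstream.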