javascriptdata / danfojs

Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.
https://danfo.jsdata.org/
MIT License
4.8k stars 210 forks

DataFrame.sort_values() should support sorting by multiple columns #298

Open tbui-isgn opened 3 years ago

tbui-isgn commented 3 years ago

Background of this feature request

Currently, according to the Danfo documentation:

{
  by: This key can be either a single column name or a single array of the same length as the calling DataFrame,
  ascending: Order of sorting,
  inplace: Boolean indicating whether to perform the operation inplace or not. Defaults to false
}

Going by the documentation, I understand that two use cases are supported: sorting by the values in a single column (specified by a single column name), or a kind of manual sort where you supply an index array that tells the DF the specific order the rows should be placed in. You have to compute that array yourself (or type it in by hand, which in practical terms I don't think many people would do).

This is not ideal when you have a need to sort the data easily using two or more columns.

Request details

I'm hoping there are plans in the works to support a sorting pattern similar to the one in Pandas, where you can specify an array of column names (in the "by" key) whose values are used to compute the sorting index, and a Boolean array in the "ascending" key to control the direction for each of those columns.

The syntax may look something like:

{
  by: ["Column1", "Column2"],
  ascending: [true, false],
  inplace: false
}
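To make the requested behaviour concrete, here is a minimal sketch of a multi-column sort with a per-column direction, written against plain row objects rather than a Danfo DataFrame. The `sortRows` helper and the `MultiSortOptions` shape are hypothetical names used only for illustration; they are not part of Danfo.js, and treating a missing "ascending" flag as true is an assumption borrowed from Pandas.

```typescript
// Hypothetical helper, NOT part of Danfo.js: sorts plain row objects by several columns.
type Row = Record<string, string | number>;

interface MultiSortOptions {
  by: string[];          // column names, highest priority first
  ascending?: boolean[]; // one flag per column; a missing flag is assumed to mean true
}

function sortRows(rows: Row[], { by, ascending = [] }: MultiSortOptions): Row[] {
  // Copy first so the input is left untouched (similar in spirit to inplace: false).
  return [...rows].sort((x, y) => {
    for (let i = 0; i < by.length; i++) {
      const key = by[i];
      const direction = ascending[i] === false ? -1 : 1;
      if (x[key] > y[key]) return direction;
      if (x[key] < y[key]) return -direction;
      // Equal on this column: fall through to the next column in "by".
    }
    return 0; // rows are equal on every sort column
  });
}

const sorted = sortRows(
  [
    { Column1: "b", Column2: 1 },
    { Column1: "a", Column2: 1 },
    { Column1: "a", Column2: 2 },
  ],
  { by: ["Column1", "Column2"], ascending: [true, false] }
);
console.log(sorted);
```

Later columns only matter when all earlier columns compare equal, which is why the comparator returns as soon as it finds a difference.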

Considerations

I understand that, if implemented, this may require some further checking logic when processing the "by" key, because it could then be either an index array (manual sort) or a String array (multi-column sort).

Maybe the cleanest solution in that case is to move the logic that handles a manual index array to another key entirely.

This proposal means the config object for this method may look like this:

{
  by: This key can be either a single column name or an array of column names to be sorted,
  index: This key expects a manual index array of the same length as the calling DataFrame. [ IMPORTANT ] When this option is in use, the "by" key has no effect,
  ascending: Order of sorting, can be Boolean or a Boolean array,
  inplace: Boolean indicating whether to perform the operation inplace or not. Defaults to false
}
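One way the extra checking could look is sketched below: a small function that validates the proposed config object and resolves it into either a manual-index sort or a multi-column sort. Everything here (`SortOptions`, `NormalizedSort`, `normalizeSortOptions`) is a hypothetical illustration of the proposal, not Danfo.js code, and defaulting "ascending" to true is an assumption since the documentation quoted here does not state a default.

```typescript
// Hypothetical option shape based on the proposal in this issue; NOT Danfo.js's actual API.
interface SortOptions {
  by?: string | string[];
  index?: number[];
  ascending?: boolean | boolean[];
  inplace?: boolean;
}

type NormalizedSort =
  | { mode: "index"; index: number[]; inplace: boolean }
  | { mode: "columns"; by: string[]; ascending: boolean[]; inplace: boolean };

function normalizeSortOptions(options: SortOptions, rowCount: number): NormalizedSort {
  const inplace = options.inplace ?? false;

  // A manual index takes precedence: "by" has no effect when "index" is present.
  if (options.index !== undefined) {
    if (options.index.length !== rowCount) {
      throw new Error(`index must have ${rowCount} entries, got ${options.index.length}`);
    }
    return { mode: "index", index: options.index, inplace };
  }

  if (options.by === undefined) {
    throw new Error('either "by" or "index" is required');
  }
  const by = Array.isArray(options.by) ? options.by : [options.by];

  // Assumption: ascending defaults to true. A single Boolean applies to every
  // column; an array must have exactly one entry per "by" column.
  const asc = options.ascending ?? true;
  const ascending = Array.isArray(asc) ? asc : by.map(() => asc);
  if (ascending.length !== by.length) {
    throw new Error('"ascending" must have one entry per "by" column');
  }
  return { mode: "columns", by, ascending, inplace };
}

const columnsMode = normalizeSortOptions({ by: ["Column1", "Column2"], ascending: [true, false] }, 3);
const indexMode = normalizeSortOptions({ by: "Column1", index: [2, 0, 1] }, 3);
console.log(columnsMode.mode, indexMode.mode);
```

Keeping the manual index in its own key means the type of "by" no longer has to be inspected to tell the two kinds of sort apart.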

risenW commented 3 years ago

@tbui-isgn Thanks for the suggestion, we can definitely support this.

tbui-isgn commented 3 years ago

Thank you :) really appreciate your consideration

fonty422 commented 2 years ago

Hi, just wondering if there has been any movement on this and whether the functionality exists?

Midas-Li commented 1 year ago

Multi-column sort, like this TS example?

const a: (string | number)[][] = [
  ['a01', 'English', 100],
  ['a01', 'math', 99],
  ['a02', 'English', 80],
  ['a02', 'math', 88],
  ['a03', 'Physics', 99],
  ['a04', 'Physics', 55],
  ['a08', 'English', 77],
  ['a09', 'math', 85],
];
const sortIndex = [1, 2]; // column indexes to sort by, highest priority first

// Note: Array.prototype.sort sorts "a" in place and returns the same array.
const newArr = a.sort((x, y) => {
  for (const col of sortIndex) {
    if (x[col] > y[col]) return 1;
    if (x[col] < y[col]) return -1;
    // Equal on this column: compare the next one.
  }
  return 0; // equal on every sort column
});

console.log(newArr);

output:
[
  [ 'a08', 'English', 77 ],
  [ 'a02', 'English', 80 ],
  [ 'a01', 'English', 100 ],
  [ 'a04', 'Physics', 55 ],
  [ 'a03', 'Physics', 99 ],
  [ 'a09', 'math', 85 ],
  [ 'a02', 'math', 88 ],
  [ 'a01', 'math', 99 ]
]