vlang / v

Simple, fast, safe, compiled language for developing maintainable software. Compiles itself in <1s with zero library dependencies. Supports automatic C => V translation. https://vlang.io
MIT License
35.76k stars 2.16k forks source link

Stable `sort` algorithm support #10742

Closed flysand7 closed 3 years ago

flysand7 commented 3 years ago

Currently array.sort() is unstable, meaning it is unusable for performing this kind of task:

Sort array of users by age, and within the records with same age, sort alphabetically by username

A code that would perform this would have been:

users.sort(a.name < b.name)    // Not sure if V supports string comparison, left for the sake of example
users.sort(a.age < b.age)

However you can observe that if the second sort changes the relative positions of elements with the same age (i.e. the sorting algorithm is unstable), then the task would not be completed using this algorithm.

Current sort behaviour

I'm trying to sort an array like this:

struct User {
  age  int
  name string
}

mut users := [User{21, 'Bob'}, User{21, 'Jon Blow'}, User{20, 'Zarkon'}, User{25, 'Casey Muratori'}, User{25, 'Alice'}]
users.sort(a.age < b.age)

And the command outputs users with similar age in the wrong order:

[User{
    age: 20
    name: 'Zarkon'
}, User{
    age: 21
    name: 'Jon Blow'
}, User{
    age: 21
    name: 'Bob'
}, User{
    age: 25
    name: 'Alice'
}, User{
    age: 25
    name: 'Casey Muratori'
}]
mvlootman commented 3 years ago

Perhaps we can have sort(a.age>b.age && a.name>b.name) or being able to provide a function to write how the sorting should take place?

flysand7 commented 3 years ago

This is wrong on several levels. First, a.age>b.age && a.name>b.name would swap two elements if they are sorted by age and not by name OR they are sorted by name and not by age. This is wrong because if I want to sort alphabetically and within same entries by age, then I get the following situation:

Casey Muratori, 29
Jon Blow, 29
Jon Blow, 40

In this example Casey Muratori, 29 and Jon Blow, 29 are sorted by name, but not by age, implying that the elements should be swapped -- but they shouldn't since this array is already sorted.

Second point, is that if you're proposing this as a syntax for sort function, then I wouldn't hold you back but consider what happens when people want to use this non-trivial syntax for other purposes. I'm not sure whether this is a good idea.

I propose an alternative solution, which is more flexible. Make sort take in anonymous templatised functions of a and b. In this case it is possible to implement what I said as well as many other non-trivial sorting functions (ex.: alphanumeric sorting of strings (example implementation and details: https://gist.github.com/bumbread/5b0d06171fd7a9cf42f68379b4dd4c97)).

The solution will look like this:

arr.sort(fn(a,b: User)->bool {
   // The users are definately sorted if names are
   if(a.name != b.name) {
     return a.name < b.name;
   }

   // If the names equal they are sorted if the ages are
  return a.age < b.age;
});
mvlootman commented 3 years ago

You are right, that syntax (&&) would never work. Being able to provide a custom sorting function would be an elegant solution.

lucasrdrgs commented 3 years ago

There is already a function to sort arrays based on a custom comparator, seen in vlib/builtin/array.v. Here is an example:

struct User {
pub mut:
        age int
        typ int
}

fn main() {
        mut arr := [
                User{10, 1},
                User{9, 1},
                User{83, 0},
                User{99, 0}
        ]

        comparator := fn (a &User, b &User) int {
                cond := a.age >= b.age && a.typ <= b.typ
                if cond == true { return -1 }
                else { return 1 }
        }

        arr.sort_with_compare(comparator)

        println(arr)
}

The comparator I wrote sorts user by age, type. I wanted the lower types to show up first (so when you run this program, users with type 0 will show up first) and then by age, so that the final order is highest age/lowest type.

Here is the output:

[User{
    age: 99
    typ: 0
}, User{
    age: 83
    typ: 0
}, User{
    age: 10
    typ: 1
}, User{
    age: 9
    typ: 1
}]

Edit: I'm not sure what sort_with_compare expects its comparator to return, apparently -1 when a comes before b and 1 when b comes before a.

Also, make sure the args to your comparator function are references.

lucasrdrgs commented 3 years ago

Found something useful. V uses C's stdlib qsort to perform sorting using this custom comparator function from my last comment. Here's some piece of documentation: https://www.cplusplus.com/reference/cstdlib/qsort/

It says: Return value Meaning
<0 a comes before b
0 a and b are the same (do not swap them)
>0 b comes before a
mvlootman commented 3 years ago

I missed the sort_with_compare function, that was exactly what I was looking for. I will check the docs to see if I missed it, otherwise will update it to make it easier to find.

dumblob commented 3 years ago

A little bit off-topic: I actually think that we could still benefit from a stable sorting algorithm if it was the default (and at the same time get one C lib dependency less).

E.g. https://github.com/scandum/quadsort having the same measured performance (or slightly better) as qsort in all potentially problematic cases. It has identical algorithmic properties as quicksort with the only difference that it's stable.

There is also a fixed-size-memory variant https://github.com/scandum/blitsort but it gets slightly slower for large inputs (beginning with tens of thousands of items with default configuration) as a trade off for fixed size memory (quicksort doesn't use fixed size memory but more).

Related existing issue: https://github.com/vlang/v/issues/5201

igotfr commented 3 years ago

A little bit off-topic: I actually think that we could still benefit from a stable sorting algorithm if it was the default (and at the same time get one C lib dependency less).

E.g. https://github.com/scandum/quadsort having the same measured performance (or slightly better) as qsort in all potentially problematic cases. It has identical algorithmic properties as quicksort with the only difference that it's stable.

There is also a fixed-size-memory variant https://github.com/scandum/blitsort but it gets slightly slower for large inputs (beginning with tens of thousands of items with default configuration) as a trade off for fixed size memory (quicksort doesn't use fixed size memory but more).

Related existing issue: #5201

@dumblob it's better to let the user choose which algorithm to use:

array.sort(a > b, QUICKSORT)
array.sort(a > b, MERGESORT)

Julia does it: https://docs.julialang.org/en/v1/base/sort/

dumblob commented 3 years ago

@dumblob it's better to let the user choose which algorithm to use:

Definitely! Though I'm not concerned with "choice" but with default behavior (I don't think I'd want to write array.sort( a > b, QUICKSORT ) instead of array.sort( a > b )).