skarademir / naturalsort

A simple natural sorter for Go Strings
MIT License

2 more cases which don't seem to sort correctly #10

Open johnrs opened 9 years ago

johnrs commented 9 years ago

[]string{"1", "#1", "_1", "a"} ==> [#1 1 _1 a]

[]string{"111111111111111111112", "111111111111111111113", "1111111111111111111120"} ==> [111111111111111111112 1111111111111111111120 111111111111111111113]

Notes: You can replace regexp.MustCompilePOSIX with regexp.MustCompile. From what I understand, not many use the POSIX version.

You can replace:
    splitshortest := len(spliti)
    if len(spliti) > len(splitj) {
        splitshortest = len(splitj)
    }
    for index := 0; index < splitshortest; index++ {
With:
    for index := 0; index < len(spliti) && index < len(splitj); index++ {

You can replace:
    if ei == nil && ej == nil { //if number
With:
    // Handle numbers case.
    if ei == nil { // Only need to test ei since ej is the same

Before:
    return spliti[index] < splitj[index]
I added:
    // Handle non-numbers case

You can replace the code:
    return s[i] < s[j]
With:
    return len(s[i]) < len(s[j])

One of these changes solves one of the problems above.
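A note on the second case above: the strings have 21 and 22 digits, which is more than a 64-bit integer can hold. If the numeric chunks are parsed with something like strconv.Atoi (an assumption; the parsing code is not shown in this thread), the parse fails and a plain string comparison would give exactly the reported order. A minimal sketch of comparing digit runs without parsing them follows. compareDigits and sortDigitStrings are hypothetical helpers, not part of naturalsort.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// compareDigits compares two strings of ASCII digits by numeric value
// without converting them to integers, so it cannot overflow.
// It returns -1, 0, or 1. Hypothetical helper, not part of naturalsort.
func compareDigits(a, b string) int {
	// Leading zeros do not change the value.
	a = strings.TrimLeft(a, "0")
	b = strings.TrimLeft(b, "0")
	// Without leading zeros, more digits means a larger number.
	if len(a) != len(b) {
		if len(a) < len(b) {
			return -1
		}
		return 1
	}
	// Same length: lexicographic order equals numeric order.
	return strings.Compare(a, b)
}

// sortDigitStrings returns a sorted copy of s, assuming every element
// consists only of ASCII digits.
func sortDigitStrings(s []string) []string {
	out := append([]string(nil), s...)
	sort.SliceStable(out, func(i, j int) bool {
		return compareDigits(out[i], out[j]) < 0
	})
	return out
}

func main() {
	in := []string{"1111111111111111111120", "111111111111111111113", "111111111111111111112"}
	for _, s := range sortDigitStrings(in) {
		fmt.Println(s)
	}
}
```

Comparing by length first keeps the 22-digit value last, which is the order the report expects.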

skarademir commented 9 years ago

I think the tagged change solves the problem(s). However, regarding replacing

return s[i] < s[j]

with

return len(s[i]) < len(s[j])

I'm not sure in which cases that code is hit, or how the change would affect the result.

johnrs commented 9 years ago

I think the tagged change solves the problem(s).

I am seeing just one problem.

[a1 a#1 a_1 aa] ==> [a1 a#1 a_1 aa] // Correct

But

[1 #1 _1 a] ==> [#1 1 _1 a] // Wrong

The second case is the same as the first, minus the leading "a".

John

John Souvestre - New Orleans LA 
skarademir commented 9 years ago

Ah, oops, I think I punched that test case in incorrectly. Reopening to track.

skarademir commented 9 years ago

Alright. Reviewed the problem. I now set all non-numeric symbols to be greater than their numeric friends.

johnrs commented 9 years ago

Unfortunately, it seems that the fix had a side effect.

["1", " ", "0"] ==> ["1", " ", "0"] - The space is in the middle.

The problem is sensitive to the input order. It fails for space, but other characters seem to work.
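One way to probe this kind of input-order sensitivity is to sort every permutation of the input and check that all results agree. The sketch below does that for an arbitrary comparator. spaceBlindLess is a deliberately flawed, hypothetical comparator written only to reproduce the symptom; it is not naturalsort's code, and the real cause may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// permutations returns every ordering of s.
func permutations(s []string) [][]string {
	if len(s) <= 1 {
		return [][]string{append([]string(nil), s...)}
	}
	var out [][]string
	for i := range s {
		rest := make([]string, 0, len(s)-1)
		rest = append(rest, s[:i]...)
		rest = append(rest, s[i+1:]...)
		for _, p := range permutations(rest) {
			out = append(out, append([]string{s[i]}, p...))
		}
	}
	return out
}

// orderInsensitive reports whether sorting every permutation of input
// with less yields the same result.
func orderInsensitive(input []string, less func(a, b string) bool) bool {
	var first string
	for n, p := range permutations(input) {
		sort.SliceStable(p, func(i, j int) bool { return less(p[i], p[j]) })
		got := strings.Join(p, "\x00")
		if n == 0 {
			first = got
		} else if got != first {
			return false
		}
	}
	return true
}

// spaceBlindLess is a hypothetical, intentionally broken comparator: it
// treats a blank string as equal to everything, which is not a consistent
// ordering because "0" and "1" are still ordered relative to each other.
func spaceBlindLess(a, b string) bool {
	if strings.TrimSpace(a) == "" || strings.TrimSpace(b) == "" {
		return false
	}
	return a < b
}

func main() {
	in := []string{"1", " ", "0"}
	fmt.Println(orderInsensitive(in, spaceBlindLess))
	fmt.Println(orderInsensitive(in, func(a, b string) bool { return a < b }))
}
```

A comparator that passes this check for a handful of small inputs is not proven correct, but one that fails it is certainly not a consistent ordering.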

John



skarademir commented 9 years ago

Solved that last problem you found as well. Looks like the bottom-most equality clause was being hit when a string of only space characters was being compared. This meant I had to remove the len(left) < len(right) optimization you had suggested earlier.

Thanks again!

johnrs commented 9 years ago

I think that I'm still seeing a problem with a space-only string. It seems to sort first. For example:

["1", " ", "#"] results in [" ", "1", "#"] rather than ["1", " ", "#"]

John

skarademir commented 9 years ago

If I understood you correctly, you want the space character to be handled like all other non-numeric characters. However, since we are explicitly filtering this character out, it takes a different precedence. This behavior matches Mac OS X Finder sorting.

I think the change would not be too hard, but I'm reluctant to break away from established norms. Do you have any examples of the space character being deferred in other natural sorting implementations?

johnrs commented 9 years ago

It seems that the space character is sorted ahead of the numbers, but all of the other non-numeric characters are sorted after the numbers. This seems illogical to me. I believe that the space character should be treated like all of the other non-numeric characters. Also, please note that currently a "space" sorts before a number, but a "space, letter" sorts after a number.

I don't know of any references for natural sorting. As an example, vbom.ml/util/sortorder sorts the way I describe.

Here is a sample which shows a few variations on the theme.

Input: [" 0", "1", "2", " ", " b", "#", "_", "a"]
Result: [" ", " 0", "1", "2", "#", "_", "a", " b"] aka [20 2030 31 32 23 5F 61 2062]
I suggest: ["1", "2", " ", " 0", " b", "#", "_", "a"] aka [31 32 20 2030 2062 23 5F 61]
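The suggested order can be sketched with a comparator that splits each string into digit runs and non-digit runs, sorts any digit run before any non-digit run, and compares non-digit runs byte by byte so that a space is treated like every other non-numeric character. This is only an illustration under those assumptions: split, compareDigits, suggestedLess and sortSuggested are hypothetical names, not naturalsort's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func isDigit(c byte) bool { return '0' <= c && c <= '9' }

// split breaks s into alternating runs of digits and non-digits.
func split(s string) []string {
	var out []string
	start := 0
	for i := 1; i <= len(s); i++ {
		if i == len(s) || isDigit(s[i]) != isDigit(s[start]) {
			out = append(out, s[start:i])
			start = i
		}
	}
	return out
}

// compareDigits compares two digit runs by value without parsing them.
func compareDigits(a, b string) int {
	a, b = strings.TrimLeft(a, "0"), strings.TrimLeft(b, "0")
	if len(a) != len(b) {
		if len(a) < len(b) {
			return -1
		}
		return 1
	}
	return strings.Compare(a, b)
}

// suggestedLess orders digit runs before non-digit runs and compares
// non-digit runs (spaces included) by plain byte order.
func suggestedLess(a, b string) bool {
	ca, cb := split(a), split(b)
	for i := 0; i < len(ca) && i < len(cb); i++ {
		x, y := ca[i], cb[i]
		if x == y {
			continue
		}
		dx, dy := isDigit(x[0]), isDigit(y[0])
		switch {
		case dx && dy:
			if c := compareDigits(x, y); c != 0 {
				return c < 0
			}
			// Equal values such as "01" and "1": look at later chunks.
		case dx != dy:
			return dx // a digit run sorts before any non-digit run
		default:
			return x < y
		}
	}
	if len(ca) != len(cb) {
		return len(ca) < len(cb) // the shorter chunk list sorts first
	}
	return a < b // final tie-break keeps the order total
}

// sortSuggested returns a sorted copy of in.
func sortSuggested(in []string) []string {
	out := append([]string(nil), in...)
	sort.SliceStable(out, func(i, j int) bool { return suggestedLess(out[i], out[j]) })
	return out
}

func main() {
	in := []string{" 0", "1", "2", " ", " b", "#", "_", "a"}
	fmt.Printf("%q\n", sortSuggested(in))
}
```

Printing with %q keeps the space-only and space-prefixed elements visible, which the bracketed output format tends to hide.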

skarademir commented 9 years ago

You got me with

Also, please note that currently a "space" sorts before a number, but a "space, letter" sorts after a number

Reopening