AlisterH opened 6 years ago
Hey,
Right, I think that's because natsort does indeed expect LF-terminated lines, and only strips the LF at the end of each line.
So when it is given lines with CRLF endings, they are treated not as e.g. "foo" and "foo2" but as "foo\r" and "foo2\r", which causes the results you see.
I guess you could patch natsort to also remove any trailing \r, though note that your results would then be "converted" to LF line endings. (Or you'd need a flag/option for CRLF line endings.)
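To make the mechanism concrete, here is a minimal Python sketch. It does not reproduce natsort's actual comparison: `natural_key` is a hypothetical, simplified natural-sort key that splits each line into text and integer runs, and `split_lf` with its `strip_cr` flag is likewise hypothetical. Lines are split on LF only, the way a tool that expects LF endings would, and `strip_cr=True` drops one trailing \r per line, as suggested above.

```python
import re


def split_lf(data: str, strip_cr: bool = False) -> list[str]:
    """Split text on LF only; optionally drop one trailing CR per line (hypothetical helper)."""
    lines = data.split("\n")
    # A final LF leaves an empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    if strip_cr:
        lines = [line[:-1] if line.endswith("\r") else line for line in lines]
    return lines


def natural_key(line: str) -> list:
    """Hypothetical natural-sort key: alternating text runs and integer runs."""
    parts = re.split(r"(\d+)", line)
    # With a capturing group, re.split puts the digit runs at the odd indices,
    # so text and integers always line up at the same positions across keys.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


crlf_input = "foo2\r\nfoo\r\n"

print(sorted(split_lf(crlf_input), key=natural_key))  # ['foo2\r', 'foo\r']
print(sorted(split_lf(crlf_input, strip_cr=True), key=natural_key))  # ['foo', 'foo2']
```

With the \r left in place, the key for "foo\r" is the single text run "foo\r", which compares greater than the leading text run "foo" of "foo2\r", so "foo2" sorts first. Stripping the CR restores "foo" before "foo2", but anything written back out would then end in LF only, which is the "conversion" caveat above.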
It might be better/simpler, though, to just convert your files to LF line endings before sorting.
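A minimal sketch of that pre-processing step in Python, working on raw bytes so that no text encoding has to be assumed. The names `crlf_to_lf` and `convert_file` are hypothetical; the conversion only rewrites CR LF pairs and leaves any lone CR untouched.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair with a bare LF; all other bytes are unchanged."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: Path) -> None:
    """Rewrite a file in place with LF line endings (hypothetical helper)."""
    path.write_bytes(crlf_to_lf(path.read_bytes()))


print(crlf_to_lf(b"foo2\r\nfoo\r\n"))  # b'foo2\nfoo\n'
```

Converting up front keeps the sorter itself unchanged and makes the line-ending change explicit, rather than having the sort step silently rewrite it.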
Cheers,
Hi. Yes, it is easy enough to convert your files as long as you know that it is necessary. After discussing this elsewhere and looking more closely at the behaviour of sort and other standard *nix tools, I believe that the current behaviour is "correct" because it is consistent with those tools. I guess the issue is that at least the Python implementation of natsort (which also provides a command line tool called natsort) strips all trailing whitespace before sorting, so it is incompatible in this respect. I think it would be worth adding a note to the documentation, something like this:
This implementation of natsort expects LF line endings and will produce unexpected results if operating on "Windows" format files with CRLF line endings. This differs from other implementations of natsort, which strip all trailing whitespace before sorting.
People could still get into trouble by inadvertently switching to using your implementation, but at least that is less likely if it is documented.
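The difference between the two behaviours can be sketched with two hypothetical line-normalisation helpers. These are illustrations only, not code from either natsort implementation; the claim that the Python tool strips all trailing whitespace is taken from the comment above.

```python
def strip_lf_only(line: str) -> str:
    """Remove only a trailing LF (the LF-only behaviour described above)."""
    return line[:-1] if line.endswith("\n") else line


def strip_trailing_ws(line: str) -> str:
    """Remove all trailing whitespace: LF, CR, spaces and tabs alike."""
    return line.rstrip()


for raw in ["foo\n", "foo\r\n", "foo \t\r\n"]:
    print(repr(strip_lf_only(raw)), repr(strip_trailing_ws(raw)))
# 'foo' 'foo'
# 'foo\r' 'foo'
# 'foo \t\r' 'foo'
```

Stripping all trailing whitespace makes CRLF input harmless, but it also makes lines that differ only in trailing spaces or tabs compare as equal, so the two approaches are not interchangeable.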
Hi, I realise you're not actively developing natsort, but I figure it is at least worth filing this for the record:
Natsort certainly gives better results than coreutils sort -V, or anything I can get out of msort. But it seems to make some mistakes when the input comes from a file with Windows (CRLF) line endings; see below.
Or perhaps there is something obvious that I'm missing (I guess something about what cat does)? It seems that coreutils sort -V also gives worse results when operating on CRLF files. The Python natsort doesn't seem to.