Open shanemgrey opened 8 years ago
Thanks Shane. Could you give a very small example of how you'd like this to work?
On Thu, Apr 28, 2016 at 6:16 PM, Shane Grey notifications@github.com wrote:
Sort allows more than one key, but it seems that all which are defined will be sorted ascending or descending, depending on the value of reverse, which can only accept a single value.
There is sometimes a need to sort the values in different orders.
I looked at the code to unravel how the sort functions flow, but I wasn't able to quickly get my head around it well enough to make reverse accept a list like key does.
Would be happy to work on it more and submit a PR if this is something that's not already there, me missing it. But I'm probably not the best person for it is it's non-trivial.
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/alimanfoo/petl/issues/375
Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com alimanfoo@gmail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721
Let's say I want to find instances where we sold our best selling products at lowest margins.
I request a report of items sorted by highest number sold, descending, then by margin each has been sold at, ascending.
import petl as etl
table_purchases = [['customer', 'itemid', 'total_number_sold', 'margin_on_this_sale'],
['John', 2, 27, 4],
['John', 2, 27, 4],
['John', 1, 45, 4],
['Sally', 1, 45, 1],
['Sally', 1, 45, 1],
['John', 1, 45, 4],
['Sally', 2, 27, 1],
['Marie', 3, 92, 8],
['John', 3, 92, 2],
['Sally', 3, 92, 1]]
table_sorted = etl.sort(table_purchases, key=['total_number_sold', 'margin_on_this_sale'], reverse=True)
table_sorted.look(limit=10)
customer | itemid | total_number_sold | margin_on_this_sale |
---|---|---|---|
'Marie' | 3 | 92 | 8 |
'John' | 3 | 92 | 2 |
'Sally' | 3 | 92 | 1 |
'John' | 1 | 45 | 4 |
'John' | 1 | 45 | 4 |
'Sally' | 1 | 45 | 1 |
'Sally' | 1 | 45 | 1 |
'John' | 2 | 27 | 4 |
'John' | 2 | 27 | 4 |
'Sally' | 2 | 27 | 1 |
I want to see that Sally is where we make the least money for the work. Marie doesn't buy often, I don't want to see her record first.
Granted, I could just read the report differently, but sometimes report reading people want reports the way they want them...
And I can't think of a way to get the sorting right without putting it in a spreadsheet and doing it manually.
The problem I was initially having was sorting by two columns to get a descending date and keep only the most recent dated record for any unique itemid.
But I couldn't find a reason reverse sorting both won't work...
So I tried (hard) to contrive an example and not look stupid for having requested something useless. And this is what I came up with.
Maybe you can think of a more useful use case?
I also found examples of others looking for this functionality in MySQL, and they just didn't know the syntax.
http://stackoverflow.com/questions/13694907/mysql-order-by-multiple-column-asc-and-desc?rq=1 http://stackoverflow.com/questions/15313043/mysql-query-to-order-two-column-one-asc-another-desc
Ultimately, I responded to your quick request for an example because I didn't want to make a request and then leave you unanswered. If this were my project, I would consider the feature request novel, but not important enough to work on. But that's not always easy to tell someone. If that's how you feel about it, you can tell me! :-)
Hi Shane, thanks for the example, I just wanted to check I knew exactly what you meant. I think this is sometimes called "complex sort". Because sorting is stable you can implement this by chaining together sort() calls. E.g.::
>>> import petl as etl
>>> table_purchases = [['customer', 'itemid', 'total_number_sold', 'margin_on_this_sale'],
... ['John', 2, 27, 4],
... ['John', 2, 27, 4],
... ['John', 1, 45, 4],
... ['Sally', 1, 45, 1],
... ['Sally', 1, 45, 1],
... ['John', 1, 45, 4],
... ['Sally', 2, 27, 1],
... ['Marie', 3, 92, 8],
... ['John', 3, 92, 2],
... ['Sally', 3, 92, 1]]
>>> table_sorted = etl.sort(table_purchases, key='margin_on_this_sale').sort(key='total_number_sold', reverse=True)
>>> table_sorted.lookall()
+----------+--------+-------------------+---------------------+
| customer | itemid | total_number_sold | margin_on_this_sale |
+==========+========+===================+=====================+
| 'Sally' | 3 | 92 | 1 |
+----------+--------+-------------------+---------------------+
| 'John' | 3 | 92 | 2 |
+----------+--------+-------------------+---------------------+
| 'Marie' | 3 | 92 | 8 |
+----------+--------+-------------------+---------------------+
| 'Sally' | 1 | 45 | 1 |
+----------+--------+-------------------+---------------------+
| 'Sally' | 1 | 45 | 1 |
+----------+--------+-------------------+---------------------+
| 'John' | 1 | 45 | 4 |
+----------+--------+-------------------+---------------------+
| 'John' | 1 | 45 | 4 |
+----------+--------+-------------------+---------------------+
| 'Sally' | 2 | 27 | 1 |
+----------+--------+-------------------+---------------------+
| 'John' | 2 | 27 | 4 |
+----------+--------+-------------------+---------------------+
| 'John' | 2 | 27 | 4 |
+----------+--------+-------------------+---------------------+
The trick is to order the sort() operations back-to-front, i.e., the first sort() has the last sorting condition.
A while ago I thought it might be useful to add a convenience function that would handle this for you, there was an issue for it (#25) but I closed it because no-one asked for it and I didn't need it often. Happy to revisit tho if you think a convenience function would be useful.
That's great that the functionality is easy to build. Thanks for the example!
I'd say rather than add a convenience function, maybe just add a note on this chaining of sorting in the docs. Seems like it might be a more universal method, and keeps cruft out of the code.
I personally don't need the functionality as I thought I did.
Good suggestion, will do.
On Thursday, 28 April 2016, Shane Grey notifications@github.com wrote:
That's great that the functionality is easy to build. Thanks for the example!
I'd say rather than add a convenience function, maybe just add a note on this chaining of sorting in the docs. Seems like it might be a more universal method, and keeps cruft out of the code.
I personally don't need the functionality as I thought I did.
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/alimanfoo/petl/issues/375#issuecomment-215575263
Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com alimanfoo@gmail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721
Re-opening as a reminder to modify the sort() docstring with an example.
Sort allows more than one key, but it seems that all which are defined will be sorted ascending or descending, depending on the value of reverse, which can only accept a single value.
There is sometimes a need to sort the values in different orders.
I looked at the code to unravel how the sort functions flow, but I wasn't able to quickly get my head around it well enough to make reverse accept a list like key does.
Would be happy to work on it more and submit a PR if this is something that's not already there, me missing it. But I'm probably not the best person for it is it's non-trivial.