anderser closed this issue 8 years ago
Hi Anders! This is a use-case I wrestled with when I was designing the computations & aggregations. This issue also comes up with things like running averages. The way you've done it is the way I intended for it to be done, though I'll admit it produces somewhat ugly code.
The alternative is to take the analysis outside of a Computation and just iterate over the rows, producing new data using ordinary Python methods. Although this code won't look like your other agate code, I always intended it to be possible to treat an agate table as a normal Python object (via table.columns and table.rows).
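As a rough illustration of the "ordinary Python" route for the running-average case mentioned above, here is a minimal sketch. It does not import agate: the list of dicts is a hypothetical stand-in for the values you would read from table.rows or table.columns, and the names running_average, rows and sales are made up for the example.

```python
from collections import deque


def running_average(values, window):
    """Return the mean of the last `window` values at each position.

    Positions that do not yet have a full window get None.
    """
    recent = deque(maxlen=window)
    total = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            total -= recent[0]  # this value is about to fall out of the window
        recent.append(value)
        total += value
        averages.append(total / window if len(recent) == window else None)
    return averages


# Hypothetical data; with agate the values would come from the table itself.
rows = [
    {'day': 1, 'sales': 10},
    {'day': 2, 'sales': 20},
    {'day': 3, 'sales': 30},
    {'day': 4, 'sales': 40},
]
averages = running_average([row['sales'] for row in rows], window=3)
print(averages)  # [None, None, 20.0, 30.0]
```

Because the result is just a list in row order, it needs no unique key to line up with the rows it was computed from.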
If you have suggestions for alternate ways of supporting this kind of operation I'd love to hear them!
Flagging this as a documentation issue. Running average would make a good example for the cookbook.
I call it guerrilla code. It isn't pretty, but it does the job :) Some docs on how to do things like this would be great, yes. What I struggled most with was that each row doesn't (as far as I understood) have a unique key that can be used in the new dict to wire it all up correctly in the run method. But maybe I just made this more complicated by making a Computation, as you wrote. Pure Python might be easier to do (and to read/remember later).
Ah ha, I see. I had totally overlooked the problem before. Okay, let me think about this more. There might be a solution involving providing the row index or name to the Computation.run ...
Huh, yeah, this doesn't really work at all, does it? Elevating the priority of this. Amazing it took so long for this case to come up!
Well it worked for me the way I did it, but maybe that was pure luck. It never felt good messing around with that key_column_name to get a "unique" key to add the streak value to anyway. I'll dig more into the agate code and try to think of something if I understand what is going on there.
Yeah, the way you did it is a good hack, but certainly not an ideal solution. There really shouldn't be a requirement that the table has a unique ID at all in order to compute something like this.
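To make the point about not needing a unique ID concrete, here is a minimal plain-Python sketch that numbers streaks purely by row position. It is an assumption-laden illustration, not agate code: streak_ids and the sample results list are hypothetical names and data.

```python
def streak_ids(values):
    """Number consecutive runs of equal values, starting at 1."""
    ids = []
    current = 0
    previous = object()  # sentinel that compares unequal to any real value
    for value in values:
        if value != previous:
            # The value changed, so a new streak starts here.
            current += 1
            previous = value
        ids.append(current)
    return ids


# Hypothetical column of results, in row order.
results = ['win', 'win', 'loss', 'win', 'win', 'win']
ids = streak_ids(results)
print(ids)  # [1, 1, 2, 3, 3, 3]
```

The output list has one entry per input row in the same order, so it can be attached as a new column without looking anything up by key.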
@anderser I've just pushed a set of changes to master that modify the implementation of Computation to better support these cases. This is a backwards-incompatible change, so I'm approaching it cautiously, but I feel pretty confident it's the right solution. Mind taking a look?
Here is an example of how the new interface can solve your problem: http://agate.readthedocs.org/en/latest/cookbook/compute.html#simple-moving-average
I'm satisfied that this is working and I'm noticing some very pleasant symmetries with other parts of agate, so I'm calling this finished. If anybody spots any remaining gaps in the implementation, please reopen!
@onyxfish This is just great! Your change to pass the table to the run method greatly simplified my Computation (and probably enabled many others). Thanks for the swift response to my messy code/proposal.
For the record: here is the adjusted Streaks Computation: https://gist.github.com/anderser/12d32a25f385f8a7f6d1
Hurray! That looks great! Do you mind if I use that code as an example?
Of course you may. Please do. And change if needed.
This is a feature which might fit more into agate-stats or another extension. If so, shout out and I'll move the issue.
I need to find streaks of similar values in a dataset. This method is often used in sports (finding consecutive wins/losses and determining the period with the most wins/losses in a row). Another application might be finding the period/date range with consecutive days of rain.
To me this is a Computation, but I am having trouble understanding how (or if) a Computation can reference the previous row.
Before submitting a PR, I'll just try to explain my method:
The calculation takes two parameters. The first is the column that serves as your key/unique ID, for example a timestamp or date. The second is the column you want to find streaks of values in.
The prepare method then loops through the rows, compares each value to that of the previous row and, if there is a new value, increases the streak value (an integer). The resulting streak value for each row is saved in a dict, with the value of the key column as the key. The run method then retrieves the corresponding streak value from the dict for the current row.
A working, but maybe not "to agate standards", version of the code is here: https://gist.github.com/anderser/12d32a25f385f8a7f6d1
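The prepare/run approach described above can be sketched in plain Python roughly as follows. This is not the code from the gist and it does not subclass agate's Computation: KeyedStreaks is a hypothetical stand-in class, and the rows are made-up dicts, used only to show how the key column ties each row to its streak value.

```python
class KeyedStreaks:
    """Hypothetical stand-in for the described Computation (not agate's class)."""

    def __init__(self, key_column_name, column_name):
        self._key_column_name = key_column_name
        self._column_name = column_name
        self._streaks = {}

    def prepare(self, rows):
        """Walk the rows once and store a streak number per key value."""
        streak = 0
        previous = object()  # sentinel that compares unequal to any real value
        for row in rows:
            value = row[self._column_name]
            if value != previous:
                streak += 1
                previous = value
            self._streaks[row[self._key_column_name]] = streak

    def run(self, row):
        """Look up the streak number stored for this row's key."""
        return self._streaks[row[self._key_column_name]]


# Made-up sample rows; 'date' plays the role of the unique key column.
rows = [
    {'date': '2015-01-01', 'result': 'win'},
    {'date': '2015-01-02', 'result': 'win'},
    {'date': '2015-01-03', 'result': 'loss'},
    {'date': '2015-01-04', 'result': 'win'},
]
streaks = KeyedStreaks('date', 'result')
streaks.prepare(rows)
computed = [streaks.run(row) for row in rows]
print(computed)  # [1, 1, 2, 3]
```

The weakness is visible in the sketch: if two rows share the same key value, the later one silently overwrites the earlier one in the dict, which is why the approach depends on the key column being unique.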
Sample table
Computing the streaks
Group the data by streaks:
Finding the start/end date of longest streaks
Which would give you something like this:
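As a rough illustration of the last two steps, grouping rows into streaks and finding the start and end date of the longest one, here is a minimal plain-Python sketch using itertools.groupby. It is not the original agate code: the rows are made-up (date, result) pairs and longest_streak is a hypothetical helper.

```python
from itertools import groupby

# Made-up sample data, already sorted by date.
rows = [
    ('2015-01-01', 'win'),
    ('2015-01-02', 'win'),
    ('2015-01-03', 'loss'),
    ('2015-01-04', 'win'),
    ('2015-01-05', 'win'),
    ('2015-01-06', 'win'),
]


def longest_streak(rows):
    """Return value, length, start and end of the longest run of equal results.

    On ties, the earliest streak wins. Returns None for empty input.
    """
    best = None
    # groupby only merges adjacent rows, which is exactly what a streak is.
    for value, group in groupby(rows, key=lambda row: row[1]):
        group = list(group)
        candidate = {
            'value': value,
            'length': len(group),
            'start': group[0][0],
            'end': group[-1][0],
        }
        if best is None or candidate['length'] > best['length']:
            best = candidate
    return best


best = longest_streak(rows)
print(best)
# {'value': 'win', 'length': 3, 'start': '2015-01-04', 'end': '2015-01-06'}
```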