openjusticeok / ojodb

OJO's R package for opening the black box of our justice system
https://openjusticeok.github.io/ojodb/
GNU General Public License v3.0

Add progress bar to all data pulls #93

Closed andrewjbe closed 1 year ago

andrewjbe commented 1 year ago

One really nice feature of packages like {tigris} is that they tell you how your download is progressing, which is handy because you can see when it hangs. We should add that to all data pulls from {ojodb}.

We could use the {progress} package: https://github.com/r-lib/progress

brancengregory commented 1 year ago

YES.

brancengregory commented 1 year ago

This shows how to include them for maps, too: https://github.com/r-lib/progress/issues/62
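A minimal sketch of the map pattern, assuming the bar is created once with the number of elements and ticked inside the mapped function. `slow_square` is a hypothetical stand-in for one unit of real work, and base `vapply()` is used here to keep dependencies down; the same function could be passed to `purrr::map_dbl()`.

```r
library(progress)

inputs <- 1:20

# One bar for the whole map; total is the number of elements
pb <- progress_bar$new(
  total = length(inputs),
  format = "[:bar] :percent eta: :eta"
)

# Hypothetical worker: ticks the bar once, then does its work
slow_square <- function(x) {
  pb$tick()        # advance the bar by one step per element
  Sys.sleep(0.01)  # simulate a slow operation
  x^2
}

results <- vapply(inputs, slow_square, numeric(1))
```

Note that {progress} only draws the bar in interactive sessions by default, so nothing is printed when this runs in a script.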

andrewjbe commented 1 year ago

Which function would this code actually go into, ojo_tbl?

andrewjbe commented 1 year ago

I did some work on this, and I think our {dbplyr} setup might make it a little difficult. I also found out that {tigris} uses a progress bar built into the GET() function from {httr}. The {progress} package, by contrast, requires us to advance the bar ourselves at each step of some iteration, e.g.

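library(progress)
bar <- progress_bar$new(total = 100)
for (i in 1:100) {
  do.a.thing()  # placeholder for one unit of work
  bar$tick()    # advance by one step; tick(i) would advance by i steps
}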

That means I'm having trouble finding a way to get it to track the progress of collect(), and I'm having trouble figuring out a way to build this into the package so that it covers all downloads.

I think this part of the {progress} documentation might be relevant, but I'm too dumb to figure out how:

[image: excerpt from the {progress} documentation]

brancengregory commented 1 year ago

All that is spot on except the being dumb part lol

We probably can't get enough info from the data transfer happening with collect to do a direct progress indicator.

We could, though, use information from the query to provide /estimates/ rather than progress per se.

With that, we could also build in console output saying how long the query took when you are in an interactive session. That way you don't have to wrap every query in tic()/toc() calls.
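A rough sketch of the timer idea in base R. `timed_pull` is a hypothetical helper, not an existing ojodb function; it assumes the pull is passed in as a function (for example `function() dplyr::collect(query)`) and it uses `Sys.time()` in place of {tictoc} to avoid an extra dependency.

```r
# Hypothetical helper: runs a pull, times it, and reports the elapsed
# time on the console when the session is interactive
timed_pull <- function(pull, verbose = interactive()) {
  start <- Sys.time()
  result <- pull()  # e.g. function() dplyr::collect(query)
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  if (verbose) {
    message(sprintf("Query took %.2f seconds", elapsed))
  }
  # Keep the timing on the result so callers can inspect it later
  attr(result, "elapsed_secs") <- elapsed
  result
}

# Stand-in for a slow query
out <- timed_pull(
  function() {
    Sys.sleep(0.2)
    data.frame(x = 1:3)
  },
  verbose = TRUE
)
```

Defaulting `verbose` to `interactive()` keeps scripts and scheduled jobs quiet while still giving feedback at the console.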

andrewjbe commented 1 year ago

I think both the loading bar idea and the query timer idea would only be possible if we restructure ojo_crim_cases() and ojo_civ_cases() so that collect() happens under the hood. They would work as they do now, except that instead of just returning the query for the user to collect(), they would actually:

1) tic()
2) run something like query |> count() |> collect() to get the number of rows that will be returned (nrow() returns NA on a lazy table), which the progress bar would then be based on
3) actually pull the data, with a progress bar (although I don't know if we'd be able to add a progress bar to collect() even with the number of rows, because I'm not sure where in the code we'd have it tick forward)
4) toc()
5) return the completed tibble
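The steps above can be sketched as follows, under some loud assumptions: this bypasses collect() and fetches in chunks through {DBI}, so there is a natural place to tick the bar; an in-memory SQLite table stands in for the real database; and `fetch_with_progress` is a hypothetical helper, not part of ojodb. With a {dbplyr} query, the SQL string could presumably come from `dbplyr::remote_query()`.

```r
library(DBI)
library(progress)

# Assumption: an in-memory SQLite table stands in for a remote ojodb table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cases", data.frame(id = 1:1000, value = rnorm(1000)))

# Hypothetical helper: count rows, then fetch in chunks, ticking per chunk
fetch_with_progress <- function(con, sql, chunk_size = 100) {
  start <- Sys.time()

  # Step 2: count the rows first so the bar has a known total
  count_sql <- paste0("SELECT COUNT(*) AS n FROM (", sql, ") AS q")
  total <- dbGetQuery(con, count_sql)$n

  pb <- progress_bar$new(total = total, format = "[:bar] :current/:total rows")

  # Step 3: pull the data chunk by chunk, advancing the bar each time
  res <- dbSendQuery(con, sql)
  on.exit(dbClearResult(res))
  chunks <- list()
  while (!dbHasCompleted(res)) {
    chunk <- dbFetch(res, n = chunk_size)
    if (nrow(chunk) > 0) pb$tick(nrow(chunk))
    chunks[[length(chunks) + 1]] <- chunk
  }

  # Step 4: report elapsed time in interactive sessions
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  if (interactive()) message(sprintf("Query took %.2f seconds", elapsed))

  # Step 5: return the completed data
  do.call(rbind, chunks)
}

cases <- fetch_with_progress(con, "SELECT * FROM cases")
dbDisconnect(con)
```

The cost of this design is an extra COUNT query before every pull, which may be slow on large tables; that trade-off would need checking against the real database.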