ropensci-review-tools / dashboard

Dashboard for rOpenSci software peer-review
https://dashboard.ropensci.org/

Editor load graph should make it easy to see number of packages _assigned_ to the editor per quarter #8

Closed · mpadge closed this issue 5 months ago

mpadge commented 6 months ago

To reveal values on hover in https://ropensci-review-tools.github.io/dashboard/editors.html#past-ed-load

mpadge commented 6 months ago

@noamross This doesn't work using plotly. Those routines must have some kind of internal smoothing which turns the non-plotly graph [screenshot: static graph] into this: [screenshot: plotly output]

Tooltips then work, but the graph is garbage. Resolving that would require digging into the javascript translation, which I suggest is definitely not worth doing for such a minor enhancement. I suggest closing this issue, and simply improving explanatory text of the current, entirely static graph.

noamross commented 6 months ago

ggraph may be a quick way to do tooltips.

This at least requires better formatting so one can easily see exactly what the counts are (e.g., "How many packages did Maëlle handle last quarter?"). Two things would probably help: regular rather than symmetric bar graphs, and better gridding.

mpadge commented 6 months ago

But ggraph is the network vis package? How does that do tooltips?

As for improving the form, I could try to adapt the data structure to work with ggridges. I'll see what can be done ...

noamross commented 6 months ago

Ah, I meant ggiraph

noamross commented 6 months ago

Here's an example of a plot I made some time ago when we were trying to get the load down to 6 packages/yr/editor (now it's four). It shouldn't be just like this, but it's the kind of granular info I was trying to convey:

[screenshot: example editor-load plot from 2018]

noamross commented 6 months ago

One thing I'm not clear on is whether your graph shows "packages currently being handled during this period" or "packages assigned in this period." I think we want the latter for overall load handling. The former is more about availability? Maybe an overall availability graph would help, too, e.g., "How many editors are free at a given time point," so we can see how often we're at capacity.

mpadge commented 6 months ago

This is a ggridges version [screenshot], noting that that package does not (currently) include any equivalent way to generate actual bars: bar-type ridges are all hard-coded for density plots, which this is not. We could easily replace the current graph with that?

It would be great to shade the fills so higher values were more intense. That is close to what the geom_ridgeline_gradient function does, but that is (1) experimental; and (2) only able to plot gradients "along the x-axis," whereas we need it the other way around.
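
For reference, the ggridges approach described above would look roughly like the minimal sketch below. The data frame `load_by_month` and its columns `month`, `editor`, and `n` are hypothetical names, not the dashboard's actual data structure.

```r
library(ggplot2)
library(ggridges)

# load_by_month: one (hypothetical) row per editor per month, with
# `n` = number of issues open during that month.
ggplot(load_by_month, aes(x = month, y = editor, height = n, group = editor)) +
    geom_ridgeline() +
    theme_minimal()
# Heights are in y-axis units, so counts may need rescaling to keep ridges
# from overlapping; geom_ridgeline_gradient() could map `fill` to a variable,
# but only grades colour along the x-axis, as noted above.
```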


On other issues, I'm not sure what you mean by the distinction between "packages currently being handled during this period" and "packages assigned in this period". Current code counts, for each month, all issues which were open at any time during that month and assigned to each editor. Those are then "currently being handled during this period" in my understanding, but please help clarify if I'm misinterpreting that?

As for an availability graph, we could do that, and I definitely agree that it would be useful, but we'd require data we do not have, in the form of start and end dates for all historical editors. Data on the dashboard at the moment are only generated for current editors, and exclude anyone who is no longer a member of either the editors or stats-editors teams.

noamross commented 6 months ago

You could use facets (`facet_grid(editor ~ .)`) rather than ggridges, which would also make it easily compatible with {ggiraph}.
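
A minimal sketch of that combination, assuming a summary data frame `assignments` with hypothetical columns `quarter`, `editor`, and `n` (packages assigned in that quarter):

```r
library(ggplot2)
library(ggiraph)

p <- ggplot(assignments, aes(x = quarter, y = n)) +
    # interactive version of geom_col(); `tooltip` supplies the hover text
    geom_col_interactive(aes(tooltip = paste0(editor, ": ", n, " packages"))) +
    facet_grid(editor ~ .) +
    theme_minimal()

girafe(ggobj = p)  # renders an htmlwidget with hover tooltips
```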

noamross commented 6 months ago

"packages assigned in this period" is from package issues that were opened in the period. (e.g., Adam typically handles 1 new issue/quarter). This is my definition of "load" and we aim for it to be 1 per quarter on average.

"packages currently being handled during this period" is from issues that were opened before or during the period and closed during or after the period. (e.g., In this quarter Adam had two reviews open). I'm not sure what I call this - it's like reverse "availability". Aggregate availability is number of editors at a time where this value is zero, and probably should not be summarized quarterly.

mpadge commented 6 months ago

@noamross Please let me know what you think of the newly updated part of the editors page at https://ropensci-review-tools.github.io/dashboard/editors.html#past-ed-load. You'll see it's quite different to what it was. I've tried to focus much more on editorial load as you interpreted it above, and reduced the previous material to the final "Individual Editor Load" section. You'll also see that the graphs there have been greatly reduced from previous iterations.

I made this design decision because most of the "signal" in previous forms came from low values. That in turn encouraged, or risked encouraging, comparison between the overall activity of different editors. I decided it was better to avoid that, and so have reduced the displayed data to those values which exceed the thresholds explained on the page. I think that way of presenting it largely prevents any comparisons between individual editors, which is a good thing. Everybody has their own personal reasons for varying degrees of commitment, and those should in no way be inadvertently revealed or compared here.

If you're happy with this, then hopefully we can close this issue in favour of more focussed ones from here on. Thanks for all the useful feedback!

noamross commented 6 months ago

I disagree. I think it's fine to show the individual editor values quarterly, and it's important for the EiC to make those comparisons so that they can distribute effort evenly. The current graphs don't show any useful current information for the past year or more. I don't think there's any issue with displaying the total individual packages handled. We can ask the editors in the channel, but if there's any issue with it, pipe the results to AirTable and display them on the private interface. Also, we aim for four reviews per year per editor, not eight, so low values really are the signal. I don't believe we have a "how many concurrent" limit currently, but five is way too high.

The total editors plot doesn't match my intuition, at least. I want this to be "total editors who can accept a package." There's no way that number is 13. Are you removing people who have left or are on leave? As an EiC, we tend to look first at whether editors have open issues, and then, if everyone has an open issue, see who has been pretty dormant. So total editor availability is the number of editors with no issues open, as I described above in https://github.com/ropensci-review-tools/dashboard/issues/8#issuecomment-2010074004 .

I still would like hover tooltips with values.

Limiting to the past two years might be a more helpful time frame and make values easier to read.

mpadge commented 6 months ago

No worries about disagreeing, and that all sounds good. I had the editors plot initially the way you said, so I can easily revert. Numbers are then only 1-2 instead of 13 or so. And the "editorial load" charts can also be reverted in data structure while largely keeping the current visual form. I'll try to reduce the time span, so the charts largely serve the purpose of indicating recent and current availability. I'll ping you again briefly tomorrow once that's done. Thanks as always for the feedback.

mpadge commented 6 months ago

@noamross Updated again, including new single chart of "estimated editorial workload" https://ropensci-review-tools.github.io/dashboard/editors.html#past-ed-load. Please let me know what you think. Once we're happy with the graphical outputs, we can improve the text. Feel free to PR any specific text suggestions to https://github.com/ropensci-review-tools/dashboard/blob/main/quarto/editors.qmd :+1:

noamross commented 6 months ago

Thanks. The individual plots look good. The numbers still seem off in the aggregated graph. Note the individual graphs have totals of 13 and 14 editors, while the aggregated graph shows that we have 16 editors. Anna shouldn't be in the aggregate, as she's on leave (I also just updated Karthik's status in the AirTable). I don't know who else it is including. I don't think we should include you, as we generally don't assign new packages to you.

Actually, the numbers seem off in the individual graphs too. For instance, you only show two packages assigned in Q1 2024, but I count at least 4: https://github.com/ropensci/software-review/issues?q=is%3Aissue+created%3A%3E%3D2024-01-01+-no%3Aassignee+ . Please check your logic carefully.

I'm not sure about the utility of your workload graph. It's hard to determine what it means, and I think the multiplication is misleading and inflationary. I don't think the editor load is actually that unequal. For instance, one might just have a review that went nowhere over most of the year because of slow reviewers and authors, requiring very little editorial effort, but a couple of those multiply everything else. Of course, this varies a lot, which is why I don't think it's a good measure. I find it much easier just to look at the raw numbers myself.

mpadge commented 6 months ago

@noamross Some responses to your points above:

> The numbers still seem off in the aggregated graph. Note the individual graphs have totals of 13 and 14 editors, while the aggregated graph shows that we have 16 editors.

This actually seems okay. The aggregated graph shows up to 15, which is what we currently have, including editors who have yet to act as such and so don't appear in any of the individual graphs.

> Anna shouldn't be in the aggregate, as she's on leave (I also just updated Karthik's status in the AirTable). I don't know who else it is including. I don't think we should include you, as we generally don't assign new packages to you.

All done, thanks.

> Actually, the numbers seem off in the individual graphs too. For instance, you only show two packages assigned in Q1 2024, but I count at least 4: ropensci/software-review/issues (created:>=2024-01-01 -no:assignee). Please check your logic carefully.

Current results show the expected 4 submissions for Q1 2024, so that seems okay. The logic has been checked very carefully.

> I'm not sure about the utility of your workload graph. It's hard to determine what it means, and I think the multiplication is misleading and inflationary. I don't think the editor load is actually that unequal. For instance, one might just have a review that went nowhere over most of the year because of slow reviewers and authors, requiring very little editorial effort, but a couple of those multiply everything else. Of course, this varies a lot, which is why I don't think it's a good measure. I find it much easier just to look at the raw numbers myself.

Good point, and so I have deleted that entirely, and re-arranged the others to simply present the number of new assignments per quarter as the primary result.