ropensci / roregistry

ropensci registry
registry should indicate onboarding status #9

Closed cboettig closed 5 years ago

cboettig commented 7 years ago

It would be great to be able to check onboarding status info in the registry. There isn't a really obvious term for this in codemeta: there is developmentStatus, but that's described as being a https://repostatus.org term.

We could just add it as an additional, ropensci specific property, e.g.

"ropensci:onboarding": "In Review"

sckott commented 7 years ago

What is the set of accepted values for that key?

cboettig commented 7 years ago

codemeta isn't that normative; most fields are only constrained by type (e.g. URL vs text string), and nodes of type id or url can also take other nodes as arguments. (In JSON-LD it is possible to create restricted vocabularies for properties, but it's not super common.)

onboarding isn't a recognized property in codemeta terms, and probably doesn't appear in any other common namespace either (e.g. it's not in schema.org, Dublin Core, etc.), so the natural thing to do is just declare that it's an "ropensci" term. JSON-LD is built to be easily extensible in this way: you would just add such terms explicitly to the "context", see example
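A minimal sketch of what extending the context could look like, written in Python for illustration. The namespace IRI `https://ropensci.org/terms#`, the helper `add_onboarding_status`, and the package name are all hypothetical; only the `ropensci:onboarding` key and the "In Review" value come from the comment above.

```python
import json

# Hypothetical namespace IRI for rOpenSci-specific terms (an assumption,
# not an official vocabulary).
ROPENSCI_NS = "https://ropensci.org/terms#"


def add_onboarding_status(entry, status):
    """Return a copy of a registry entry with an rOpenSci-specific onboarding term."""
    doc = dict(entry)
    context = doc.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    # Declare the "ropensci" prefix so "ropensci:onboarding" expands to a full IRI.
    doc["@context"] = context + [{"ropensci": ROPENSCI_NS}]
    doc["ropensci:onboarding"] = status
    return doc


entry = {
    # Assumed to be the codemeta 2.0 context; swap in whatever the registry uses.
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "examplepkg",  # hypothetical package name
}
doc = add_onboarding_status(entry, "In Review")
print(json.dumps(doc, indent=2))
```

Keeping the extra prefix in its own context object leaves the codemeta context untouched, so consumers that ignore unknown terms should still read the rest of the entry.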

Does that make sense?

sckott commented 7 years ago

Makes sense yes. I was just asking what you think the values are? e.g., in review, accepted ? others?

cboettig commented 7 years ago

ah right, I was thinking we'd scrape the terms directly from the text on the badge, though it's not actually clear to me where that text is coming from.

sckott commented 7 years ago

me neither - @karthik what are the possible values for the badge?

right now, we don't have a way to automate retrieval of this info into the registry.json file, but we could do it manually for now - things don't change that quickly in onboarding :) but of course automated is better

cboettig commented 7 years ago

@sckott Well I guess we can always scrape the text from the svg source of the badge and use that, at least for the time being. (Looks like the term is Under Review not in review). I can add a little function to codemetar for extracting badge status metadata.
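A rough sketch of the scraping idea, in Python rather than as the codemetar function mentioned above. The sample SVG is hypothetical (modelled on typical two-part badges; the real badge markup may differ), and `badge_status` is an illustrative name. It assumes the status is the last non-empty `<text>` element in the SVG.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def badge_status(svg_source):
    """Return the content of the last non-empty <text> element in a badge SVG, or None."""
    root = ET.fromstring(svg_source)
    texts = [
        el.text.strip()
        for el in root.iter(SVG_NS + "text")
        if el.text and el.text.strip()
    ]
    return texts[-1] if texts else None


# Hypothetical badge markup; the real SVG served for a review badge may differ.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="20">
  <g font-family="sans-serif" font-size="11">
    <text x="30" y="14">rOpenSci</text>
    <text x="90" y="14">Under Review</text>
  </g>
</svg>"""

print(badge_status(SAMPLE_SVG))
```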

karthik commented 7 years ago

me neither - @karthik what are the possible values for the badge?

Right now:

unknown - Anything not yet under onboarding, or still under presubmission inquiry.

under review - Anything from 1/editor check till 5/

peer reviewed - Anything marked 6/approved

sckott commented 7 years ago

So for a pkg that is likely to never go through onboarding, use unknown? or something else?

karthik commented 7 years ago

We need to have a discussion about that. For packages that were developed by ropensci staff before onboarding, or have been accepted as is, we should have a badge for that separate from unknown. Maybe they can be considered reviewed?

cboettig commented 7 years ago

Good point. I don't think we should badge things as reviewed that haven't been through our onboarding peer review (e.g. bound to create confusion). Some packages developed before onboarding have subsequently gone through it (EML, fishbase -- speaking of which, how do I get onboarding badges for them? and can onboarding badges link to the onboarding review?), and others might. But possibly not all.

I'd suggest we badge them as pre-review or something similar.

@sckott What would be the reasons for a package to never go through onboarding? I know we can't just swamp the review process with 100 staff-written packages, but it seems there was some discussion of lulls in onboarding and new reviewers asking why they hadn't been asked to review anything? Maybe we could slowly work those through. @noamross thoughts?

Maybe there are certain packages that aren't suitable for onboarding review? (e.g. I dunno, but I think of some of Jeroen's packages, which are clearly professionally developed but which I'd have a hard time constructively reviewing myself when most of the work is about crazy system Makevar issues...) Maybe we have a category for staff reviewed in that case? Or just onboard them anyway?

noamross commented 7 years ago

The review badges are automated and are tied to the review issue labels. They get "under review" at stage 2/seeking-reviewers and "peer-reviewed" at 6/approved. We went through and labeled the old reviews retroactively a while back, so badge status should be correct for most (I note EML remains at 4/review-in-awaiting-changes), except the few like fishbase that preceded the onboarding repo.
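The label-to-badge rule described here could be sketched roughly as follows in Python. This is an illustration, not the actual badge service: `badge_status_from_labels` is a hypothetical name, and it assumes stage labels always start with a number followed by a slash. The threshold is a parameter because the thread gives two different starting stages for "under review" (1 and 2).

```python
import re


def badge_status_from_labels(labels, review_from_stage=2):
    """Map review issue labels such as '2/seeking-reviewers' to a badge status."""
    stages = []
    for label in labels:
        # Stage labels are assumed to look like "<number>/<name>".
        match = re.match(r"^(\d+)/", label)
        if match:
            stages.append(int(match.group(1)))
    if not stages:
        return "unknown"
    stage = max(stages)  # use the furthest stage if several labels are set
    if stage >= 6:
        return "peer-reviewed"
    if stage >= review_from_stage:
        return "under review"
    return "unknown"


print(badge_status_from_labels(["2/seeking-reviewers"]))
print(badge_status_from_labels(["6/approved", "legacy"]))
```

Note that any rule of this kind is only as current as the labels: if an editor forgets to switch a label, the derived status stays stale.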

It would be good if the peer review metadata includes the URL of the reviews. In our case, the onboarding thread, but maybe we want to establish a convention that others could use, too. Like peer_reviewed: yes; review_org: ropensci; review_url: the_url.
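To make the suggested convention concrete, here is a small Python sketch that builds such a record. The field names `peer_reviewed`, `review_org` and `review_url` are from the suggestion above; the helper name, the use of a boolean instead of "yes", and the example URL are assumptions.

```python
def review_metadata(review_url=None, review_org="ropensci"):
    """Build a peer review record following the peer_reviewed/review_org/review_url idea."""
    if review_url is None:
        # No linkable review: do not claim peer review.
        return {"peer_reviewed": False}
    return {
        "peer_reviewed": True,
        "review_org": review_org,
        "review_url": review_url,
    }


# Hypothetical review thread URL, for illustration only.
print(review_metadata("https://example.org/reviews/1"))
print(review_metadata())
```

Tying `peer_reviewed` to the presence of a URL reflects the point that a review claim should be something readers can follow up on.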

I really don't think that we should have a "staff reviewed" tag for things that haven't gone through our process. "Developed by rOpenSci" and "rOpenSci Peer Reviewed" are two separate and complementary marks of quality but mean very different things. It's OK that we host both as long as it's clear which is which, or both. There are lots of packages out there that we would recommend even though they are not peer reviewed - they've gone through other validating processes like CRAN, or have trusted developers. It's fine that some of our internal work rests on those other validations (especially as some of it would actually fall out of our peer review scope). Maybe we want to put RO as an author in DESCRIPTION to clarify which are RO-developed packages.

For those packages that we develop that are in scope, I do think they would benefit from peer review and we could absorb them bit by bit. We do have people who are C/Makevars experts in our reviewer pool who I try to reserve for packages that benefit from it, just as we tap people with knowledge of S4, etc., for particular packages.

cboettig commented 7 years ago

Thanks Noam. So does this mean there's some database I can query to get onboarding status instead of me scraping the text from the raw SVGs?

karthik commented 7 years ago

pre-review could be a nice internal tag but that sounds even more confusing if displayed on a repo. While it would be nice to send older stuff through as time permits, many packages have been well used and have already been vetted (like rgbif and such).

I love Noam's idea of creating quick issues for such packages and just accepting them via editorial review and adding 6/approved + legacy tags to them.

how do I get onboarding badges for them? and can onboarding badges link to the onboarding review?

That is exactly what they do now. If you have an issue number, the badge is at ropensci.org/badges/issuenumber_status.svg
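As a quick illustration of that URL pattern, a Python helper might look like this. The `https://` scheme and the function name are assumptions; the `ropensci.org/badges/<issue number>_status.svg` shape is taken from the comment above, and the issue number used is arbitrary.

```python
def badge_url(issue_number, base="https://ropensci.org/badges"):
    """Build the status badge URL for a review issue number."""
    number = int(issue_number)
    if number <= 0:
        raise ValueError("issue number must be a positive integer")
    return f"{base}/{number}_status.svg"


print(badge_url(42))  # 42 is an arbitrary example issue number
```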

sckott commented 7 years ago

What would be the reasons for a package to never go through onboarding?

I don't think anyone is against it - it's just a matter of very large volume. So for in house pkgs, that'd be like 90 pkgs that would need to go through review that are on CRAN + those on github and not on cran yet

I know we can't just swamp the review process with 100 staff-written packages, but it seems there was some discussion of lulls in onboarding and new reviewers asking why they hadn't been asked to review anything?

yeah, there have been lulls - we could submit ours when there's down time

noamross commented 7 years ago

I note I only think we should create those legacy issues for previous packages if they went through peer review similar to our current process. fishbase, for instance, was peer-reviewed but the reviews reside in the package repo.

karthik commented 7 years ago

I see. Perhaps we can also do something else to get all the older ones in via a fast track process?

Also it might be worth thinking more about review for things submitted to us versus what we do as our jobs. Stuff written by Scott, Jeroen, and our contractors like Kirill is top quality work and could go through short, but frequent internal code review.

sckott commented 7 years ago

could go through short, but frequent internal code review.

good idea - examples of this being done in small teams (not like a small team within google where they can afford to have people that only do code review)?

noamross commented 7 years ago

I agree that it's top-quality work, and short/frequent reviews are good, but I really think that our peer review process is our own thing and a "RO Peer Reviewed" badge or designation should mean specifically that. Perhaps we should think of a badge or some other branding mechanism to convey "Developed by rOpenSci".

cboettig commented 7 years ago

Thanks @noamross. I think you make a great point that we should restrict peer-reviewed badging to things that actually went through a peer-review process (e.g. particularly a review we can link to). I'm not sure if we need to distinguish between those where the review lives in an onboarding issue vs a package issue ("legacy" vs "Peer Reviewed"), it's still basically the same peer review process for, say, rfishbase review as the current ones?

But I agree it doesn't make sense to tag rOpenSci staff-developed packages as peer reviewed. Everyone's made a good point that we probably don't want to / can't review all of them. For these ones though it would be nice to have a different kind of badge to distinguish them, maybe indicate that they get an internal quality check or something. Maybe "Staff Review" to contrast to "Peer Review"?

I don't think having been on CRAN & our GitHub for a long time is a reason in-and-of-itself not to review these -- e.g. rfigshare has been on CRAN for years but probably would have a hard time even onboarding without a bunch of maintenance now. I think there's still reasonable heterogeneity in these packages which would be worth identifying.

karthik commented 7 years ago

All good points! Agreed 💯

maelle commented 6 years ago

Reading this now as I opened a duplicate issue 🤦‍♀️ but hey, good occasion to resuscitate this thread. 👼

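I think the registry should have the following tags

1. onboarded vs. not
2. staff- vs community-contributed (no other case unless unconf packages are included cf #12 )
3. development status if we want to enforce having a status badge for all packages (but do we?) cf #10
4. not sure if a staff-reviewed badge makes sense, unless such reviews start happening?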

sckott commented 6 years ago

agree on 1 and 2. i don't think we mention yet to community contribs about a status badge, do we? for staff reviewed, do you mean like one staff reviews another staff's package?

maelle commented 6 years ago

Status would be something like in dev/stable/abandoned. We don't recommend it currently indeed. ☺

I only mentioned staff reviews because it had been discussed in the thread but yes that's what it would be.

karthik commented 6 years ago

I think the registry should have the following tags

onboarded vs. not

Great idea!

staff- vs community-contributed (no other case unless unconf packages are included cf #12 )

This is a very useful distinction to have, especially going forward. It will provide a nice overview of the ratio of work we produce as opposed to what we curate from the community. At some point that could even determine the nature of the organization (from software producer to being more of a hub of activity).

development status if we want to enforce having a status badge for all packages (but do we?) cf #10

Given the fragility of APIs, this is a good idea. No matter how strong our code, a tweak in an API or lack in support from a data provider could quickly tank everything.

not sure if a staff-reviewed badge makes sense, unless such reviews start happening?

I don’t understand what staff reviewed means. rOpenSci folks don’t review each other’s code.

maelle commented 6 years ago

I really only included "staff-reviewed" because it came up (as an idea) in the thread above :joy:

So fields to be added are

maelle commented 6 years ago

See also https://github.com/ropensci/codemetar/issues/45 and https://github.com/ropensci/codemetar/issues/23 regarding where the info about onboarding/reviews will live.

maelle commented 6 years ago

https://github.com/ropensci/onboarding-meta/issues/9 "I'm fine with non-peer-reviewed but mature (say, CRAN-submitted) internal RO packages moving from labs to main as long as the badge (and maybe in the future, codemeta.json) distinguishes the two."

maelle commented 6 years ago

codemetar::create_codemeta will now include information about the review if there's an onboarding review badge in the README of the package and if that issue is closed. Cf https://github.com/ropensci/codemetar/issues/23 For now the info included is basic, but it'll be extended to add editor(s) and reviewer(s).

sckott commented 6 years ago

sounds good

maelle commented 5 years ago

Closing this.

maelle commented 5 years ago

Oh and regarding development status, in the new packages page it'll be either deduced from a repostatus.org badge, or, if there's no badge, from the organization (ropenscilabs -> concept, other -> active).
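That fallback rule can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the packages page code: the function name is hypothetical, and it assumes the repostatus.org badge has already been reduced to its status word (e.g. "wip").

```python
def development_status(repostatus_badge, organization):
    """Prefer the repostatus.org badge; otherwise infer status from the GitHub organization."""
    if repostatus_badge and repostatus_badge.strip():
        return repostatus_badge.strip().lower()
    # No badge: packages in ropenscilabs count as "concept", everything else as "active".
    return "concept" if organization == "ropenscilabs" else "active"


print(development_status("WIP", "ropensci"))
print(development_status(None, "ropenscilabs"))
print(development_status(None, "ropensci"))
```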

karthik commented 5 years ago

At Maëlle's request, there is this now: https://badges.ropensci.org/json/onboarded.json But it is also only for onboarded packages. Not current status. Current status is also label dependent, which is problematic with our workflow (if an editor does not switch the label even if a particular phase of the review has passed, there wouldn't be an automatic change).

maelle commented 5 years ago

so how do you determine pending vs not @karthik ?

karthik commented 5 years ago

same way the badges are done. See this comment: https://github.com/ropensci/roregistry/issues/9#issuecomment-316531715