research-software-ecosystem / content

A metadata commons to store research software metadata
Creative Commons Attribution 4.0 International

biotoolsID handling in GitHub ecosystem and bio.tools #165

Open hansioan opened 4 years ago

hansioan commented 4 years ago

@bgruening @joncison @hmenager @matuskalas @piotrgithub1

I am creating this issue to discuss / consult on how we manage bio.tools IDs in both bio.tools and GitHub.

This topic is important because bio.tools is a tool ID provider, which means bio.tools IDs need to be persistent and must not change without a serious reason.

This is how I currently see bio.tools IDs working:

At tool creation:

We need to decide how to handle merging of the pull request: whether one of the core team users needs to approve/review the new entry, whether the user who created the entry needs to review the PR, or whether the tool creation PR gets merged automatically once all validations pass (I would not go for that).

My opinion is that, since we already allow the tool into the bio.tools database immediately, we can reserve the right to approve new tools before they get added to the GitHub side.

The initial addition to the bio.tools database should be done with a "pending approval" flag, which should be resolved on the GitHub side into either an approval or a rejection of the tool.

I think the approval or rejection should mainly focus on whether the tool is indeed an actual tool and whether the ID of the tool is acceptable (i.e. not completely different from the name or having some other weird value; I think we will rarely encounter a tool with an unacceptable ID). Everything else about the tool annotation can be fixed later (e.g. wrong toolType, missing license, etc.).
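To make the review step concrete, here is a minimal sketch of how the "pending approval" flag could be resolved. Everything in it is an assumption for illustration: the `ReviewStatus` and `NewEntry` names, the `is_actual_tool` field (standing in for the reviewer's manual judgement), and the similarity threshold of 0.6 are hypothetical and not part of bio.tools.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum


class ReviewStatus(Enum):
    PENDING = "pending approval"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class NewEntry:
    biotools_id: str
    name: str
    is_actual_tool: bool  # hypothetical: the reviewer's manual judgement
    status: ReviewStatus = ReviewStatus.PENDING


def _normalize(text: str) -> str:
    # Compare only lowercase letters and digits, ignoring spaces and punctuation.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def id_resembles_name(biotools_id: str, name: str, threshold: float = 0.6) -> bool:
    # The threshold is an arbitrary illustrative value, not a bio.tools rule.
    ratio = SequenceMatcher(None, _normalize(biotools_id), _normalize(name)).ratio()
    return ratio >= threshold


def review(entry: NewEntry) -> NewEntry:
    # Only two things are checked: it is a real tool and the ID is acceptable.
    # Other annotation problems (toolType, license, ...) are left for later fixes.
    ok = entry.is_actual_tool and id_resembles_name(entry.biotools_id, entry.name)
    entry.status = ReviewStatus.APPROVED if ok else ReviewStatus.REJECTED
    return entry
```

A similarity check like this could at most flag suspicious IDs for a human reviewer; the final decision would still be made in the PR review.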

At tool update

Please give your opinions on the above and also tag others.

joncison commented 4 years ago

The above looks good from a quick read, I'll just highlight / clarify some key points:

  1. Agree biotoolsID should be based on the tool name and editable by the registrant at registration time (subject to syntax constraint), but editable post-registration only by a superuser.
  2. Superuser needs to verify (manually inspect and adjust if necessary) the ID as per the guidelines.
  3. Folk should review the ID guidelines and suggest changes (if really needed) ASAP.
  4. Once manually verified, an idverified or some such flag should be set, at which point the ID can be taken as immutable, and bio.tools URLs based upon it to be persistent.
  5. If, for some edge-case reason, a change or new ID really is needed on an existing entry, then this can be requested (and minted by superuser), preserving both IDs in the record (likely the old ID in the otherID field) and ensuring both IDs work, i.e. URLs based on them persistently resolve (to the same page).
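The points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Registry` and `ToolRecord` classes, the `id_verified` and `other_ids` attribute names, and the in-memory dictionaries are hypothetical stand-ins, not the bio.tools data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolRecord:
    biotools_id: str
    name: str
    id_verified: bool = False  # set once a superuser has checked the ID
    other_ids: List[str] = field(default_factory=list)  # previous IDs


class Registry:
    def __init__(self) -> None:
        self._records: Dict[str, ToolRecord] = {}  # current ID -> record
        self._aliases: Dict[str, str] = {}         # old ID -> current ID

    def _taken(self, some_id: str) -> bool:
        return some_id in self._records or some_id in self._aliases

    def register(self, biotools_id: str, name: str) -> ToolRecord:
        if self._taken(biotools_id):
            raise ValueError(f"ID already in use: {biotools_id}")
        record = ToolRecord(biotools_id, name)
        self._records[biotools_id] = record
        return record

    def verify(self, biotools_id: str) -> None:
        # After verification the ID is treated as persistent.
        self._records[biotools_id].id_verified = True

    def mint_new_id(self, old_id: str, new_id: str, *, is_superuser: bool) -> ToolRecord:
        if not is_superuser:
            raise PermissionError("only a superuser may change an ID after registration")
        if self._taken(new_id):
            raise ValueError(f"ID already in use: {new_id}")
        record = self._records.pop(old_id)
        record.other_ids.append(old_id)  # keep the old ID in the record
        record.biotools_id = new_id
        self._records[new_id] = record
        self._aliases[old_id] = new_id
        # Re-point any earlier aliases so every historical ID still resolves.
        for alias, target in self._aliases.items():
            if target == old_id:
                self._aliases[alias] = new_id
        return record

    def resolve(self, any_id: str) -> ToolRecord:
        # Old and new IDs resolve to the same record.
        return self._records[self._aliases.get(any_id, any_id)]
```

In this sketch an old ID stays reserved forever, so it can never be reassigned to a different tool, which is one way to keep URLs based on it resolving.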

This is quite a bit of work, but long experience shows that it's necessary, esp. bearing in mind a lot of the value of bio.tools comes from its IDs, and it's nice to have those easily human-readable and concise (hence usable).

Hope this helps!

joncison commented 4 years ago

Oh, and the ID status (and its implications) should be clearly explained by a label and a corresponding pop-up information window in the UI (this used to be there).

hansioan commented 4 years ago

Regarding the biotoolsID change: it should happen not as an update but as a delete + new registration. An update would be too much of a hassle because:

Given the above, I think it's good enough to do a delete + new registration. The only thing we will lose is the additionDate, but I don't see that as an issue; if it is, we can handle it.
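As a rough illustration of the delete + new registration approach, and of one way the additionDate could be carried over if that turns out to matter, consider the sketch below. The `change_id_by_reregistration` function and the plain dictionary used as a database are hypothetical; only the `biotoolsID` and `additionDate` field names come from this discussion.

```python
from datetime import datetime, timezone


def change_id_by_reregistration(db: dict, old_id: str, new_id: str,
                                keep_addition_date: bool = True) -> dict:
    """Change an ID by deleting the old entry and registering a new one."""
    # Check first, so nothing is deleted if the new ID is unavailable.
    if new_id in db:
        raise ValueError(f"ID already in use: {new_id}")
    old_entry = db.pop(old_id)  # the "delete" step
    new_entry = dict(old_entry, biotoolsID=new_id)  # copy with the new ID
    if not keep_addition_date:
        # Plain re-registration: the original additionDate is lost.
        new_entry["additionDate"] = datetime.now(timezone.utc).isoformat()
    db[new_id] = new_entry  # the "new registration" step
    return new_entry
```

With `keep_addition_date=True` the new entry simply inherits the old date, which would address the one loss mentioned above at little cost.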

hmenager commented 4 years ago

Thanks @hansioan for stating things clearly. I agree with all of what is there. A few points here: