transpose-publishing / policies-database

Database of journal policies: TRANsparency in Scholarly Publishing for Open Scholarship Evolution
Creative Commons Zero v1.0 Universal

Choosing a YAML schema #3

Closed dhimmel closed 6 years ago

dhimmel commented 6 years ago

We'll want to encode the valid YAML structure from #1 into a schema so we can programmatically enforce it. I'm going to use this issue to jot down potential options.

We can also look to see what http://reusabledata.org does (source).

dhimmel commented 6 years ago

Reusable Data uses the kwalify Ruby library. It looks like there is a Python port of kwalify named pykwalify. pykwalify appears to be actively maintained (recent commits and issues responded to), while Rx and Yamale from above don't appear as active. Hence, I suggest we go with kwalify/pykwalify. Here's the reusabledata schema.

jpolka commented 6 years ago

Sounds good. Just so I'm understanding this correctly, the schema would not need to be seen by users - but performs response validation for records like this one?

dhimmel commented 6 years ago

Just so I'm understanding this correctly, the schema would not need to be seen by users - but performs response validation

Exactly. Perhaps users may want to look at it to see what the valid values are for a certain field. So the next step for us will be to convert https://github.com/transpose-publishing/policies-database/issues/1#issuecomment-387247838 into a schema.yaml file. @jpolka do you want to try doing that? We can touch base if you get stuck. Here's a usage guide which should contain the options for making a schema.

One field for example would be journals:

type: map
mapping:
  # journals affected by this policy
  journals:
    type: seq
    sequence:
      - type: str
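To make the rule above concrete, here is a minimal Python sketch of what enforcing a "seq of str" rule amounts to. It is a hand-rolled illustration, not pykwalify itself: the `validate` function, its error messages, and the sample records are all hypothetical, and the schema is written as a Python dict so that no YAML parser is needed.

```python
# Hand-rolled illustration of what a "seq of str" rule checks.
# This is NOT pykwalify; it only mirrors the journals rule above.

SCHEMA = {
    "type": "map",
    "mapping": {
        "journals": {"type": "seq", "sequence": [{"type": "str"}]},
    },
}


def validate(value, rule, path=""):
    """Return a list of error strings; an empty list means the value is valid."""
    errors = []
    kind = rule["type"]
    if kind == "map":
        if not isinstance(value, dict):
            return [f"{path or '/'}: expected a mapping"]
        for key, item in value.items():
            sub_rule = rule["mapping"].get(key)
            if sub_rule is None:
                errors.append(f"{path}/{key}: key not defined in schema")
            else:
                errors.extend(validate(item, sub_rule, f"{path}/{key}"))
    elif kind == "seq":
        if not isinstance(value, list):
            return [f"{path}: expected a sequence"]
        item_rule = rule["sequence"][0]
        for index, item in enumerate(value):
            errors.extend(validate(item, item_rule, f"{path}/{index}"))
    elif kind == "str":
        if not isinstance(value, str):
            errors.append(f"{path}: expected a string, got {type(value).__name__}")
    return errors


# Hypothetical records, not taken from the database.
good = {"journals": ["Journal A", "Journal B"]}
bad = {"journals": ["Journal A", 42], "publisher": "X"}

print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

In practice pykwalify would do this work; the sketch only shows the kind of check the schema encodes, including rejecting keys the schema does not define.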

It looks like the following types of scalar values are allowed:

Regarding unset fields, I'm thinking we should comment them out rather than setting to null?

jpolka commented 6 years ago

I'll give it a shot @dhimmel :) What's the benefit of commenting out unset fields? I imagine this might be more confusing to users.

ALSO - can we just add comments to the YAML files so users don't need to refer back to the schema to understand what responses are valid?

jpolka commented 6 years ago

Ok - how does this look @dhimmel? Note that the comments here could (and probably should, for the sake of usability?) also be present in the YAML files themselves.

One other thing. If we do NOT comment out null values, what do we add to enum to support null values? Should it be "null" or []?

Also - I think it would be great to add some instructions (or a link to CONTRIBUTING.MD) to the top of every YAML file to help novices?

type: map
mapping:
  #-------------Info between these lines is from SHERPA/RoMEO (users should not edit) --------------------
  # Policy ID from SHERPA/RoMEO
  romeopub:
    type: str
  # A list of the journals from SHERPA/RoMEO associated with this romeopub- "jtitle"
  journals:
    type: seq
    sequence:
      - type: str
  # The Policy ID from SHERPA/RoMEO that is the parent of this policy - "parentid"
  parent-policy:
    type: str
  # A list of the Policy IDs from SHERPA/RoMEO that list this Policy ID as its parent
  child-policies:
    type: seq
    sequence:
      - type: str
  # -------------------------------------------------------------------------------------------
  # If this policy does not apply to the journals listed above, leave a note here.
  flag-romeopub:
    type: str
  #
  # OPEN PEER REVIEW
  #
  # Peer review policy url (valid url)
  peer-review-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Does the journal publish the content of peer reviews? (mandatory/optional/no)
  open-reports:
    type: str
    enum: [mandatory, optional, "no"]  # "no" is quoted so YAML 1.1 parsers do not read it as a boolean
  # Are reviewer identities revealed to the author? (mandatory/optional/no)
  identities-revealed:
    type: str
    enum: [mandatory, optional, "no"]
  # Are reviewer identities published? (mandatory/optional/no)
  identities-published:
    type: str
    enum: [mandatory, optional, "no"]
  #
  # CO-REVIEWERS
  #
  # Co-reviewer policy url (valid url)
  co-review-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Journal’s policy about co-reviewers - ie people who collaborate with an invited reviewer (free text)
  co-review-policy:
    type: str
  # Does the journal make it clear in the reviewer invitation email that co-reviewers can contribute? (yes/no)
  co-review-invited:
    type: str
    enum: ["yes", "no"]
  # Is there a dedicated place in the submission form to identify co-reviewers? (yes/no)
  co-review-field:
    type: str
    enum: ["yes", "no"]
  #
  # PEER REVIEW TRANSFER
  #
  # Peer review transfer policy url (valid url); key name assumed, the original repeated co-review-url here
  transfer-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Policy on transfer of peer reviews (free text)
  transfer-policy:
    type: str
  # What are the titles of the sections of the review form? (free text)
  review-structure:
    type: str
  # Are there separate fields for technical & impact evaluation? (yes/no)
  separate-structure:
    type: str
    enum: ["yes", "no"]
  #
  # PEER REVIEW CREDIT
  #
  # Peer review credit policy url (valid url)
  credit-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Does the journal deposit peer review information into ORCiD? "via service" includes Publons (yes/via service/no)
  orcid-peer-review:
    type: str
    enum: ["yes", via service, "no"]
  #
  # PREPRINTS
  #
  #-------------Info between these lines is from SHERPA/RoMEO (users should not edit) --------------------
  # Can users archive preprints? From SHERPA/RoMEO (can/cannot/restricted/unclear)
  prearchiving:
    type: str
    enum: [can, cannot, restricted, unclear]
  #  Preprint restrictions (from SHERPA/RoMEO: prerestrictions)
  prerestrictions:
    type: seq
    sequence:
      - type: str
  # Copyright policy url (from SHERPA/RoMEO: copyrightlinkurl)
  copyrightlinkurl:
    type: str
  # Conditions (from SHERPA/RoMEO: conditions)
  conditions:
    type: seq
    sequence:
      - type: str
  #-----------------------------------------------------------------------------------------------
  # Preprint policy url (valid url)
  preprint-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Version of the preprint that can be posted to a server (before review only/any/other/none)
  preprint-version:
    type: str
    enum: [before review only, any, other, none]
  # Time when a preprint can be posted (before acceptance only, anytime, other)
  preprint-time:
    type: str
    enum: [before acceptance only, anytime, other]
  # Can preprints be cited in the reference list? (yes/no)
  preprint-citation:
    type: str
    enum: ["yes", "no"]
  # Acceptable servers or characteristics of servers - eg specific names, non-commercial, "recognized," etc (free text)
  acceptable-servers:
    type: str
  # What type of coverage or discussion of preprints is allowed, eg in the media or in scientific blogs? (free text)
  preprint-media:
    type: str
  # url for preprint-media
  preprint-media-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Policies on preprint licensing (free text)
  preprint-licensing:
    type: str
  # url for preprint-licensing
  preprint-licensing-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Policy on whether a preprinted submission is protected from rejection if competing work is published in another journal after the date of preprinting (free text)
  scoop-protection:
    type: str
  # url for scoop-protection
  scoop-protection-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
  # Policy on incorporating community reviews or comments on preprints into editorial assessment (free text)
  community-reviews:
    type: str
  # url for community-reviews
  community-reviews-url:
    type: str
    pattern: /^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$/ # From the reusabledata schema
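Since the same URL pattern guards every *-url field, it may help to see what it accepts. The sketch below tries the expression with Python's `re` module. It assumes the kwalify-style `/.../` delimiters are dropped (whether pykwalify expects them is worth checking against its documentation), and the helper name `is_valid_url_field` and the sample values are hypothetical.

```python
import re

# Pattern copied from the schema above with the kwalify-style /.../ delimiters
# removed (an assumption: Python's re module takes the bare expression).
URL_OR_TODO = re.compile(
    r"^(TODO|(ht|f)tp(s?)\:\/\/\w[\/\.\-\:\=\?\&\_\+\u0023\w]+)$"
)


def is_valid_url_field(value):
    """Return True if value is the TODO placeholder or matches the URL pattern."""
    return URL_OR_TODO.match(value) is not None


# Hypothetical example values, not taken from the database.
samples = [
    "TODO",
    "https://example.org/peer-review",
    "https://example.org/policy#transfers",
    "example.org/policy",          # no scheme
    "https://example.org/a%20b",   # "%" is not in the allowed character class
]
for sample in samples:
    print(sample, is_valid_url_field(sample))
```

One thing the sketch surfaces: the character class has no "%", "~", or ",", so percent-encoded URLs would be rejected by this pattern as written.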

dhimmel commented 6 years ago

What's the benefit of commenting out unset fields? I imagine this might be more confusing to users.

Okay, it would be nice to start with all values as null, like community-reviews-url: null. We just have to make sure this will work with pykwalify.

Also - I think it would be great to add some instructions (or a link to CONTRIBUTING.MD) to the top of every YAML file to help novices?

Definitely. Let's skip documentation for now and then make a doc push Wednesday night once we've got the implementation down.

We will be busy tomorrow! It will be best to have you add this file via a pull request. We can go over that tomorrow morning.

dhimmel commented 6 years ago

It will be best to have you add this file via a pull request. We can go over that tomorrow morning.

@jpolka I will commit this file with your authorship info in a new PR and we can go from there.

Also we should move comments in schema.yml to desc: fields.
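As a rough illustration of why desc: fields could be handy, the sketch below generates a blank, commented record from the schema, so the hints in each YAML file come from one place. The schema excerpt is written as a Python dict to avoid needing a YAML parser; the `template_lines` helper and the wording of the desc values are hypothetical, and this is only one possible way to use desc.

```python
# Hypothetical excerpt of the schema with comments moved into desc fields,
# written as a Python dict so the sketch needs no YAML parser.
SCHEMA = {
    "type": "map",
    "mapping": {
        "open-reports": {
            "desc": "Does the journal publish the content of peer reviews?",
            "type": "str",
            "enum": ["mandatory", "optional", "no"],
        },
        "transfer-policy": {
            "desc": "Policy on transfer of peer reviews",
            "type": "str",
        },
    },
}


def template_lines(schema):
    """Build a blank record: one comment line and one null field per schema key."""
    lines = []
    for key, rule in schema["mapping"].items():
        # Show the allowed values when the field has an enum, else mark it free text.
        hint = "/".join(rule["enum"]) if "enum" in rule else "free text"
        lines.append(f"# {rule['desc']} ({hint})")
        lines.append(f"{key}: null")
    return lines


print("\n".join(template_lines(SCHEMA)))
```

Generating the comments this way would keep the per-file hints and the schema from drifting apart, at the cost of regenerating templates whenever the schema changes.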

jpolka commented 6 years ago

Just realized the schema should include a human-readable publisher name. @dhimmel should I create a new branch & pull request on dhimmel/policies-database/schema.yml or...? (thanks in advance)

  # Policy/publisher name from SHERPA/RoMEO - "name"
  policy-name:
    type: str

dhimmel commented 6 years ago

Wait until I'm done with https://github.com/transpose-publishing/policies-database/pull/8 and it's merged.