stanford-crfm / ecosystem-graphs


Support value and explanation fields in the .yaml files #25

Closed dilarasoylu closed 2 years ago

dilarasoylu commented 2 years ago

Purpose

The purpose of this PR is to allow value and explanation fields to be used in .yaml files.

Method

The method is described in the code comments, also pasted below.

      /*
      * We expect each assetField to have a value and an explanation field.
      * When reading the field from the schemaFieldValue, we populate each of
      * these fields as follows: 
      * (1) If the schemaFieldValue is an object with exactly two fields, 
      *     'value' and 'explanation', we directly read these into the assetField.
      * (2) If the schemaFieldValue is an object with only the 'value' field, we
      *     read it to the corresponding field of the assetField, and let the
      *     explanation field be null.
      * (3) If the schemaFieldValue doesn't conform to the rules in (1) and (2), 
      *     we directly assign it to the value field of the assetField, and let the
      *     explanation field be null.
      */
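
As a rough illustration, the three rules in the comment could be sketched as below. The function names `isPlainObject` and `readAssetField` are hypothetical and are not taken from the PR; the actual implementation may be organized differently.

```javascript
// Hypothetical helper (not from the PR): true for non-null, non-array objects.
function isPlainObject(x) {
  return x !== null && typeof x === "object" && !Array.isArray(x);
}

// Hypothetical reader (not from the PR) that applies rules (1)-(3) above and
// always returns an object of the form { value, explanation }.
function readAssetField(schemaFieldValue) {
  if (isPlainObject(schemaFieldValue)) {
    const keys = Object.keys(schemaFieldValue).sort();
    // Rule (1): exactly the two fields 'value' and 'explanation'.
    if (keys.length === 2 && keys[0] === "explanation" && keys[1] === "value") {
      return {
        value: schemaFieldValue.value,
        explanation: schemaFieldValue.explanation,
      };
    }
    // Rule (2): only the 'value' field.
    if (keys.length === 1 && keys[0] === "value") {
      return { value: schemaFieldValue.value, explanation: null };
    }
  }
  // Rule (3): anything else (string, list, other object) is the value itself.
  return { value: schemaFieldValue, explanation: null };
}
```

Under this sketch, a bare list such as the one given for sample falls under rule (3) and is stored as the value with a null explanation, so the .yaml author does not need to wrap it in an object.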

Note that this deviates from the request in the linked issue, which read: "Each field has a value and explanation field. If the field is string, it is interpreted as the value." The motivation for the deviation is to allow more flexibility when filling out the .yaml files. For example, the sample field is of type list, so under the issue's rule we would look for a value and an explanation field when reading sample from the .yaml, even though there isn't much reason to provide an explanation for this field.

Options

The proposed solution allows some flexibility while ensuring that a certain template is followed. Below are two additional proposals, which could be adopted as additions to, or modifications of, the current solution.

1. Requiring each field to be an object with a field named value

If we want to reduce the flexibility, I propose standardizing the format across the board: each field must be an object with a field named value. We would also accept an optional field named explanation, but the object must not have any other fields.
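
A minimal sketch of what this stricter option might look like, assuming we reject non-conforming fields with an error. The function name `readStrictAssetField` and the error messages are hypothetical; how failures should actually be reported is left open here.

```javascript
// Hypothetical strict reader for option 1 (not from the PR).
// Every field must be an object with 'value' and, optionally, 'explanation'.
function readStrictAssetField(fieldName, schemaFieldValue) {
  const isObject =
    schemaFieldValue !== null &&
    typeof schemaFieldValue === "object" &&
    !Array.isArray(schemaFieldValue);
  const hasValue =
    isObject &&
    Object.prototype.hasOwnProperty.call(schemaFieldValue, "value");
  if (!hasValue) {
    throw new Error(`${fieldName}: expected an object with a 'value' field`);
  }
  // Reject any field other than 'value' and 'explanation'.
  const extra = Object.keys(schemaFieldValue).filter(
    (key) => key !== "value" && key !== "explanation"
  );
  if (extra.length > 0) {
    throw new Error(`${fieldName}: unexpected fields: ${extra.join(", ")}`);
  }
  return {
    value: schemaFieldValue.value,
    explanation: schemaFieldValue.explanation ?? null,
  };
}
```

The trade-off is that a bare string or list would no longer be accepted, so every existing .yaml entry would need to be wrapped in an object with a value field.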

2. Specifying the types for all the fields in the schema

Currently, our schema type definitions are loose. For example, someone who wants to provide a list instead of a string for the intended_uses field can do so. I think it would be better, from both a programmatic and a user-experience perspective, to specify the types explicitly in the schema. In the future, if we decide to restrict the set of allowed values for certain properties (e.g. access), we could quickly replace their types with enums.
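
To make the idea concrete, here is a small sketch of explicit types with an enum for access. The field names come from the discussion above, but the `schema` layout, the `checkType` function, the type labels, and the enum values are all assumptions made for illustration, not the project's actual schema.

```javascript
// Hypothetical schema (not from the PR): the type assignments and the
// allowed values for 'access' are assumptions for illustration only.
const schema = {
  intended_uses: { type: "string" },
  sample: { type: "list" },
  access: { type: "enum", values: ["open", "limited", "closed"] },
};

// Returns true if `value` matches the type declared for `fieldName`.
// Unknown fields and unknown type labels are treated as invalid.
function checkType(fieldName, value) {
  const spec = schema[fieldName];
  if (spec === undefined) {
    return false;
  }
  switch (spec.type) {
    case "string":
      return typeof value === "string";
    case "list":
      return Array.isArray(value);
    case "enum":
      return spec.values.includes(value);
    default:
      return false;
  }
}
```

With a layout like this, tightening a field later would only mean changing its entry in the schema from a plain type to an enum; the checking code would stay the same.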