I will update this post as necessary to reflect the plan and my understanding as it evolves.
Let's specify an algorithm for detecting data access categories (which are currently not part of metajelo itself).
If a license is specified (possible in a future version of metajelo; see https://github.com/labordynamicsinstitute/metajelo/issues/8), then assign categories from {C_i} using a hardcoded mapping we have devised from a list of known licenses to the categories we support. Examples for categories could be:
If the license is unknown, or is specified only as free text:
Each category C_i could have a set of positive phrases {P_i} that indicate a match. We could use something like https://github.com/spencermountain/compromise for normalization, and for creating a custom lexicon and matching utilities.
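To make the two steps concrete, here is a minimal sketch in TypeScript. Everything named in it is an assumption for illustration only: the category names (`open`, `restricted`), the license-to-category table, the phrase lists, and the helper names `normalize` and `detectCategories` are all hypothetical and not part of metajelo. The `normalize` function is a crude stand-in for whatever normalization a library such as compromise would provide; it does not use that library.

```typescript
// Hypothetical category names; the actual set {C_i} is not settled in this post.
type Category = "open" | "restricted";

// Assumed hardcoded mapping from known license identifiers to categories.
const LICENSE_CATEGORIES = new Map<string, Category[]>([
  ["CC0-1.0", ["open"]],
  ["CC-BY-4.0", ["open"]],
]);

// Assumed positive phrases {P_i} for each category C_i.
const POSITIVE_PHRASES: Record<Category, string[]> = {
  open: ["publicly available", "no restrictions"],
  restricted: ["application required", "approved researchers"],
};

// Crude stand-in for a real normalizer: lowercase the text and
// replace every run of non-alphanumeric characters with one space.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Step 1: exact lookup of a known license identifier.
// Step 2: otherwise treat the input as free text and match positive phrases.
function detectCategories(license: string): Category[] {
  const known = LICENSE_CATEGORIES.get(license.trim());
  if (known !== undefined) {
    return [...known];
  }
  // Pad with spaces so phrases only match on whole-word boundaries.
  const text = ` ${normalize(license)} `;
  const categories = Object.keys(POSITIVE_PHRASES) as Category[];
  return categories.filter((category) =>
    POSITIVE_PHRASES[category].some((phrase) =>
      text.includes(` ${normalize(phrase)} `)
    )
  );
}
```

For example, `detectCategories("CC0-1.0")` would take the lookup path, while `detectCategories("Data are publicly available; no restrictions.")` would fall through to phrase matching. An empty result means no category could be assigned, which we would probably want to surface to the user rather than guess.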