Allow creation of new taxonomy terms on the fly

mjordan commented 5 years ago

Currently, Workbench requires a term ID for taxonomy fields. At Islandora Con, there was some discussion around allowing users to populate a taxonomy field with terms that don't yet exist. To do this, users would need to provide term strings.

seth-shaw-unlv commented 5 years ago

So, is this another configuration where we say, for a given column, which target taxonomy field we need to use for the lookup? The REST will want that tid first.

mjordan commented 5 years ago

Yes, that makes sense.

mjordan commented 5 years ago

Looks like the call to /jsonapi/field_config/field_config provides the vocabulary ID (e.g., islandora_models that is linked from the field. It's available in the field's ['attributes']['dependencies']['config'] value 'taxonomy.vocabulary.xxxx'. Since we know the field's machine name, we can get the taxonomy ID from that, we can then test whether the term string exists.

mjordan commented 5 years ago

Did some hacking on the plane and I've got code that adds a list of vocabularies referenced from a given field, and also code that picks out the term names from a given vocab. Latter requires a custom View that I've exported and can create via a configuration.

One glitch I've discovered: Taxonomy fields can reference multiple vocabularies. For example, Linked Agent references the person, family, and corporate body vocabularies. If we allow users to add terms to a multi-vocabulary field, we need a way to indicate in which vocabulary to add new terms to. Maybe require the vocabulary name as a prefix in the input, like "person:Mark Jordan"? (@rangel35 would appreciate your input since you suggested this feature.) For fields that reference only one vocabulary, we wouldn't need the prefix.

mjordan commented 5 years ago

The module containing the View listing all terms in a vocabulary is at https://github.com/mjordan/islandora_workbench_integration.

mjordan commented 5 years ago

For the record, the following creates a new term:

curl -vi -uadmin:islandora -H"Content-type: application/json" --data '{"vid":[{"target_id":"islandora_media_use","target_type":"taxonomy_vocabulary"}],"status":[{"value":true}],"name":[{"value":"Marks term 10"}],"description":[{"value":"","format":null}],"weight":[{"value":0}],"parent":[{"target_id":null}],"default_langcode":[{"value":true}],"path":[{"alias":null,"pid":null,"langcode":"en"}],"field_external_uri":[{"uri":"http:\/\/pcdm.org\/use#Mark","title":null,"options":[]}]}' -X POST "http://localhost:8000/taxonomy/term?_format=json"

So, all the pieces are in place to make this feature happen. All we need is some decisions around how it should behave!

mjordan commented 5 years ago

Another issue: Newly created terms won't have an external URI, unless we figure out some way of allowing the user to specify one. Could get complicated.

mjordan commented 5 years ago

I think we'll want a new config option, 'allow_new_terms' (defaults to False), so we can add to --check a test to see if the View that exposes all terms in a vocabulary is enabled.

rangel35 commented 5 years ago

(@rangel35 would appreciate your input since you suggested this feature.) For fields that reference only one vocabulary, we wouldn't need the prefix.

I can't be much help...all of mine only reference one vocabulary...even linked agent....

rangel35 commented 5 years ago

looks like you solved it with taxonomy type; I was going to suggest using the VID

mjordan commented 5 years ago

My example is using a VID (e.g. Linked Agent field but in my example, I want new terms to be added to the 'person' vocabulary). Sorry if that wasn't clear. So great minds think alike!

rangel35 commented 5 years ago

should they always be added to the "person" vocabulary, because then you can probably just set that as default...but if you need it to vary then you need to accept the value from the user, then you can get the TID from the name and add it if it doesn't find it.....

rangel35 commented 5 years ago

here are some bits of code of what I did...explanation of variables might help :) $headers -- csv headers (usually the headers are the field_names i.e. field_identifier, field_genre...)

...for linked agent they are usually the rel term with the value of subsequent rows as the name ($import_row) so if the import_row has "rel:" in the header it is the roleterm for the linked agent and I join it with the import_row and change the header to "field_linked_agent" so I can pass it all to the appropriate functions...probably a better way to do this but it works...open to ideas if anybody has any...

let me know if you need more clarification

            //finds linked agent with roleterm
            if (strpos($headers[$i], 'rel:') !== false) {
                $import_row[$i] = $import_row[$i] . "|" . $headers[$i] ;
                $headers[$i] = "field_linked_agent";
            }
            if ($headers[$i] == "field_linked_agent") {                           
                $typedrelation = $this->gettypedrelation($connection,$import_row[$i],$batchLang);
                echo "the linked agent is \n";
                var_dump($typedrelation);
                $import_row[$i] = $typedrelation;
            }

protected function gettypedrelation($connection,$agent,$batchLang) {
    $linkedagent = array() ;

    if (!empty($agent)) {
        list($name, $role) = (explode("|",$agent)); 
        $tid = $this->getTidByName($connection, $name, 'ladi_contributors'); 

        if ($tid==0) {
            $term = Term::create(array(
                'parent' => array(),
                'name' => $name,
                'vid' => 'ladi_contributors',
                'langcode' => $batchLang,
            ));
            $term->save();
        }
    }
    $tid = $this->getTidByName($connection, $name, 'ladi_contributors'); 
    $linkedagent['target_id'] = $tid;
    $linkedagent['rel_type'] = $role ;

    return $linkedagent ;
}

and if you have the user pass the vocabulary you just change that to a variable instead of hardcoding it like I did

rangel35 commented 5 years ago

you might also want this function...this one I got from stackexchange

/**

Utility: find term by name and vid.
@param null $name
Term name
@param null $vid
Term vid
@return int
Term id or 0 if none.
notes: https://drupal.stackexchange.com/questions/225209/load-term-by-name */ protected function getTidByName($connection, $name = NULL, $vid = NULL) { $properties = []; if (!empty($name)) { $properties['name'] = $name; } if (!empty($vid)) { $properties['vid'] = $vid; } echo "the properties are\n"; var_dump($properties);
$terms = \Drupal::entityManager()->getStorage('taxonomy_term')->loadByProperties($properties); $term = reset($terms);

return !empty($term) ? $term->id() : 0; }

mjordan commented 5 years ago

@rangel35 I've got most of the code in place already (see bottom of https://github.com/mjordan/islandora_workbench/blob/issue-61/workbench_utils.py) but don't have everything wired together yet. We could set a target vocabulary as a default in the config, but it would be useful to be able to set the target VID per value in the CSV as well. Useful, but a bit awkward. Maybe a default in config and the ability to specify using funky syntax in the CSV values?

rangel35 commented 5 years ago

can maybe make the CSV header be a combination of the field|vocabulary for linked agent and you can do an if to check if the field is linked agent and split on | then if it doesn't have a vocabulary then you use the default...a little hacky but then so is what I did :)

mjordan commented 5 years ago

Yes, or we could have a mapping in the config file that specifies field:vocabulary pairs. That way, or the way you describe with the column header, implies all new terms should be added to the same vocabulary. That may be enough. But what I was describing would let a user have three different new values in one field, and add each new value to a separate vocabulary. But that might be overkill.....

rangel35 commented 5 years ago

hmm...I like that method...it might be overkill but might also save on columns, my way uses up three columns if they need to change while yours only uses one column header is field_linked_agent row value is rel:ctb|Mark Jordan|person and when you split on the | you get the 3 pieces

mjordan commented 5 years ago

I want to step back for a second to get my thoughts in order, since Workbench currently supports two kinds of taxonomy fields: simple taxonomy fields and typed relation fields.

We've already defined a pattern for typed relation fields in the CSV, where we separate the structural parts of the incoming taxonomy value with a colon, e.g. relators:ctb:30 (where 30 is an existing term ID). We can have multiple values in one field: for example relators:ctb:30;relators:art:35 for typed relation and 100;134 for simple taxonomy fields. Workbench only uses one CSV column per Drupal field, unlike your module, which uses a separate column per value for a field.

If we want to allow new taxonomy terms for typed relation fields, we should follow the structure that already exists (relators:ctb:Mark Jordan) but we also need to include the target vocabulary ID. For example, relators:ctb:Mark Jordan:person. As far as implementation goes, we can check to see if the third part is numeric; if it is, we can assume it's a term ID (term exists) and if it's not, we can assume it's a term name (term is new) and add the term name to the target vocabulary if it doesn't already exist. This way, we can also omit the vocabulary ID for Drupal fields that only reference one taxonomy (e.g. relators:ctb:Mark Jordan).

For non-typed relation taxonomy fields, we can use the same pattern, for example Mark Jordan:person. This string can be split on : and we can assume that the last part from the split is the vocabulary ID. We can also allow multiple values like Mark Jordan:person; Minnie Rangel:person.

I'm inclined to not allow users to include a : in the new term name, since a colon will complicate the parsing of the structured strings in the CSV fields. There has to be a limit to what edge cases we can support. If they have a taxonomy term that contains a colon, they can add it to their taxonomy and use the term ID. Even if we disallow colons in new term names, this is all worryingly complex, but we can include robust validation in the --check operation that will warn the user if Workbench can't understand the CSV values.

seth-shaw-unlv commented 5 years ago

I think this seems reasonable. As for allowing colons in the new values; we could also use a string literal delimiter such as single-quotes. E.g. relators:ctb:'Has a colon: for some reason':person. It just requires using the CSV parser instead of a split.

rangel35 commented 5 years ago

that was why I was using the | (pipe) as my delimiter since you rarely have that in a value

just read Mark's post, I keep forgetting you already have a lot of work done in the project so your delimiters are set :) so ignore this :)

mjordan commented 5 years ago

Would love a sanity check: newly added fields are not showing up in the output of `curl -v -o config.yml -H'Accept: Application/vnd.api+json' -uadmin:islandora "http://localhost:8000/jsonapi/field_config/field_config". Would one of you mind performing a sanity check?

enable the JSON:API at admin/config/services/jsonapi
add a new field to you Repository Item content type
issue the following command: curl -v -o config.yml -H'Accept: Application/vnd.api+json' -uadmin:islandora "http://localhost:8000/jsonapi/field_config/field_config"
open config.yml and search for the new field's machine name

seth-shaw-unlv commented 5 years ago

@mjordan, it isn't working for me either. I also tried after clearing cache, just to be safe, and it still isn't there.

seth-shaw-unlv commented 5 years ago

@mjordan, um... neglected to get the next page of results (http://localhost:8000/jsonapi/field_config/field_config?page%5Boffset%5D=50&page%5Blimit%5D=50)... and there it is.

mjordan commented 5 years ago

Ha! Thanks! I didn't even think about looking for a pager. Whew!

mjordan commented 5 years ago

Hours later, I finally figure out that recursive functions in Python need to return within the recursive branch.... now on with the real task of being able to add new terms to vocabularies :smile:

mjordan commented 4 years ago

Put in a full day on this but it's there as of 398a01e34a0d2fb779829f8a13ca2e40e051d995 in master! Now, Workbench lets you:

use term names instead of term IDs (but you can still use term IDs if you want)
with allow_adding_terms: true in your configuration, any term names that do not match terms in a field's linked vocabulary are used to create new terms.

I haven't applied this work to typed relation taxonomy fields yet, or to the edge case where a field has more than one taxonomy linked to it (which is possible, apparently).

You do need to install the Islandora Workbench Integration module and enable the Taxonomy term REST endpoint to use this feature but that is described in the README.

mjordan commented 4 years ago

Related: #98.

mjordan commented 4 years ago

I'm going to close this since this functionality is in place for simple terms (e.g., with term name only). There are separate issues #110 and #111.

mjordan / islandora_workbench

Allow creation of new taxonomy terms on the fly #61