ucd-library / ezid

Bash client Script for ezid identifier management
MIT License

=head1 NAME

ezid - Get, add, update, and delete ARKs via the EZID API

=head1 SYNOPSIS

 ezid [-S|--session=<http_session>] [-b|--base=<base>] \
      [-n|--dry-run] [-p|--print] [-h|--help] \
      <command> [<args>]

where I<command> is one of fq, anvl, args_to_anvl, array_to_anvl, login, get, mint, put, update, delete

Ezid is a script that simplifies the retrieval, creation, and update of ezid ARKS. It uses a set of functions to communicate with the ezid services, and relies on a few other tools to perform those functions. L</"COMMANDS"> is a summary of the commands that are available.

A good review of the format and organization of an B<ARK> can be found in the L<ID Concepts|https://ezid.cdlib.org/learn/id_concepts> description.

=head1 GLOBAL OPTIONS

=over 4

=item B<-S|--session|--http_session=I<http_session>>

Set the httpie session variable. This allows you to maintain multiple authentication setups for EZID, so that you can switch back and forth between users. It is equivalent to the L<httpie|https://httpie.org/> session variable, and sessions are shared with that command. Sessions primarily keep the basic authentication parameters saved. By default the B session is used.

=item B<-s|--base=I<base>>

Set the base that you want to use. The base is a combination of the scheme, NAAN, and shoulder components of the identifier. The default base is the EZID ark testing shoulder B<ark:/99999/fk4>, which is a safe place to do your testing. If you use complete arks in your commands, this option is not required. If you use shorthands, for example only the blades of the arks, then the fully qualified arks are created from this base.

=item B<-n|--dry-run>

Show what would be done without actually doing it. Because some commands require multiple requests to the server, this option does not always work properly.

=item B<--print|http_print>

Adjust the httpie I<--print=> argument. By default, only the response body (I<--print=b>) is shown.

=item B<-h|--help>

Shows the manpage for the program. The help pages are embedded in the script and require the functions, C and C to work properly.

=back

=head1 COMMANDS

There are a number of ezid commands that are used in manipulating the ARKS. There are metadata commands to edit and create the required inputs, and there are commands to retrieve, create, and update records as well.

There are some informational commands that do not access the CDL server.

C<ezid [--base=base] fq ark> will parse an ark, and either verify it, or expand it to a full ark if it is only a shoulder. See L</"FQ"> for more information.

C<ezid anvl [--array] [--csv=items]> will read anvl formatted data from C<stdin> and output it either as a bash array, or as a csv row. See L</"ANVL"> for more information.

C<ezid args_to_anvl [key:value] [key2:value2]> will read passed key:value pairs from the command line and output anvl format to C<stdout>. See L</"ARGS_TO_ANVL"> for more information.

C<ezid array_to_anvl "$(declare -p bash_array)"> will read the first parameter as a serialized bash array, and output it as anvl. See L</"ARRAY_TO_ANVL"> for more information.

Next, there is a set of commands that communicate with the CDL service. Note that ezid uses L<httpie|https://httpie.org/> for its http communication. This allows users to combine ezid with other httpie requests if required. Login information is stored using the standard C<httpie> session methodology; see L</"GLOBAL OPTIONS"> for httpie options.

C<ezid [options] get [--array] [--csv=list] ark(s)> retrieves ARKs from the ezid server. Can output in multiple formats. See L</"GET"> for more information.

C<ezid [--session=http_session] login --auth=user[:password]> allows users to set and save their login parameters for further updates. See L</"LOGIN"> for more information.

C<ezid [global_options] mint [--proxy=proxy_server] [key:value] [key2:value2] ...> will mint a new ark in the specified C<--base>. See L</"MINT"> for more information.

C<ezid [global_options] update ark [key:value] [key2:value2] ...> updates an existing ark with the passed key:value pairs. See L</"UPDATE"> for more information.

C<ezid [global_options] delete ark(s)> deletes ARKS if their current status is C<_status:reserved>. See L</"DELETE"> for more information.

=head2 FQ

 ezid [--base=base] fq [ark]

C<fq> will output the fully qualified C<ark>. It combines the C<--base> specification with the passed ark or ark:fragment, and guesses at the fully qualified version. Note that C<--base> is an C<ezid> global option, and not an option of this function. Currently, this function does NOT use the parity check that is part of the ark: specification.

Verify an ark:

 ezid fq ark:/99999/fk4qf9zb42
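As a rough illustration of how a base and a blade might combine into a fully qualified ark, here is a minimal sketch. The function name C<fq_sketch> is hypothetical, and the rule it applies (keep anything that already starts with C<ark:/>, otherwise append the blade to the base) is an assumption for illustration, not the logic that C<fq> actually uses.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the ezid implementation.
# Usage: fq_sketch BASE ID
fq_sketch() {
  local base=$1 id=$2
  case "$id" in
    ark:/*) printf '%s\n' "$id" ;;            # already fully qualified: pass through
    *)      printf '%s%s\n' "$base" "$id" ;;  # assume a bare blade: prepend the base
  esac
}

fq_sketch ark:/99999/fk4 qf9zb42            # prints ark:/99999/fk4qf9zb42
fq_sketch ark:/99999/fk4 ark:/87287/d7q30n  # prints ark:/87287/d7q30n
```

The sketch does no validation at all; like C<fq> as described above, it does not check the ark's check character.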

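=head2 ANVL

 ezid anvl [--array] [--csv=items]

C<anvl> reads anvl formatted data from stdin and outputs it either as a bash array, or as a csv row.

=head3 ANVL OPTIONS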

=over 4

=item B<--array>

This will output a bash style associative array from a given ARK, where each key of the array is a key of the retrieved ANVL format. These can be C<eval>ed for use later in a bash script.

=item B<--csv=I<items>>

You can specify the columns that you would like to retrieve using this parameter. Somewhat following anvl conventions, the column names are B<:> delimited. This is a convenient way to create a table from a list of ARKs.

=item B<--ark>

You can specify an ark to be associated with the input anvl data. This is a convenience primarily with the C<--csv> parameter, so that you can easily include an ark in CSV file output. Note that if the ANVL input also specifies an C parameter, then that takes precedence.

=back
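To make the C<--csv> idea concrete, the following is a minimal sketch of turning ANVL lines on stdin into one CSV row for a B<:> delimited column list. The function name C<anvl_to_csv_sketch> is hypothetical, the parsing is deliberately naive (no percent-decoding, no CSV quoting), and it is not how ezid itself is implemented.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (bash 4+); anvl_to_csv_sketch is not part of ezid.
# Usage: anvl_to_csv_sketch COL1:COL2:... < anvl_input
anvl_to_csv_sketch() {
  local cols=$1 line key value col
  local -A rec=()
  local -a row=()
  # Read "key: value" lines from stdin into an associative array.
  while IFS= read -r line; do
    [[ $line == ?*:* ]] || continue   # skip lines without a key and a colon
    key=${line%%:*}                   # text before the first colon
    value=${line#*:}                  # text after the first colon
    rec[$key]=${value# }              # drop the single space after the colon
  done
  # Split the column list on ":" and look each column up.
  local IFS=:
  for col in $cols; do
    row+=("${rec[$col]}")
  done
  # Join with commas; values containing commas or quotes are not quoted here.
  IFS=,
  printf '%s\n' "${row[*]}"
}

printf 'erc.who: Quinn\nerc.what: Eskimo\n' | anvl_to_csv_sketch erc.what:erc.who
# prints: Eskimo,Quinn
```

A column that is missing from the input simply produces an empty field, which keeps the remaining columns aligned.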

=head2 ARGS_TO_ANVL

 ezid args_to_anvl [anvl parameters]

C<args_to_anvl> reads the passed parameters as a set of items to include in an ANVL file, and outputs that ANVL file to stdout. In keeping with the ANVL format, arguments are delineated into key:value pairs using the colon C<:>. As in normal ANVL, if you want a colon as part of your key, it needs to be escaped. The values themselves are verified, and escaped if they need to be.

This is a simple example with two parameters:

 ezid args_to_anvl erc.who:Quinn erc.what:Eskimo

This example includes a value that needs escaping:

 ./ezid args_to_anvl erc.who:Quinn erc.what:Eskimo where:$'The\nfrozen\north'
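The escaping mentioned above can be sketched as follows. EZID's ANVL uses percent-encoding, so percent signs and line terminators in values have to be encoded. The function names C<anvl_escape> and C<args_to_anvl_sketch> are hypothetical, splitting each argument at its first colon is an assumption, and escaped colons inside keys are not handled; this is an illustration, not the script's own code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; these functions are not part of ezid.

# Percent-encode the characters that would break an ANVL value.
anvl_escape() {
  local s=$1
  s=${s//'%'/%25}     # escape the escape character first
  s=${s//$'\n'/%0A}   # newline
  s=${s//$'\r'/%0D}   # carriage return
  printf '%s' "$s"
}

# Print each key:value argument as an ANVL "key: value" line.
args_to_anvl_sketch() {
  local arg key value
  for arg in "$@"; do
    key=${arg%%:*}    # assumption: split at the first colon
    value=${arg#*:}
    printf '%s: %s\n' "$key" "$(anvl_escape "$value")"
  done
}

args_to_anvl_sketch erc.who:Quinn where:$'frozen\nnorth'
# prints:
#   erc.who: Quinn
#   where: frozen%0Anorth
```

Encoding the percent sign before anything else matters: doing it last would re-encode the C<%0A> and C<%0D> sequences that the other substitutions produce.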

=head2 ARRAY_TO_ANVL

 ezid array_to_anvl "$(declare -p foo)"

C<array_to_anvl> reads a serialized bash associative array, passed as its first parameter, and reformats it as an ANVL file. This is typically a debug function, but it could be used in a script environment, for example to mint values, where the output of this command is piped to the stdin of a C<mint> command.

This example converts a simple bash array to ANVL:

 declare -A foo
 foo[erc.who]=Quinn
 foo[erc.what]=Eskimo
 ezid array_to_anvl "$(declare -p foo)"
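For readers unfamiliar with passing an array as a string, here is a minimal sketch of the round trip: C<declare -p> serializes the array, and the receiving function loads it back before printing ANVL lines. The function name C<array_to_anvl_sketch> is hypothetical, the sketch assumes the C<declare -p> output format of bash 4.4 or later, and it performs no ANVL escaping; it is not the ezid implementation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (bash 4.4+); not the ezid implementation.
# Usage: array_to_anvl_sketch "$(declare -p some_associative_array)"
array_to_anvl_sketch() {
  local decl=$1 key
  local -A rec
  # Keep only the '([key]="value" ...)' part and load it into a local array.
  # eval executes code, so only pass declarations your own script produced.
  eval "rec=${decl#*=}"
  # Key order of an associative array is unspecified, so sort for stable output.
  for key in "${!rec[@]}"; do
    printf '%s: %s\n' "$key" "${rec[$key]}"
  done | sort
}

declare -A foo=([erc.who]=Quinn [erc.what]=Eskimo)
array_to_anvl_sketch "$(declare -p foo)"
# prints:
#   erc.what: Eskimo
#   erc.who: Quinn
```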

=head2 GET

ezid I<[options]> B<get> [--array] [--csv=I<list>] [--header] ARK(s)

B<get> retrieves existing ARKS from the ezid server, and displays them either as anvl (default), csv, or as a bash array for evaluation. The CSV format is most suitable for retrieving multiple arks.

=head3 GET OPTIONS

=over 4

=item B<--csv=I<list>>

You can specify the columns that you would like to retrieve using this parameter. Somewhat following anvl conventions, the column names are B<:> delimited. This is a convenient way to create a table from a list of ARKs.

=item B<--header>

When specifying the I<--csv> option, this will include a header on the first row of the output.

=item B<--array>

This will output a bash style associative array from a given ARK, where each key of the array is a key of the retrieved ANVL format. These can be C<eval>ed for use later in a bash script. For example, the command C<eval $(./ezid get --array ark:/87287/d7q30n); echo ${anvl[_target]}> prints the C<_target> of the ark.

=back

=head2 LOGIN

 ezid login --auth=USER[:PASS]

B<login> is a simple wrapper around the B<httpie --auth=USER:PASS> command. This allows users to set up their basic authorization, which is then stored in the standard httpie session parameters. It is possible to maintain multiple users via the ezid I<--session> parameter, e.g.

 ezid --session=ucd-legacy login --auth=ucd-legacy

After which the session C<ucd-legacy> will be set as a new httpie session, with the saved authorization.

=head3 LOGIN OPTIONS

=over 4

=item B<-a|--auth=USER[:PASS]>

You specify the basic authentication for the ezid.cdlib.org server. If you only specify the USER, then you are prompted for the password.

=back

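=head2 MINT

 ezid [global_options] mint [--proxy=proxy_server] [--verify] [key:value] [key2:value2] ...

C<mint> will mint a new ark in the specified C<--base>.

=head3 MINT OPTIONS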

=over 4

=item B<--proxy=proxy_server>

Setting this parameter will cause the mint function to perform an immediate L</"UPDATE">, where the target key:value pair will be specified as C<_target:{proxy_server}{ark}>, the concatenation of the proxy server and the newly minted ark. For example,

 ezid mint --proxy=https://ark.foo.edu/ --verify erc.who:Quinn erc.what:Eskimo

responds with:

 success: ark:/99999/fk4qf9zb42
 _updated: 1555441448
 _target: https://ark.foo.edu/ark:/99999/fk4qf9zb42
 erc.who: Quinn
 _profile: erc
 _export: yes
 _owner: ucd-legacy
 _ownergroup: ucd-library
 _created: 1555441448
 _status: reserved
 erc.what: Eskimo

=item B<--verify>

After creating the ark, will L</"GET"> the ark to verify it was created. This will also output the record in ANVL format, unless another option, C<--array> or C<--csv>, is specified.

=item B<--csv=I<list>> | B<--array>

See L</"GET OPTIONS"> for output options.

=back

=head2 UPDATE

 ezid [global_options] update --ark=ark [--verify] [--proxy=proxy_server] [key:value] [key:value] ...

C<update> updates an existing ark by overwriting any of the passed C<key:value> pairs. An existing C<--ark> is required; otherwise the function works exactly as the L</"MINT"> command. Please see L</"MINT"> for usage.

=head3 UPDATE OPTIONS

=over 4

=item B<--ark=ark>

Specify the ark to update.

=item B<--csv=I<list>> | B<--array> | B<--verify>

See L</"MINT OPTIONS"> for output options.

=back

=head2 DELETE

ezid I<[global_options]> B<delete> [ARK(s)]

B<delete> deletes existing ARKS from the ezid server, if their current status is C<_status:reserved>.

=head1 SCRIPTING

=head2 CSV SCRIPTING

Scripting with CSV files can be hard if you are trying to parse complicated csv files, so a good method is to use another tool. For example, L<csvtool|https://colin.maudry.com/csvtool-manual-page/> is a nice tool for scripting on csv files. C<csvtool> allows you to call a script on each row of a csv file, passing the values as positional parameters. You can make a small wrapper script to translate these positional parameters to key:value pairs, and then run the tool over a csv file to create or update arks.

For example, assume you have a csv file, C<in.csv>, like this:

 who,what,more
 Quinn,The Eskimo,Bob Dylan
 Sloopy,Hang on,The McCoys

You can create a small script file, C<mint.sh>, that looks like this:

 #!/bin/bash
 # Mint one ark per csv row; $1 and $2 are the who and what columns.
 base=ark:/99999/fk4
 proxy=https://digital.ucdavis.edu/
 cols=success:_target:erc.who:erc.what
 ezid --base="${base}" mint --proxy="${proxy}" --csv="${cols}" erc.who:"$1" erc.what:"$2"

Then you can mint arks, and save the minted arks back to a csv file like this. The example shows how you might use C<csvtool> to select only a set of columns. In practice, the little wrapper script can be a good place to add constants, or manipulate the columns before sending them off to ezid.

 csvtool namedcol who,what in.csv |\
   head | tail -n +2 | csvtool call ./mint.sh - | tee out.csv

With the following results:

 ark:/99999/fk4qc17z06,https://digital.ucdavis.edu/ark:/99999/fk4qc17z06,Quinn,The Eskimo
 ark:/99999/fk4km0j474,https://digital.ucdavis.edu/ark:/99999/fk4km0j474,Sloopy,Hang on
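If the wrapper needs to handle more columns, the translation from positional parameters to key:value pairs can be pulled into a small helper. The sketch below is one possible shape for it: the function name C<build_pairs>, the global array C<PAIRS>, and the fixed key list are all hypothetical. It only prints the command it would run, so it can be tried without touching the EZID server.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a wrapper helper; not part of ezid.

# Pair each positional value with a key name; the result is left in PAIRS.
build_pairs() {
  local -a keys=(erc.who erc.what)   # assumed column-to-key mapping
  local i=0 value
  PAIRS=()
  for value in "$@"; do
    (( i < ${#keys[@]} )) || break   # ignore extra columns
    PAIRS+=("${keys[i]}:$value")
    i=$((i + 1))
  done
}

build_pairs "Quinn" "The Eskimo" "Bob Dylan"
# Show the command instead of running it; %q quotes each argument for the shell.
printf 'ezid mint'
printf ' %q' "${PAIRS[@]}"
printf '\n'
```

Keeping the pairs in an array, rather than a single string, preserves values that contain spaces when they are later passed on as C<"${PAIRS[@]}">.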

=head1 AUTHOR

Quinn Hart qjhart@ucdavis.edu