sdsc-ordes / modos-api

Python API to manage multi-omics digital objects
https://sdsc-ordes.github.io/modos-api
Apache License 2.0

feat: support terminology codes #106

Closed cmdoret closed 1 month ago

cmdoret commented 1 month ago

Summary

This adds support for terminology codes instead of free text for specific metadata fields, along with autocomplete suggestions in the terminal.

Currently, the following properties/terminologies are used:

Major changes

Trying it out

To test local autocomplete in terminal:

modos create data/example
modos add data/example sample

To rely on the server for autocomplete:

make deploy
modos --endpoint=http://localhost create data/example
modos --endpoint=http://localhost add data/example sample

Notes

Codes are recommended based on similarity between user input and labels, but only the URIs are persisted in metadata.
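To illustrate the note above, here is a minimal sketch of label-based suggestion where only the URI is kept. It is not the matching used by modos or fuzon: the `CODES` table, the `example.org` URIs, and the `suggest` / `select_uri` helpers are hypothetical, and the standard library's `difflib` stands in for the real fuzzy matcher.

```python
from difflib import SequenceMatcher

# Hypothetical label -> URI table; real terminologies are far larger.
CODES = {
    "erythrocyte": "http://example.org/cell/erythrocyte",
    "leukocyte": "http://example.org/cell/leukocyte",
    "neuron": "http://example.org/cell/neuron",
}


def similarity(query: str, label: str) -> float:
    """Case-insensitive similarity ratio between user input and a label."""
    return SequenceMatcher(None, query.lower(), label.lower()).ratio()


def suggest(query: str, codes: dict, top: int = 3) -> list:
    """Return the `top` (label, uri) pairs whose labels best match the query."""
    ranked = sorted(
        codes.items(),
        key=lambda item: similarity(query, item[0]),
        reverse=True,
    )
    return ranked[:top]


def select_uri(query: str, codes: dict) -> str:
    """Pick the best suggestion and keep only its URI, dropping the label."""
    _label, uri = suggest(query, codes, top=1)[0]
    return uri
```

The label is only used for ranking and display; what would end up in the metadata is the returned URI.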

Follow up (separate issues):

Open questions

When creating a modos from input yaml (instead of interactively) (see data/ex_config.yaml), URIs are now required for the 3 properties above.

It may be painful for users to find out which URIs to put in the yaml. Should we provide some kind of subcommand just to get the codes (basically a fuzon wrapper)? Perhaps something along these lines:

# modos codes <property> <query>
modos codes cell_type "red blood cell"
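
For discussion, a rough sketch of what such a subcommand could look like. Everything here is an assumption: the `TERMINOLOGIES` table, the `example.org` URIs, and the `find_codes` helper are made up for illustration, `argparse` and `difflib` are used only to keep the sketch self-contained, and none of this reflects how modos or fuzon are actually wired.

```python
import argparse
from difflib import get_close_matches

# Hypothetical per-property terminologies (label -> URI).
TERMINOLOGIES = {
    "cell_type": {
        "red blood cell": "http://example.org/cell/0001",
        "white blood cell": "http://example.org/cell/0002",
    },
}


def build_parser() -> argparse.ArgumentParser:
    """Parser for: modos codes <property> <query>."""
    parser = argparse.ArgumentParser(prog="modos codes")
    parser.add_argument("property", choices=sorted(TERMINOLOGIES))
    parser.add_argument("query")
    return parser


def find_codes(prop: str, query: str, n: int = 5) -> list:
    """Return (uri, label) pairs for the labels closest to the query, best first."""
    labels = TERMINOLOGIES[prop]
    matches = get_close_matches(query, list(labels), n=n, cutoff=0.3)
    return [(labels[match], match) for match in matches]


def main(argv=None) -> list:
    """Print one tab-separated 'uri, label' line per hit and return the hits."""
    args = build_parser().parse_args(argv)
    hits = find_codes(args.property, args.query)
    for uri, label in hits:
        print(f"{uri}\t{label}")
    return hits
```

Calling `main(["cell_type", "red blood cell"])` would print the matching URIs so they can be pasted into the yaml.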

supermaxiste commented 1 month ago

Code review

Mostly suggestions, recommendations and clarifications. To add to my other comments: Makefile L42 has a typo in the S3_PUBLIC_URL:

https://github.com/sdsc-ordes/modos-api/blob/8d9deb2dbe7dbc5cc9a191b18fc6e30dc94779bd/Makefile#L42

supermaxiste commented 1 month ago

Deployment review

praise: with make deploy everything was smooth

praise: the autocomplete feature is not only fast, but works extremely smoothly 🤓

issue(minor, non-blocking): the recommendations can sometimes point to blank nodes. When blank nodes are selected, modos throws an error because we're not providing a proper URI. See screenshot as an example.

Screenshot 2024-10-21 at 16 28 55 [...] Screenshot 2024-10-21 at 16 36 29

question(minor, non-blocking): when working with modos objects, there's no list command and it becomes a bit confusing to work with objects that you created but forgot about. Maybe I missed a command, but as far as I can see everything assumes that we know the objects we're working with.

$ cli:~/modo-api$ modos --help
Usage: modos [OPTIONS] COMMAND [ARGS]...

  Multi-Omics Digital Objects command line interface.

Options:
  --endpoint TEXT  URL of modos server.  [env var: MODOS_ENDPOINT]
  --version        Print version of modos client
  --help           Show this message and exit.

Commands:
  add      Add elements to a modo.
  create   Create a modo interactively or from a file.
  publish  Export a modo as linked data.
  remove   Removes an element and its files from the modo.
  show     Show the contents of a modo.
  stream   Stream genomic file from a remote modo into stdout.
  update   Update a modo based on a yaml file.

I will pre-approve the PR since all the points I raised could be addressed in a separate PR.

cmdoret commented 1 month ago

blank nodes: good catch! Addressed upstream, as it makes no sense for fuzon to even keep these in memory (they have an undetermined URI) https://github.com/sdsc-ordes/fuzon/pull/31
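
The idea of the fix, sketched in Python for illustration only (fuzon's actual change lives in the linked PR and is not reproduced here). The sketch assumes terms arrive as hypothetical `(identifier, label)` pairs and that blank nodes carry the `_:` prefix used in N-Triples and Turtle serializations.

```python
def is_blank_node(identifier: str) -> bool:
    """Blank nodes are serialized with a '_:' prefix and have no stable URI."""
    return identifier.startswith("_:")


def drop_blank_nodes(terms: list) -> list:
    """Keep only terms whose identifier is a real URI, so they can be persisted."""
    return [(ident, label) for ident, label in terms if not is_blank_node(ident)]


# Hypothetical input: one proper URI and one blank node.
terms = [
    ("http://example.org/cell/0001", "red blood cell"),
    ("_:b0", "anonymous restriction"),
]
usable = drop_blank_nodes(terms)
```

Filtering before the terms are indexed means blank nodes can never show up as suggestions in the first place.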

cmdoret commented 1 month ago

list objects: added a modos list command to list remote objects on the endpoint. Remembering local objects on the filesystem is probably not worth it, as it would come with many edge cases and we mostly intend to use modos with remote objects anyway.

cmdoret commented 1 month ago

I've also added a modos search-codes command to provide an easy way to find code URIs.