nasa / concept-tagging-api

Contains code for the API that takes in text and predicts concepts & keywords from a list of standardized NASA keywords. The API exposes models created with the `concept-tagging-training` repository.
MIT License

OCIO STI Concept Tagging Service

An API for exposing models created with STI concept training. This project was written about here for the Federal Data Strategy Incubator Project. A running version of this API may be found here; however, it is a temporary instance for demo purposes and may not be available long-term. Please do not use it in production or at scale.

What is Concept Tagging

By concept tagging, we mean that you can supply text, for example:

"Volcanic activity, or volcanism, has played a significant role in the geologic evolution of Mars.[2] Scientists have known since the Mariner 9 mission in 1972 that volcanic features cover large portions of the Martian surface."

and get back predicted keywords, like volcanology, mars surface, and structural properties, as well as topics, like space sciences and geosciences, drawn from a standardized list of several thousand NASA concepts, with a probability score for each prediction.

Index

  1. Using Endpoint
    1. Request
    2. Response
  2. Running Your Own Instance
    1. Installation
      1. Build Docker Image
      2. With Local Python
    2. Downloading Models
    3. Running Service
      1. Using Docker
      2. Using Local Python

Using Endpoint

Request

The endpoint accepts a few fields, shown in this example:

{
    "text": [
        "Astronauts go on space walks.",
        "Basalt rocks and minerals are on earth."
    ], 
    "probability_threshold":"0.5",
    "topic_threshold":"0.9", 
    "request_id":"example_id10"
}
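The same payload can be assembled programmatically. A minimal sketch in Python, using only the field names shown above (the threshold semantics are inferred from the field names; note the thresholds are sent as strings, matching the example):

```python
import json

# Build the request payload; "text" is a list, so multiple
# documents can be tagged in a single call.
payload = {
    "text": [
        "Astronauts go on space walks.",
        "Basalt rocks and minerals are on earth.",
    ],
    # Cutoff for keyword predictions (string, as in the example above).
    "probability_threshold": "0.5",
    # Cutoff for topic predictions.
    "topic_threshold": "0.9",
    # Free-form identifier so you can match responses to requests.
    "request_id": "example_id10",
}

# Write it out in the same form the curl example below expects.
with open("example_payload_multiple.json", "w") as f:
    json.dump(payload, f, indent=4)
```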

You might send this request using curl. In the command below:

  1. Substitute example_payload_multiple.json with the path to your json request.
  2. Substitute http://0.0.0.0:5000/ with the address of the API instance.
curl -X POST -H "Content-Type: application/json" -d @example_payload_multiple.json http://0.0.0.0:5000/findterms/
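If you prefer Python over curl, the equivalent POST can be prepared with the standard library. A sketch, assuming an instance at http://0.0.0.0:5000/ (the request is only sent when a server is actually reachable, so the final call is left commented out):

```python
import json
import urllib.request

# Same request body as the curl example above.
payload = {
    "text": ["Astronauts go on space walks."],
    "probability_threshold": "0.5",
    "topic_threshold": "0.9",
    "request_id": "example_id10",
}

# Prepare the POST; nothing is sent until urlopen is called.
req = urllib.request.Request(
    url="http://0.0.0.0:5000/findterms/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once an API instance is running:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```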

Response

You will then receive a response like the one here. In the payload, you will see multiple fields, including:

• features -- words and phrases directly extracted from the document.
• sti_keywords -- concepts and their prediction scores.
• topic_probability -- model scores for all of the categories.
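As a sketch of how a client might consume these fields, here is a hypothetical response shape (the nesting and score values below are illustrative assumptions based on the field list above, not actual service output) with keywords filtered by a score cutoff:

```python
# Illustrative response shape; real structure and scores come
# from the service itself.
response = {
    "example_id10": {
        "features": ["space walks", "basalt rocks"],
        # One list of keyword predictions per input document.
        "sti_keywords": [
            [{"keyword": "astronauts", "probability": 0.83}],
            [{"keyword": "basalt", "probability": 0.91}],
        ],
        "topic_probability": [
            {"space sciences": 0.95, "geosciences": 0.40},
        ],
    }
}

# Keep only keywords whose score clears a chosen cutoff.
cutoff = 0.5
accepted = [
    kw["keyword"]
    for doc in response["example_id10"]["sti_keywords"]
    for kw in doc
    if kw["probability"] >= cutoff
]
print(accepted)  # -> ['astronauts', 'basalt']
```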

Running Your Own Instance

Installation

For most people, the simplest installation entails building the docker image, downloading the models, and running the docker container.

Build Docker Image

First, clone this repository and enter its root. Now, you can build the image with:

docker build -t concept_tagging_api:example .

* Developers should look at the `make build` target in the Makefile, which automates tagging the image with useful metadata.

With Local Python

* tested with python:3.7
First, clone this repository and enter its root.
Now, create a virtual environment. For example, using venv:

python -m venv venv
source venv/bin/activate

Now install the requirements with:

make requirements

Downloading Models

Then, you need to download the machine learning models upon which the service relies.

You can find a zipped file containing all of the models here. To get the models into the right place and unzip them:

mkdir models
mv <YOUR_ZIPPED_MODELS_NAME>.zip models
cd models
unzip <YOUR_ZIPPED_MODELS_NAME>.zip

Alternatively, the models can be downloaded from data.nasa.gov, where they are named STI Tagging Models. However, downloads from that location are slower.
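The shell steps above can also be scripted with Python's standard zipfile module. A minimal sketch; the commented-out archive name is a placeholder for whichever zip you downloaded:

```python
import zipfile
from pathlib import Path

def install_models(zip_path, models_dir="models"):
    """Create the models directory and extract the archive into it."""
    target = Path(models_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)

# install_models("<YOUR_ZIPPED_MODELS_NAME>.zip")
```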

Running Service

Using Docker

With the Docker image and model files in place, you can now run the service with a single docker command. In the command below, be sure to:

  1. Substitute concept_tagging_api:example with the name of your image.
  2. Substitute $(pwd)/models/10_23_2019 with the path to your models directory.
  3. Substitute 5001 with the port on your local machine from which you wish to access the API.

docker run -it \
    -p 5001:5000 \
    -v $(pwd)/models/10_23_2019:/home/service/models/experiment \
    concept_tagging_api:example

Note that you may experience permission errors when you start the container. To resolve this, set the owner and group of your models directory to 999 (e.g. `chown -R 999:999 models`), the uid of the user inside the container.

Optional: The entrypoint to the Docker image is gunicorn, a Python WSGI HTTP server that runs our Flask app. You can optionally pass additional arguments through to gunicorn. For example:

docker run -it \
    -p 5001:5000 \
    -v $(pwd)/models/10_23_2019:/home/service/models/experiment \
    concept_tagging_api:example --timeout 9000 

See here for more information about design considerations for these gunicorn settings.


Using Local Python

With the requirements installed and the model files in place, you can now run the service locally with Python. In the command below, substitute models/test with the path to your models directory. For example, if you downloaded the 10_23_2019 models, it will be models/10_23_2019.

export MODELS_DIR=models/test; \
python service/app.py

If you were part of the legacy concept tagger API development team and need access to the test server, which is no longer available as of 11/9/23, please email us here.