usnistgov / oar-pdr

The NIST Open Access to Research (OAR) Public Data Repository (PDR) system software

ODD-859: Fix propagation of research themes into NERDm records #134

Closed RayPlante closed 4 years ago

RayPlante commented 4 years ago

This PR addresses ticket ODD-859, which reported that themes users selected in MIDAS were not appearing as research themes on the landing page; further, many records were showing up in the SDP search results with their research themes listed as "unspecified". This PR relies on an underlying fix applied to oar-metadata (PR#35).

As described in oar-metadata PR#35, themes are handled in a special way in the PDR publishing system because in the original, pre-PDR POD records, full taxonomy terms were not properly specified: only the lowest (most-specific) term was being set. For example, "Glycomics" would be specified rather than "Bioscience: Glycomics". (This means that in the former case, the record would not match a search for "Bioscience".) Special code was put into the PDR publishing code to detect this error and replace the theme term with its fully specified term. However, this code was only applied to records ingested from the original PDL; it was not being applied to new submissions via MIDAS. This produced a NERDm record with an uncorrected theme property and an empty topic property (which is used by the SDP and the landing page). This PR applies the special theme handling to new submissions from MIDAS.
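
The theme handling described above can be illustrated with a minimal sketch. This is not the code from oar-metadata PR#35: the function names, the two-entry taxonomy list, and the reduced shape of the topic entries (only a tag field) are assumptions made for illustration, using the two example terms mentioned in this PR.

```python
# Illustrative sketch only; NOT the actual oar-metadata implementation.
# FULL_TERMS is a hypothetical two-entry stand-in for the real taxonomy.
FULL_TERMS = ["Bioscience: Glycomics", "Physics: Optical physics"]


def build_leaf_index(full_terms):
    """Map each most-specific (leaf) term, lowercased, to its full term."""
    index = {}
    for term in full_terms:
        leaf = term.split(":")[-1].strip().lower()
        index.setdefault(leaf, term)
    return index


def expand_theme(theme, index):
    """Return the fully specified term for a leaf term.

    Terms that are not found as a leaf (including terms that are
    already fully specified) are returned unchanged.
    """
    return index.get(theme.strip().lower(), theme)


def apply_theme_fix(record, index):
    """Return a copy of a record dict with expanded themes and a topic list.

    Assumption: each topic entry is reduced here to a dict holding only
    a "tag"; real NERDm topic entries may carry more fields.
    """
    themes = [expand_theme(t, index) for t in record.get("theme", [])]
    fixed = dict(record)
    fixed["theme"] = themes
    fixed["topic"] = [{"tag": t} for t in themes]
    return fixed


if __name__ == "__main__":
    index = build_leaf_index(FULL_TERMS)
    print(apply_theme_fix({"theme": ["Optical physics"], "topic": []}, index))
```

In this sketch the lookup is case-insensitive and leaves unknown terms as they are, so a record whose theme is already fully specified passes through unchanged.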

Review and Testing

This fix should be tested using the integration branch of oar-docker. After checking out the integration branch of an existing oar-docker clone, be sure to run git pull to get the latest changes. The fix is demonstrated using the internal "publish" application, so change into the publish directory within the oar-docker repo directory.

Demonstrating the error

I recommend that you first run the publish application without the fix, building it with the latest releases already set in the deployment file:

cd publish
../scripts/localdeploy
../scripts/oarctl local build
../scripts/oarctl local up

Next access the landing page for a built-in MIDAS submission at https://localhost/od/id/0531A570681DB5D3A1EE2F169DD3B8CE1491.

Notice the following:

  1. The landing page does not include any Research Topics display.
  2. Inspect the input POD file, mdserver/sample_review/1491/_pod.json; look for the theme property and notice that its value is "Optical physics" (not "Physics: Optical physics").
  3. Inspect the resulting NERDm record in data/pdr/mdserv/0531A570681DB5D3A1EE2F169DD3B8CE1491/metadata/nerdm.json; notice that its theme property also has the incorrect term, "Optical physics", and that the topic property is empty (as in []).
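
If you prefer not to read the JSON files by eye, the two inspections above can be scripted. The following is a small convenience sketch, not part of oar-docker: the helper names are hypothetical, and it assumes that both paths are relative to the publish directory and that each topic entry carries a tag field.

```python
import json
from pathlib import Path

# Paths as given in this PR; assumed to be relative to the publish directory.
POD_FILE = "mdserver/sample_review/1491/_pod.json"
NERDM_FILE = (
    "data/pdr/mdserv/0531A570681DB5D3A1EE2F169DD3B8CE1491/metadata/nerdm.json"
)


def summarize_themes(record):
    """Return (theme values, topic tags) found in a POD or NERDm record dict."""
    themes = record.get("theme", [])
    if isinstance(themes, str):
        # Tolerate a single string where a list is expected.
        themes = [themes]
    tags = [entry.get("tag") for entry in record.get("topic", [])]
    return list(themes), tags


def report(path):
    """Print the theme values and topic tags stored in a JSON file."""
    record = json.loads(Path(path).read_text())
    themes, tags = summarize_themes(record)
    print(f"{path}: theme={themes} topic tags={tags}")


if __name__ == "__main__":
    for path in (POD_FILE, NERDM_FILE):
        if Path(path).exists():
            report(path)
        else:
            print(f"{path}: not found (run this from the publish directory)")
```

Running the same script again after rebuilding with the fix makes it easy to compare the theme and topic values before and after.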

Testing the fix

To test the fix, bring down the application and rebuild it with the fix to mdserver:

../scripts/oarctl local down
../scripts/localdeploy mdserver
../scripts/oarctl local build mdserver
../scripts/oarctl local up -p          # Note the -p, which clears results from the previous run above.

Load the same landing page (https://localhost/od/id/0531A570681DB5D3A1EE2F169DD3B8CE1491), and notice the following:

  1. The landing page should now display the research topic, "Physics: Optical physics".
  2. Inspect the resulting NERDm record in data/pdr/mdserv/0531A570681DB5D3A1EE2F169DD3B8CE1491/metadata/nerdm.json; notice its theme property now has the correct term, "Physics: Optical physics" and the topic property contains this term, too.

GRG2 commented 4 years ago

Built and confirmed that the error exists, i.e., no research topics appear in this internal Publish record: https://localhost/od/id/0531A570681DB5D3A1EE2F169DD3B8CE1491

Rebuilt with the metadata server fix and validated that the landing page shows the correct Research Topics and that the NERDm file has both theme and topic.tag set to the correct value, "Physics: Optical physics".