whole-tale / girder_wholetale

Girder plugin providing basic Whole Tale functionality
BSD 3-Clause "New" or "Revised" License

OpenICPSR provider #543

Closed Xarthisius closed 2 years ago

Xarthisius commented 2 years ago

This adds support for data stored in OpenICPSR.

Implementation details

Given a DOI or URL pointing to an OpenICPSR landing page, we download the entire project as a zip file, then ingest the raw data and store it in Girder.
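
As a rough illustration of the "download zip, then ingest" flow, the sketch below walks an in-memory zip archive and yields each file's path and bytes. It is not the code from this PR: `iter_archive_files` and `build_demo_archive` are hypothetical names, the demo archive stands in for the downloaded project, and the actual upload to Girder is left out.

```python
import io
import zipfile


def iter_archive_files(zip_bytes):
    """Yield (path, content) for every regular file in an in-memory zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folder entries carry no data of their own
            yield info.filename, archive.read(info)


def build_demo_archive():
    """Build a tiny zip that stands in for a downloaded project (assumption)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/survey.csv", "id,value\n1,42\n")
        archive.writestr("README.txt", "demo project")
    return buffer.getvalue()


# A real provider would hand each (path, content) pair to the storage layer;
# here they are simply collected into a dict.
ingested = dict(iter_archive_files(build_demo_archive()))
```

Keeping the archive walk separate from the storage step would make it easier to swap the whole-project download for per-file downloads later.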

How to test?

  1. Make a fresh deployment (constants were changed, so unless you want to do it manually...)
  2. Navigate to https://dashboard.local.wholetale.org/settings
  3. Note the new provider, ICPSR. Click on connect. Select "www.openicpsr.org". Provide your password for the ICPSR login. Assumption: your globus/girder account uses the same email as your ICPSR identity.
  4. Navigate to: https://dashboard.local.wholetale.org/mine?name=AEAREP-3198-Stata&environment=STATA%20%28Desktop%29&uri=https%3A%2F%2Fwww.openicpsr.org%2Fopenicpsr%2Fproject%2F132081%2Fversion%2FV1%2Fview&asTale=true
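
The import link in step 4 is an ordinary percent-encoded query string, so a link for another project can be assembled the same way. A minimal sketch, assuming the dashboard keeps accepting the `name`, `environment`, `uri` and `asTale` parameters seen above; `build_import_url` is a hypothetical helper, not part of the dashboard or this plugin.

```python
from urllib.parse import quote, urlencode

DASHBOARD = "https://dashboard.local.wholetale.org"


def build_import_url(name, environment, uri, as_tale=True):
    """Assemble a dashboard import link with percent-encoded parameters."""
    params = {
        "name": name,
        "environment": environment,
        "uri": uri,
        "asTale": "true" if as_tale else "false",
    }
    # quote_via=quote encodes spaces as %20 (not "+"), and urlencode's
    # default safe="" means "/" and ":" in the uri are escaped as well.
    return f"{DASHBOARD}/mine?" + urlencode(params, quote_via=quote)


import_url = build_import_url(
    "AEAREP-3198-Stata",
    "STATA (Desktop)",
    "https://www.openicpsr.org/openicpsr/project/132081/version/V1/view",
)
```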

Notes

OpenICPSR actually provides URLs for individual files, and they're potentially scrapeable. That means we could do slightly better than downloading the whole project and re-uploading it, but it would require more careful handling of their sessions / auth.

Known issues

  1. Importing the same DOI a second time will duplicate content (items will have multiple files).

codecov[bot] commented 2 years ago

Codecov Report

Merging #543 (b0e9ed0) into master (5d396bc) will increase coverage by 0.07%. The diff coverage is 94.65%.

@@            Coverage Diff             @@
##           master     #543      +/-   ##
==========================================
+ Coverage   92.72%   92.80%   +0.07%     
==========================================
  Files          58       60       +2     
  Lines        4564     4738     +174     
==========================================
+ Hits         4232     4397     +165     
- Misses        332      341       +9     
Impacted Files                        Coverage Δ
server/rest/dataset.py                87.31% <75.00%> (-0.79%) :arrow_down:
server/lib/openicpsr/auth.py          93.33% <93.33%> (ø)
server/lib/openicpsr/provider.py      95.19% <95.19%> (ø)
server/constants.py                   89.13% <100.00%> (ø)
server/lib/__init__.py                98.11% <100.00%> (+0.05%) :arrow_up:
server/lib/bdbag/bdbag_provider.py    93.90% <100.00%> (ø)
server/lib/dataone/auth.py            82.35% <100.00%> (+1.10%) :arrow_up:
server/lib/import_providers.py        92.30% <100.00%> (+1.13%) :arrow_up:
server/lib/verificator.py             100.00% <100.00%> (ø)
server/rest/account.py                100.00% <100.00%> (ø)
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.

craig-willis commented 2 years ago

WFM. A couple of minor observations:

craig-willis commented 2 years ago

fwiw, I get the same error as https://github.com/whole-tale/ngx-dashboard/pull/279#pullrequestreview-1026157534 during import if I set an invalid password.

Xarthisius commented 2 years ago

Some updates:

  1. verify() now does something. It even works properly and checks if you're logged in.
  2. We no longer store the ICPSR password. Instead, we use the password provided during POST /account/icpsr/apikey to generate a valid JSESSIONID and store that. To verify: provide your credentials using dashboard/settings, then GET /user/me and confirm that "access_token" is a different string.
  3. Providing a wrong password now returns an error message.
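
To make item 2 concrete, here is a minimal sketch of keeping only a session ID instead of the password. It assumes the login response carries a `Set-Cookie` header containing `JSESSIONID`; the header value, the `resource_server` field and both function names are made up for illustration and are not taken from this PR. Only `access_token` comes from the description above.

```python
from http.cookies import SimpleCookie


def session_id_from_set_cookie(header_value, cookie_name="JSESSIONID"):
    """Extract the session ID from a Set-Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    if cookie_name not in cookie:
        # One plausible outcome of a failed login: no session cookie is issued.
        raise ValueError(f"{cookie_name} missing from login response")
    return cookie[cookie_name].value


def build_account_record(resource_server, set_cookie_header):
    """Build the record to persist: the session ID, never the password."""
    return {
        "resource_server": resource_server,  # hypothetical field name
        "access_token": session_id_from_set_cookie(set_cookie_header),
    }


# Hypothetical header value, standing in for a real login response.
record = build_account_record(
    "www.openicpsr.org",
    "JSESSIONID=ABC123DEF456; Path=/openicpsr; HttpOnly",
)
```

Storing the session ID limits what a leaked record exposes, at the cost of having to handle session expiry.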