whole-tale / girder_wholetale

Girder plugin providing basic Whole Tale functionality
BSD 3-Clause "New" or "Revised" License

Importing D1 dataset fails with I/O error #530

Open craig-willis opened 2 years ago

craig-willis commented 2 years ago

Encountered while testing v1.1rc1 (https://github.com/whole-tale/wt-design-docs/issues/166). This reproduces for me on both the test and local deployments.

Test steps: From test case "Import from DataONE: READ-WRITE":

  1. Navigate to https://girder.local.wholetale.org/api/v1/integration/dataone?uri=https%3A%2F%2Fsearch.dataone.org%2Fview%2Fdoi%3A10.18739%2FA2VQ2S94D&title=Fire%20influences%20on%20forest%20recovery%20and%20associated%20climate%20feedbacks%20in%20Siberian%20Larch%20Forests%2C%20Russia&environment=RStudio
  2. Confirm that the Tale title matches the dataset
  3. Select READ/WRITE
  4. Click Create New Tale
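For reference, the integration URL in step 1 can be assembled from its plain-text parts with the standard library; a minimal sketch (the endpoint and parameter names are taken straight from the URL above, not from any documented API):

```python
from urllib.parse import quote

base = "https://girder.local.wholetale.org/api/v1/integration/dataone"
uri = "https://search.dataone.org/view/doi:10.18739/A2VQ2S94D"
title = ("Fire influences on forest recovery and associated "
         "climate feedbacks in Siberian Larch Forests, Russia")

# safe="" percent-encodes every reserved character (':', '/', ',', spaces),
# matching the fully escaped query string in step 1.
url = (f"{base}?uri={quote(uri, safe='')}"
       f"&title={quote(title, safe='')}"
       f"&environment=RStudio")
print(url)
```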

Expected results: The dataset is imported into the workspace and the Tale is created successfully.

Actual results:

Traceback (most recent call last):
  File "/girder/girder/events.py", line 164, in run
    event = trigger(eventName, info, _async=True, daemon=True)
  File "/girder/girder/events.py", line 314, in trigger
    handler['handler'](e)
  File "/girder/plugins/jobs/server/__init__.py", line 43, in scheduleLocal
    fn(job)
  File "/girder/plugins/wholetale/server/tasks/import_binder.py", line 196, in run
    copy_fs(source_fs, destination_fs)
  File "/girder/venv/lib/python3.9/site-packages/fs/copy.py", line 48, in copy_fs
    return copy_fs_if(
  File "/girder/venv/lib/python3.9/site-packages/fs/copy.py", line 108, in copy_fs_if
    return copy_dir_if(
  File "/girder/venv/lib/python3.9/site-packages/fs/copy.py", line 448, in copy_dir_if
    copier.copy(_src_fs, dir_path, _dst_fs, copy_path)
  File "/girder/venv/lib/python3.9/site-packages/fs/_bulk.py", line 142, in copy
    copy_file_internal(
  File "/girder/venv/lib/python3.9/site-packages/fs/copy.py", line 279, in copy_file_internal
    _copy_locked()
  File "/girder/venv/lib/python3.9/site-packages/fs/copy.py", line 269, in _copy_locked
    with src_fs.openbin(src_path) as read_file:
  File "/girder/plugins/wholetale/server/tasks/import_binder.py", line 363, in openbin
    return open(fdict["path"], "r+b")
  File "/girder/venv/lib/python3.9/site-packages/fs/error_tools.py", line 89, in __exit__
    reraise(fserror, fserror(self._path, exc=exc_value), traceback)
  File "/girder/venv/lib/python3.9/site-packages/six.py", line 718, in reraise
    raise value.with_traceback(tb)
  File "/girder/plugins/wholetale/server/tasks/import_binder.py", line 362, in openbin
    self._fs._ensure_region_available(path, fdict, fd, 0, fdict["obj"]["size"])
  File "/girder/venv/lib/python3.9/site-packages/girderfs/dms.py", line 226, in _ensure_region_available
    self._wait_for_file(fdict)
  File "/girder/venv/lib/python3.9/site-packages/girderfs/dms.py", line 263, in _wait_for_file
    raise OSError(EIO, os.strerror(EIO))
fs.errors.OperationFailed: operation failed, [Errno 5] Input/output error
Xarthisius commented 2 years ago

Not our fault. DataONE CN claims that some of the files are checksummed using MD5, but in reality they were checksummed using SHA1. EIO is raised due to a mismatched hash.

How to reproduce (outside of WT)?

#!/bin/bash
# Reproduce the mismatch for urn:uuid:1dad942b-e6ec-480c-82c3-9a3c87f67fa5
# using only curl, jq, md5sum, and sha1sum.

rm -f tree_cores.csv
echo "CN claims that urn:uuid:1dad942b-e6ec-480c-82c3-9a3c87f67fa5 (tree_cores.csv) has"
curl -s "https://cn.dataone.org/cn/v2/query/solr/?q=identifier:%22urn%3Auuid%3A1dad942b-e6ec-480c-82c3-9a3c87f67fa5%22&fl=identifier,formatType,title,size,formatId,fileName,documents,checksum,checksumAlgorithm&rows=1000&start=0&wt=json" | jq . | grep '"checksum'

echo "Downloading urn:uuid:1dad942b-e6ec-480c-82c3-9a3c87f67fa5"
curl -s -LJO https://cn.dataone.org/cn/v2/resolve/urn:uuid:1dad942b-e6ec-480c-82c3-9a3c87f67fa5
echo "I'm checking md5 sum of tree_cores.csv"
md5sum tree_cores.csv
echo "<sad trombone/>"
echo "But..."
sha1sum tree_cores.csv
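The same diagnosis can be done in Python without reading the file twice: feed the bytes to both hash objects in one pass and check which digest the CN-reported checksum actually matches. This is a generic helper sketch, not code from the plugin; `which_algorithm` is a name invented here.

```python
import hashlib

def dual_digest(data: bytes):
    """Return (md5, sha1) hex digests of the same bytes in one pass."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    md5.update(data)
    sha1.update(data)
    return md5.hexdigest(), sha1.hexdigest()

def which_algorithm(data: bytes, reported: str):
    """Tell whether a CN-reported checksum is really MD5 or SHA-1
    for the given content, or neither (None)."""
    md5, sha1 = dual_digest(data)
    if reported.lower() == md5:
        return "MD5"
    if reported.lower() == sha1:
        return "SHA-1"
    return None
```

Running this against the downloaded `tree_cores.csv` bytes and the checksum reported in the Solr record would show the value matching SHA-1 despite the CN's `checksumAlgorithm` claiming MD5.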