jic-dtool / dtool-s3

S3 backend for dtool
MIT License

Mangle non-ASCII characters #14

Closed pastewka closed 2 years ago

pastewka commented 3 years ago

On our NetApp StorageGRID I recently got the error message:

botocore.exceptions.ParamValidationError: Parameter validation failed:
Non ascii characters found in S3 metadata for key "handle", value: "<filename>".  
S3 metadata can only contain ASCII characters. 

<filename> here contained the umlaut ä.

Not sure if AWS has the same restriction, but it may be good to mangle all names to ensure they have no non-ASCII characters.
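The check that fails here can be reproduced without any S3 connection. Below is a minimal sketch, assuming the client-side validation amounts to trying to encode each metadata value as ASCII (botocore's `validate_ascii_metadata` handler appears to do this). The helper name `find_non_ascii_metadata` and the example file names are made up for illustration.

```python
def find_non_ascii_metadata(metadata):
    """Return the (key, value) pairs whose value is not pure ASCII."""
    offending = []
    for key, value in metadata.items():
        try:
            # Same idea as the failing check: ASCII-encode the value.
            value.encode("ascii")
        except UnicodeEncodeError:
            offending.append((key, value))
    return offending


# Hypothetical metadata dictionaries; the file names are invented.
ok = {"handle": "data/readme.txt"}
bad = {"handle": "data/Bär.txt"}

print(find_non_ascii_metadata(ok))   # nothing to report
print(find_non_ascii_metadata(bad))  # reports the "handle" entry
```

Running such a check before an upload would at least make it possible to report all offending file names at once instead of failing on the first one.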

jotelha commented 2 years ago

According to https://stackoverflow.com/questions/64230098/supporting-non-ascii-characters-in-boto3-put-object-tagging, the issue lies in how the boto3 API communicates with the S3 server. AWS itself allows arbitrary Unicode in metadata in some form (see the bottom of https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html), but in a fairly recent discussion (https://github.com/boto/botocore/issues/2552) the boto3 developers had not yet decided what to do.

It is likely better to mangle all the names. Here is a full traceback of such an error:

DEBUG:s3transfer.tasks:Exception raised.
Traceback (most recent call last):
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 529, in validate_ascii_metadata
    value.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 72: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
    api_params = self._emit_api_params(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
    self.meta.events.emit(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 536, in validate_ascii_metadata
    raise ParamValidationError(
botocore.exceptions.ParamValidationError: Parameter validation failed:
Non ascii characters found in S3 metadata for key "handle", value: "material/ucl/Fellous-Sigrist - 2017 - Research Data Management advocacy – what works wel.pdf".  
S3 metadata can only contain ASCII characters. 
DEBUG:s3transfer.utils:Releasing acquire 0/None
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-13' coro=<SignalHandler._dtool_copy_left_to_right() done, defined at /home/jotelha/git/dtool/dtool-lookup-gui/dtool_lookup_gui/TransferTab.py:207> exception=ParamValidationError('Parameter validation failed:\nNon ascii characters found in S3 metadata for key "handle", value: "material/ucl/Fellous-Sigrist - 2017 - Research Data Management advocacy – what works wel.pdf".  \nS3 metadata can only contain ASCII characters. ')>
Traceback (most recent call last):
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 529, in validate_ascii_metadata
    value.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 72: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jotelha/git/dtool/dtool-lookup-gui/dtool_lookup_gui/TransferTab.py", line 211, in _dtool_copy_left_to_right
    target_dataset_uri = self._copy_dataset(source_dataset_uri, target_base_uri)
  File "/home/jotelha/git/dtool/dtool-lookup-gui/dtool_lookup_gui/TransferTab.py", line 197, in _copy_dataset
    dest_uri = copy_func(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 291, in copy
    _copy_content(dataset, proto_dataset, progressbar)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 258, in _copy_content
    dest_proto_dataset.put_item(src_abspath, relpath)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 746, in put_item
    return self._storage_broker.put_item(fpath, relpath)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 637, in put_item
    _put_item_with_retry(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 139, in _put_item_with_retry
    success = _upload_file(s3client, fpath, bucket, dest_path, extra_args)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 108, in _upload_file
    s3client.upload_file(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/inject.py", line 129, in upload_file
    return transfer.upload_file(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/transfer.py", line 279, in upload_file
    future.result()
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
    api_params = self._emit_api_params(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
    self.meta.events.emit(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 536, in validate_ascii_metadata
    raise ParamValidationError(
botocore.exceptions.ParamValidationError: Parameter validation failed:
Non ascii characters found in S3 metadata for key "handle", value: "material/ucl/Fellous-Sigrist - 2017 - Research Data Management advocacy – what works wel.pdf".  
S3 metadata can only contain ASCII characters. 

pastewka commented 2 years ago

Why does dtool s3 put the filename/path in the S3 metadata in the first place? It should already be part of the manifest, which should be able to handle UTF-8 gracefully.

tjelvar-olsson commented 2 years ago

From what I can remember, this is because the proto dataset needs to keep track of file names so that it can write the manifest file when it is frozen into a dataset.

It may not be obvious from the usage of dtool cp that the new dataset in S3 actually goes through all the stages of dataset creation, including that of a proto dataset.

Hopefully that clarifies why the paths are stored as metadata. Now we can start thinking of a solution.

Cc @mrmh2

pastewka commented 2 years ago

Okay, this sounds like one should mangle all names before sticking them into the S3 metadata. Is there a standardized way to do this? urllib.parse.quote could do the trick.
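As a rough illustration of that idea, here is a minimal sketch of a percent-encoding round trip with the standard library. The helper names `mangle` and `unmangle` are hypothetical and not part of dtool-s3. Keeping "/" unescaped is an assumption made so that relative paths stay readable.

```python
from urllib.parse import quote, unquote


def mangle(handle):
    """Percent-encode a handle so that the result is pure ASCII."""
    # quote() encodes the string as UTF-8 and escapes every byte that is
    # not an unreserved character; "/" is left alone via safe="/".
    return quote(handle, safe="/")


def unmangle(mangled):
    """Invert mangle()."""
    return unquote(mangled)


handle = "2022-02-08-hörmann-test"
mangled = mangle(handle)

print(mangled)            # 2022-02-08-h%C3%B6rmann-test
print(unmangle(mangled))  # 2022-02-08-hörmann-test
```

One property of this scheme is that pure-ASCII names without reserved characters pass through unchanged, so the stored values stay human-readable in most cases.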

tjelvar-olsson commented 2 years ago

I have implemented a fix for this in the branch https://github.com/jic-dtool/dtool-s3/tree/i18n-filename-support

https://github.com/jic-dtool/dtool-s3/commit/c930760327609d9b212bb6f49fc64f730224aabd

It basically converts the relpath (handle) stored in the metadata of the AWS S3 objects to a base64-encoded version of the string. Later, in the function used by dtool's freeze functionality, where that relpath is used to build up the manifest, the base64-encoded version is converted back to the actual relpath.
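The round trip described above can be sketched roughly as follows. This is an illustration of the idea only, not the code from the commit; the function names `encode_handle` and `decode_handle` are hypothetical, and UTF-8 is assumed as the text encoding.

```python
import base64


def encode_handle(relpath):
    """Return an ASCII-only, base64-encoded form of a relpath."""
    return base64.b64encode(relpath.encode("utf-8")).decode("ascii")


def decode_handle(encoded):
    """Recover the original relpath from its base64-encoded form."""
    return base64.b64decode(encoded).decode("utf-8")


# The handle from the traceback earlier in this thread.
relpath = (
    "material/ucl/Fellous-Sigrist - 2017 - Research Data Management "
    "advocacy – what works wel.pdf"
)

stored = encode_handle(relpath)      # safe to place in S3 metadata
restored = decode_handle(stored)     # used when building the manifest

print(stored.isascii())     # True
print(restored == relpath)  # True
```

Compared with percent-encoding, base64 always produces the same restricted alphabet regardless of the input, at the cost of making every stored handle unreadable and about a third longer.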

I have done some basic testing of this to ensure that it is backwards compatible. It seems to be okay. However, @pastewka @jotelha it would be great if you could try it out as well as this is a pretty radical change.

However, I think that the only way that this could have an adverse effect is if one had created a proto dataset with an earlier version of dtool and then tried to freeze that dataset with this version of dtool.
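To make that scenario concrete, here is a small sketch under two explicit assumptions: that an earlier version stored the plain relpath in the metadata, and that a newer reader decodes the stored value strictly as base64. Neither assumption has been checked against the commit, and the helper names are hypothetical.

```python
import base64
import binascii


def decode_handle(stored):
    """Strictly decode a base64-encoded, UTF-8 handle."""
    raw = base64.b64decode(stored, validate=True)
    return raw.decode("utf-8")


def try_decode(stored):
    """Return the decoded handle, or None if it is not valid base64/UTF-8."""
    try:
        return decode_handle(stored)
    except (binascii.Error, UnicodeDecodeError):
        return None


# Handle as a newer version would store it (assumption).
new_style = base64.b64encode("test-file".encode("utf-8")).decode("ascii")
# Handle as an older version would have stored it (assumption).
old_style = "test-file"

print(try_decode(new_style))  # test-file
print(try_decode(old_style))  # None: "-" is not in the base64 alphabet
```

Under these assumptions, a proto dataset whose items were uploaded by the older version could not have its handles recovered by a strict decoder at freeze time, which is the kind of adverse effect mentioned above.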

@mrmh2 do you have any thoughts on this fix?

jotelha commented 2 years ago

Hello @tjelvar-olsson, thanks a lot, this looks great. As so often, it took me some time to get back to this. I gave it a manual try and it all worked fine. See the protocol below, first with the previous dtool-s3, then with this branch. The session was recorded with script and later stripped of all formatting sequences, so some lines may look a little awkward.

jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ source ~/venv/jlh-imtek-python-3.8/bin/activate
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool query '{}'
Authentication URL https://10.20.49.250:5001/token password:
[
  {
    "base_uri": "smb://test-share",
    "created_at": 1604860720.736,
    "creator_username": "jotelha",
    "dtoolcore_version": "3.17.0",
    "frozen_at": 1643237269.784,
    "name": "simple_test_dataset",
    "tags": [],
    "type": "dataset",
    "uri": "smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675",
    "uuid": "1a1f9fad-8589-413e-9602-5bbd66bfe675"
  },
  {
    "base_uri": "s3://test-bucket",
    "created_at": 1604860720.736,
    "creator_username": "jotelha",
    "dtoolcore_version": "3.17.0",
    "frozen_at": 1643237360.858,
    "name": "simple_test_dataset",
    "tags": [],
    "type": "dataset",
    "uri": "s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675",
    "uuid": "1a1f9fad-8589-413e-9602-5bbd66bfe675"
  },
  {
    "base_uri": "s3://test-bucket",
    "created_at": 1637533066.579,
    "creator_username": "jotelha",
    "dtoolcore_version": "3.18.0",
    "frozen_at": 1637533448.927,
    "name": "2021-11-17-hoermann-livmats-retreat-rdm-intro",
    "tags": [],
    "type": "dataset",
    "uri": "s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f",
    "uuid": "bbd82391-d21f-4288-b10f-ec3569b8b87f"
  },
  {
    "base_uri": "smb://test-share",
    "created_at": 1643891923.161,
    "creator_username": "AzureAD+JohannesH\u00f6rmann",
    "dtoolcore_version": "3.18.1",
    "frozen_at": 1643892110.92,
    "name": "empty-0200",
    "tags": [],
    "type": "dataset",
    "uri": "smb://test-share/5a2c1927-4682-4a9d-910b-a856a8adff4a",
    "uuid": "5a2c1927-4682-4a9d-910b-a856a8adff4a"
  }
]
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool create test-dataset
Created proto dataset file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
Next steps:
1. Add raw data, eg:
   dtool add item my_file.txt file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
   Or use your system commands, e.g:
   mv my_data_directory /home/jotelha/sandbox/20220208_dtool_s3/test-dataset/data/
2. Add descriptive metadata, e.g:
   dtool readme interactive file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
3. Convert the proto dataset into a dataset:
   dtool freeze file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ touch test-dataset/data/test-file
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ touch test-dataset/data/2022-02-08-hörmann-test # put some content
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ cat test-dataset/data/2022-02-08-hörmann-test
Noch eine Datei von Johannes Hörmann
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ ll
total 28
drwxrwxr-x  3 jotelha jotelha  4096 Feb  8 13:29 ./
drwxrwxr-x 96 jotelha jotelha  4096 Feb  8 13:28 ../
drwxrwxr-x  4 jotelha jotelha  4096 Feb  8 13:29 test-dataset/
-rw-rw-r--  1 jotelha jotelha 16384 Feb  8 13:31 typescript
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool freeze test-dataset/
Generating manifest  [####################################]  100%  test-file
Dataset frozen file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket
simple_test_dataset
  s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
2021-11-17-hoermann-livmats-retreat-rdm-intro
  s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool cp test-dataset/ s3://test-bucket
Copying dataset  [------------------------------------]    0%
Traceback (most recent call last):
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 529, in validate_ascii_metadata
    value.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 12: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jotelha/venv/jlh-imtek-python-3.8/bin/dtool", line 8, in <module>
    sys.exit(dtool())
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_create/dataset.py", line 584, in cp
    _copy(resume, quiet, dataset_uri, dest_base_uri)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_create/dataset.py", line 547, in _copy
    dest_uri = copy_func(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 291, in copy
    _copy_content(dataset, proto_dataset, progressbar)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 258, in _copy_content
    dest_proto_dataset.put_item(src_abspath, relpath)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 749, in put_item
    return self._storage_broker.put_item(fpath, relpath)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 637, in put_item
    _put_item_with_retry(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 139, in _put_item_with_retry
    success = _upload_file(s3client, fpath, bucket, dest_path, extra_args)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 108, in _upload_file
    s3client.upload_file(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/inject.py", line 129, in upload_file
    return transfer.upload_file(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/transfer.py", line 279, in upload_file
    future.result()
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
    api_params = self._emit_api_params(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
    self.meta.events.emit(
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 536, in validate_ascii_metadata
    raise ParamValidationError(
botocore.exceptions.ParamValidationError: Parameter validation failed:
Non ascii characters found in S3 metadata for key "handle", value: "2022-02-08-hörmann-test".
S3 metadata can only contain ASCII characters.
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ pip install git+https://github.com/jic-dtool/dtool-s3.git@i18n-filename-support
Collecting git+https://github.com/jic-dtool/dtool-s3.git@i18n-filename-support
  Cloning https://github.com/jic-dtool/dtool-s3.git (to revision i18n-filename-support) to /tmp/pip-req-build-cloptb2m
  Running command git clone --filter=blob:none -q https://github.com/jic-dtool/dtool-s3.git /tmp/pip-req-build-cloptb2m
  Running command git checkout -b i18n-filename-support --track origin/i18n-filename-support
  Switched to a new branch 'i18n-filename-support'
  Branch 'i18n-filename-support' set up to track remote branch 'i18n-filename-support' from 'origin'.
  Resolved https://github.com/jic-dtool/dtool-s3.git to commit c930760327609d9b212bb6f49fc64f730224aabd
  Preparing metadata (setup.py) ... done
Requirement already satisfied: click in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (8.0.3)
Requirement already satisfied: dtoolcore>=3.17 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (3.18.1)
Requirement already satisfied: dtool_cli in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (0.7.1)
Requirement already satisfied: boto3 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (1.16.59)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from boto3->dtool-s3==0.13.0) (0.3.4)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from boto3->dtool-s3==0.13.0) (0.10.0)
Requirement already satisfied: botocore<1.20.0,>=1.19.59 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from boto3->dtool-s3==0.13.0) (1.19.59)
Requirement already satisfied: click-plugins in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool_cli->dtool-s3==0.13.0) (1.1.1)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from botocore<1.20.0,>=1.19.59->boto3->dtool-s3==0.13.0) (1.26.7)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from botocore<1.20.0,>=1.19.59->boto3->dtool-s3==0.13.0) (2.8.1)
Requirement already satisfied: six>=1.5 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.20.0,>=1.19.59->boto3->dtool-s3==0.13.0) (1.15.0)
Building wheels for collected packages: dtool-s3
  Building wheel for dtool-s3 (setup.py) ... done
  Created wheel for dtool-s3: filename=dtool_s3-0.13.0-py3-none-any.whl size=11760 sha256=247716693958cfabb585fe11872401b5ae3c9a178b721ad77d2abcc9b2b8156f
  Stored in directory: /tmp/pip-ephem-wheel-cache-dfvwyr4h/wheels/85/2b/5f/aeed29316d2e60c180cbc7406c2a075a66d830215f9ae40a5e
Successfully built dtool-s3
Installing collected packages: dtool-s3
  Attempting uninstall: dtool-s3
    Found existing installation: dtool-s3 0.12.0
    Uninstalling dtool-s3-0.12.0:
      Successfully uninstalled dtool-s3-0.12.0
Successfully installed dtool-s3-0.13.0
WARNING: You are using pip version 21.3.1; however, version 22.0.2 is available.
You should consider upgrading via the '/home/jotelha/venv/jlh-imtek-python-3.8/bin/python -m pip install --upgrade pip' command.
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool cp test-dataset s3://test-bucket
Usage: dtool cp [OPTIONS] DATASET_URI DEST_BASE_URI
Try 'dtool cp -h' for help.

Error: Dataset already exists: s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws --no-verify-ssl s3 --profile bwcloud-testserver --endpoint https://10.20.49.250:9000 ls s3://test-bucket
               PRE 1a1f9fad-8589-413e-9602-5bbd66bfe675/
               PRE u/
2022-01-26 23:48:20      0 dtool-1a1f9fad-8589-413e-9602-5bbd66bfe675
2022-02-08 13:32:34     11 dtool-5e3398fc-77d6-43b1-a10e-1d38fc346ac8
2022-02-01 16:12:29     11 dtool-bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ ENDPOINT="http://10.20.49.250:9000"
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ BUCKET="test-bucket"
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ AWS_OPTS="--profile bwcloud-testserver --endpoint=${ENDPOINT}"
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/
               PRE 1a1f9fad-8589-413e-9602-5bbd66bfe675/
               PRE u/
2022-01-26 23:48:20      0 dtool-1a1f9fad-8589-413e-9602-5bbd66bfe675
2022-02-01 16:12:29     11 dtool-bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/
               PRE testuser/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/
               PRE 5e3398fc-77d6-43b1-a10e-1d38fc346ac8/
               PRE bbd82391-d21f-4288-b10f-ec3569b8b87f/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/
2022-02-08 13:32:34   1301 README.txt
2022-02-08 13:32:34    223 dtool
2022-02-08 13:32:34    546 structure.json
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ PREFIX='/u/testuser/'
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ # remove dataset objects
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ for fn in `aws s3 --endpoint=${ENDPOINT} ls --recursive s3://${BUCKET}/${PREFIX}${UUID}/ | awk '{ print $4 }'`; do
>     if ! aws s3 --endpoint=${ENDPOINT} rm s3://${BUCKET}/${fn}; then
>     echo "Error removing dataset. Do you have the correct write permissions?"
>     exit 1
>     fi
> done
delete: s3://test-bucket/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/README.txt
delete: s3://test-bucket/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/dtool
delete: s3://test-bucket/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/structure.json
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/
               PRE bbd82391-d21f-4288-b10f-ec3569b8b87f/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket
simple_test_dataset
  s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
2021-11-17-hoermann-livmats-retreat-rdm-intro
  s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool cp test-dataset s3://test-bucket
Generating manifest  [####################################]  100%  test-file
Dataset copied to:
s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/
               PRE data/
2022-02-08 13:53:40   1301 README.txt
2022-02-08 13:53:40      0 README.yml
2022-02-08 13:53:40    218 dtool
2022-02-08 13:53:40    510 manifest.json
2022-02-08 13:53:40    546 structure.json
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/data
               PRE data/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/data/
2022-02-08 13:53:40     38 24e6dddd88eab8573bf08e5f778ac2b73d2d8f53
2022-02-08 13:53:40     33 c8e410c79a41fec4c7336287b14592e7d9b67b76
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket
simple_test_dataset
  s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
test-dataset
  s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
2021-11-17-hoermann-livmats-retreat-rdm-intro
  s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket/test-dataset
simple_test_dataset
  s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
test-dataset
  s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
2021-11-17-hoermann-livmats-retreat-rdm-intro
  s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
24e6dddd88eab8573bf08e5f778ac2b73d2d8f53    2022-02-08-hörmann-test
c8e410c79a41fec4c7336287b14592e7d9b67b76    test-file
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ mkdir another_place
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ cd another_place/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ dtool cp s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8 .
Generating manifest  [####################################]  100%  test-file
Dataset copied to:
file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/another_place/test-dataset
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ ll
total 12
drwxrwxr-x 3 jotelha jotelha 4096 Feb  8 13:55 ./
drwxrwxr-x 4 jotelha jotelha 4096 Feb  8 13:55 ../
drwxrwxr-x 4 jotelha jotelha 4096 Feb  8 13:55 test-dataset/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ ll test-dataset/
total 16
drwxrwxr-x 4 jotelha jotelha 4096 Feb  8 13:55 ./
drwxrwxr-x 3 jotelha jotelha 4096 Feb  8 13:55 ../
drwxrwxr-x 2 jotelha jotelha 4096 Feb  8 13:55 data/
drwxrwxr-x 5 jotelha jotelha 4096 Feb  8 13:55 .dtool/
-rw-rw-r-- 1 jotelha jotelha    0 Feb  8 13:55 README.yml
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ ll test-dataset/data/
total 16
drwxrwxr-x 2 jotelha jotelha 4096 Feb  8 13:55 ./
drwxrwxr-x 4 jotelha jotelha 4096 Feb  8 13:55 ../
-rw-rw-r-- 1 jotelha jotelha   38 Feb  8 13:55 2022-02-08-hörmann-test
-rw-rw-r-- 1 jotelha jotelha   33 Feb  8 13:55 test-file
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ cat test-dataset/data/2022-02-08-hörmann-test
Noch eine Datei von Johannes Hörmann
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ exit

I will put that in our server instance and see how the indexing works with it, but I wouldn't have any concerns with merging and releasing those changes now.

jotelha commented 2 years ago

Works fine on the server. Have a look at the current lookup server testing setup, https://github.com/livMatS/dtool-lookup-server-container-composition, reachable with the configuration https://github.com/livMatS/RDM-Wiki-public/releases/download/v0.4.0/dtool.json, and in particular at the dataset s3://test-bucket/d656d394-6a28-4273-bcfb-d050df31f4a3.

tjelvar-olsson commented 2 years ago

Fixed in https://pypi.org/project/dtool-s3/0.14.0/