Closed pastewka closed 2 years ago
According to https://stackoverflow.com/questions/64230098/supporting-non-ascii-characters-in-boto3-put-object-tagging, the issue lies in how the boto3 API communicates with the S3 server. AWS itself allows arbitrary Unicode in metadata in some manner (see the bottom of https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html), but in a fairly recent discussion (https://github.com/boto/botocore/issues/2552) the boto3 developers had not yet decided what to do.
It is likely better to mangle all the names. Here is a full traceback of such an error:
DEBUG:s3transfer.tasks:Exception raised.
Traceback (most recent call last):
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 529, in validate_ascii_metadata
value.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 72: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
return self._execute_main(kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
request_dict = self._convert_to_request_dict(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
api_params = self._emit_api_params(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
self.meta.events.emit(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 536, in validate_ascii_metadata
raise ParamValidationError(
botocore.exceptions.ParamValidationError: Parameter validation failed:
Non ascii characters found in S3 metadata for key "handle", value: "material/ucl/Fellous-Sigrist - 2017 - Research Data Management advocacy – what works wel.pdf".
S3 metadata can only contain ASCII characters.
DEBUG:s3transfer.utils:Releasing acquire 0/None
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-13' coro=<SignalHandler._dtool_copy_left_to_right() done, defined at /home/jotelha/git/dtool/dtool-lookup-gui/dtool_lookup_gui/TransferTab.py:207> exception=ParamValidationError('Parameter validation failed:\nNon ascii characters found in S3 metadata for key "handle", value: "material/ucl/Fellous-Sigrist - 2017 - Research Data Management advocacy – what works wel.pdf". \nS3 metadata can only contain ASCII characters. ')>
Traceback (most recent call last):
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 529, in validate_ascii_metadata
value.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 72: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jotelha/git/dtool/dtool-lookup-gui/dtool_lookup_gui/TransferTab.py", line 211, in _dtool_copy_left_to_right
target_dataset_uri = self._copy_dataset(source_dataset_uri, target_base_uri)
File "/home/jotelha/git/dtool/dtool-lookup-gui/dtool_lookup_gui/TransferTab.py", line 197, in _copy_dataset
dest_uri = copy_func(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 291, in copy
_copy_content(dataset, proto_dataset, progressbar)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 258, in _copy_content
dest_proto_dataset.put_item(src_abspath, relpath)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 746, in put_item
return self._storage_broker.put_item(fpath, relpath)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 637, in put_item
_put_item_with_retry(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 139, in _put_item_with_retry
success = _upload_file(s3client, fpath, bucket, dest_path, extra_args)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 108, in _upload_file
s3client.upload_file(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/inject.py", line 129, in upload_file
return transfer.upload_file(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/transfer.py", line 279, in upload_file
future.result()
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
return self._execute_main(kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
request_dict = self._convert_to_request_dict(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
api_params = self._emit_api_params(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
self.meta.events.emit(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 536, in validate_ascii_metadata
raise ParamValidationError(
botocore.exceptions.ParamValidationError: Parameter validation failed:
Non ascii characters found in S3 metadata for key "handle", value: "material/ucl/Fellous-Sigrist - 2017 - Research Data Management advocacy – what works wel.pdf".
S3 metadata can only contain ASCII characters.
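The check that fails in the traceback above can be reproduced without any S3 connection. This is a minimal sketch, assuming only that a metadata value is rejected whenever `value.encode('ascii')` raises, as the `validate_ascii_metadata` frame suggests. The helper name `non_ascii_metadata` is made up for illustration and is not part of botocore or dtool-s3.

```python
def non_ascii_metadata(metadata):
    """Return the metadata entries whose values are not pure ASCII.

    Hypothetical helper: it mirrors the value.encode('ascii') test seen in
    the traceback, but it is not botocore's own implementation.
    """
    offending = {}
    for key, value in metadata.items():
        try:
            value.encode("ascii")
        except UnicodeEncodeError:
            offending[key] = value
    return offending


# The en dash (U+2013) in the handle is what triggers the error.
metadata = {
    "handle": "material/ucl/advocacy – what works wel.pdf",
    "other": "plain-ascii-value",
}
print(non_ascii_metadata(metadata))
```

Running such a check before the upload would at least surface the offending key early instead of failing in the middle of a transfer.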
Why does dtool s3 put the filename/path in the S3 metadata in the first place? It should already be part of the manifest, which should be able to handle UTF-8 gracefully.
From what I can remember this is due to the proto dataset needing to be able to keep track of file names to be able to write the manifest file when it is frozen into a dataset.
It may not be obvious from the usage of dtool cp that the new dataset in S3 actually goes through all the stages of dataset creation, including that of a proto dataset.
Hopefully that gives some clarity of why the paths are stored as metadata. Now we can start thinking of a solution.
Cc @mrmh2
Okay, this sounds like one should mangle all names before sticking them into the S3 metadata. Is there a standardized way to do this? urllib.parse.quote could do the trick.
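As a rough illustration of that suggestion, the sketch below percent-encodes a handle with `urllib.parse.quote` and restores it with `unquote`. The example handle is shortened from the one in the traceback; whether dtool-s3 should mangle names this way is exactly the open question here, so treat this as an experiment rather than the fix.

```python
from urllib.parse import quote, unquote

handle = "material/ucl/advocacy – what works wel.pdf"  # contains U+2013 (en dash)

# quote() percent-encodes the UTF-8 bytes of every character outside its
# safe set; "/" is kept by default, so the path structure stays readable.
mangled = quote(handle)
mangled.encode("ascii")  # would raise UnicodeEncodeError if non-ASCII remained

# unquote() reverses the mangling, e.g. before the name goes into the manifest.
assert unquote(mangled) == handle
print(mangled)  # material/ucl/advocacy%20%E2%80%93%20what%20works%20wel.pdf
```

Each non-ASCII character grows to several `%XX` escapes, so long names with many such characters become noticeably longer in the metadata.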
I have implemented a fix for this in the branch https://github.com/jic-dtool/dtool-s3/tree/i18n-filename-support
https://github.com/jic-dtool/dtool-s3/commit/c930760327609d9b212bb6f49fc64f730224aabd
It converts the relpath (handle) stored in the metadata of each AWS S3 object to a base64-encoded version of the string. Later, in the function used by dtool's freeze functionality, where that relpath is used to build up the manifest, the base64-encoded version is converted back to the actual relpath.
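The round trip described here can be sketched in a few lines. The helper names `encode_handle` and `decode_handle` are hypothetical and not taken from the linked commit; the sketch only assumes that the relpath is encoded as UTF-8 before base64 is applied.

```python
import base64


def encode_handle(relpath):
    """Return an ASCII-only, base64-encoded form of a UTF-8 relpath."""
    return base64.b64encode(relpath.encode("utf-8")).decode("ascii")


def decode_handle(encoded):
    """Recover the original relpath from its base64-encoded form."""
    return base64.b64decode(encoded).decode("utf-8")


relpath = "material/ucl/advocacy – what works wel.pdf"
encoded = encode_handle(relpath)

encoded.encode("ascii")  # passes: the stored metadata value is plain ASCII
assert decode_handle(encoded) == relpath
```

Unlike percent-encoding, base64 makes every handle unreadable in the raw metadata and about a third longer, but the output alphabet is fixed and always ASCII.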
I have done some basic testing of this to ensure that it is backwards compatible. It seems to be okay. However, @pastewka @jotelha it would be great if you could try it out as well as this is a pretty radical change.
I think the only way this could have an adverse effect is if one had created a proto dataset with an earlier version of dtool and then tried to freeze that dataset with this version of dtool.
@mrmh2 do you have any thoughts on this fix?
Hello @tjelvar-olsson, thanks a lot, this looks great. As so often, it took me some time to get back to this. I gave it a manual try (see the protocol below, first with the previous dtool-s3, then with this branch; the session was recorded with script and later stripped of all formatting sequences, so some lines may look a little awkward) and it all worked fine:
jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ source ~/venv/jlh-imtek-python-3.8/bin/activate
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool query '{}'
Authentication URL https://10.20.49.250:5001/token password:
[
{
"base_uri": "smb://test-share",
"created_at": 1604860720.736,
"creator_username": "jotelha",
"dtoolcore_version": "3.17.0",
"frozen_at": 1643237269.784,
"name": "simple_test_dataset",
"tags": [],
"type": "dataset",
"uri": "smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675",
"uuid": "1a1f9fad-8589-413e-9602-5bbd66bfe675"
},
{
"base_uri": "s3://test-bucket",
"created_at": 1604860720.736,
"creator_username": "jotelha",
"dtoolcore_version": "3.17.0",
"frozen_at": 1643237360.858,
"name": "simple_test_dataset",
"tags": [],
"type": "dataset",
"uri": "s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675",
"uuid": "1a1f9fad-8589-413e-9602-5bbd66bfe675"
},
{
"base_uri": "s3://test-bucket",
"created_at": 1637533066.579,
"creator_username": "jotelha",
"dtoolcore_version": "3.18.0",
"frozen_at": 1637533448.927,
"name": "2021-11-17-hoermann-livmats-retreat-rdm-intro",
"tags": [],
"type": "dataset",
"uri": "s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f",
"uuid": "bbd82391-d21f-4288-b10f-ec3569b8b87f"
},
{
"base_uri": "smb://test-share",
"created_at": 1643891923.161,
"creator_username": "AzureAD+JohannesH\u00f6rmann",
"dtoolcore_version": "3.18.1",
"frozen_at": 1643892110.92,
"name": "empty-0200",
"tags": [],
"type": "dataset",
"uri": "smb://test-share/5a2c1927-4682-4a9d-910b-a856a8adff4a",
"uuid": "5a2c1927-4682-4a9d-910b-a856a8adff4a"
}
]
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool create test-dataset
Created proto dataset file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
Next steps:
1. Add raw data, eg:
dtool add item my_file.txt file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
Or use your system commands, e.g:
mv my_data_directory /home/jotelha/sandbox/20220208_dtool_s3/test-dataset/data/
2. Add descriptive metadata, e.g:
dtool readme interactive file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
3. Convert the proto dataset into a dataset:
dtool freeze file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ touch test-dataset/data/test-file
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ touch test-dataset/data/2022-02-08-hörmann-test # put some content
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ cat test-dataset/data/2022-02-08-hörmann-test
Noch eine Datei von Johannes Hörmann
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ ll
total 28
drwxrwxr-x 3 jotelha jotelha 4096 Feb 8 13:29 ./
drwxrwxr-x 96 jotelha jotelha 4096 Feb 8 13:28 ../
drwxrwxr-x 4 jotelha jotelha 4096 Feb 8 13:29 test-dataset/
-rw-rw-r-- 1 jotelha jotelha 16384 Feb 8 13:31 typescript
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool freeze test-dataset/
Generating manifest [####################################] 100% test-file8-hörmann-test
Dataset frozen file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/test-dataset
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket
simple_test_dataset
s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
2021-11-17-hoermann-livmats-retreat-rdm-intro
s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool cp test-dataset/ s3://test-bucket
Copying dataset [------------------------------------] 0%
Traceback (most recent call last):
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 529, in validate_ascii_metadata
value.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 12: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jotelha/venv/jlh-imtek-python-3.8/bin/dtool", line 8, in <module>
sys.exit(dtool())
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_create/dataset.py", line 584, in cp
_copy(resume, quiet, dataset_uri, dest_base_uri)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_create/dataset.py", line 547, in _copy
dest_uri = copy_func(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 291, in copy
_copy_content(dataset, proto_dataset, progressbar)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 258, in _copy_content
dest_proto_dataset.put_item(src_abspath, relpath)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtoolcore/__init__.py", line 749, in put_item
return self._storage_broker.put_item(fpath, relpath)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 637, in put_item
_put_item_with_retry(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 139, in _put_item_with_retry
success = _upload_file(s3client, fpath, bucket, dest_path, extra_args)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/dtool_s3/storagebroker.py", line 108, in _upload_file
s3client.upload_file(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/inject.py", line 129, in upload_file
return transfer.upload_file(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/boto3/s3/transfer.py", line 279, in upload_file
future.result()
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
return self._execute_main(kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
request_dict = self._convert_to_request_dict(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
api_params = self._emit_api_params(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
self.meta.events.emit(
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages/botocore/handlers.py", line 536, in validate_ascii_metadata
raise ParamValidationError(
botocore.exceptions.ParamValidationError: Parameter validation failed:
Non ascii characters found in S3 metadata for key "handle", value: "2022-02-08-hörmann-test".
S3 metadata can only contain ASCII characters.
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ pip install git+https://github.com/jic-dtool/dtool-s3.git@i18n-filename-support
Collecting git+https://github.com/jic-dtool/dtool-s3.git@i18n-filename-support
Cloning https://github.com/jic-dtool/dtool-s3.git (to revision i18n-filename-support) to /tmp/pip-req-build-cloptb2m
Running command git clone --filter=blob:none -q https://github.com/jic-dtool/dtool-s3.git /tmp/pip-req-build-cloptb2m
Running command git checkout -b i18n-filename-support --track origin/i18n-filename-support
Switched to a new branch 'i18n-filename-support'
Branch 'i18n-filename-support' set up to track remote branch 'i18n-filename-support' from 'origin'.
Resolved https://github.com/jic-dtool/dtool-s3.git to commit c930760327609d9b212bb6f49fc64f730224aabd
Preparing metadata (setup.py) ... done
Requirement already satisfied: click in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (8.0.3)
Requirement already satisfied: dtoolcore>=3.17 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (3.18.1)
Requirement already satisfied: dtool_cli in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (0.7.1)
Requirement already satisfied: boto3 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool-s3==0.13.0) (1.16.59)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from boto3->dtool-s3==0.13.0) (0.3.4)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from boto3->dtool-s3==0.13.0) (0.10.0)
Requirement already satisfied: botocore<1.20.0,>=1.19.59 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from boto3->dtool-s3==0.13.0) (1.19.59)
Requirement already satisfied: click-plugins in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from dtool_cli->dtool-s3==0.13.0) (1.1.1)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from botocore<1.20.0,>=1.19.59->boto3->dtool-s3==0.13.0) (1.26.7)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from botocore<1.20.0,>=1.19.59->boto3->dtool-s3==0.13.0) (2.8.1)
Requirement already satisfied: six>=1.5 in /home/jotelha/venv/jlh-imtek-python-3.8/lib/python3.8/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.20.0,>=1.19.59->boto3->dtool-s3==0.13.0) (1.15.0)
Building wheels for collected packages: dtool-s3
Building wheel for dtool-s3 (setup.py) ... done
Created wheel for dtool-s3: filename=dtool_s3-0.13.0-py3-none-any.whl size=11760 sha256=247716693958cfabb585fe11872401b5ae3c9a178b721ad77d2abcc9b2b8156f
Stored in directory: /tmp/pip-ephem-wheel-cache-dfvwyr4h/wheels/85/2b/5f/aeed29316d2e60c180cbc7406c2a075a66d830215f9ae40a5e
Successfully built dtool-s3
Installing collected packages: dtool-s3
Attempting uninstall: dtool-s3
Found existing installation: dtool-s3 0.12.0
Uninstalling dtool-s3-0.12.0:
Successfully uninstalled dtool-s3-0.12.0
Successfully installed dtool-s3-0.13.0
WARNING: You are using pip version 21.3.1; however, version 22.0.2 is available.
You should consider upgrading via the '/home/jotelha/venv/jlh-imtek-python-3.8/bin/python -m pip install --upgrade pip' command.
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool cp test-dataset s3://test-bucket
Usage: dtool cp [OPTIONS] DATASET_URI DEST_BASE_URI
Try 'dtool cp -h' for help.
Error: Dataset already exists: s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws --no-verify-ssl s3 --profile bwcloud-testserver --endpoint https://10.20.49.250:9000 ls s3://test-bucket
PRE 1a1f9fad-8589-413e-9602-5bbd66bfe675/
PRE u/
2022-01-26 23:48:20 0 dtool-1a1f9fad-8589-413e-9602-5bbd66bfe675
2022-02-08 13:32:34 11 dtool-5e3398fc-77d6-43b1-a10e-1d38fc346ac8
2022-02-01 16:12:29 11 dtool-bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ ENDPOINT="http://10.20.49.250:9000"
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ BUCKET="test-bucket"
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ AWS_OPTS="--profile bwcloud-testserver --endpoint=${ENDPOINT}"
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/
PRE 1a1f9fad-8589-413e-9602-5bbd66bfe675/
PRE u/
2022-01-26 23:48:20 0 dtool-1a1f9fad-8589-413e-9602-5bbd66bfe675
2022-02-01 16:12:29 11 dtool-bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/
PRE testuser/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/
PRE 5e3398fc-77d6-43b1-a10e-1d38fc346ac8/
PRE bbd82391-d21f-4288-b10f-ec3569b8b87f/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/
2022-02-08 13:32:34 1301 README.txt
2022-02-08 13:32:34 223 dtool
2022-02-08 13:32:34 546 structure.json
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ PREFIX='/u/testuser/'
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ # remove dataset objects
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ for fn in `aws s3 --endpoint=${ENDPOINT} ls --recursive s3://${BUCKET}/${PREFIX}${UUID}/ | awk '{ print $4 }'`; do
> if ! aws s3 --endpoint=${ENDPOINT} rm s3://${BUCKET}/${fn}; then
> echo "Error removing dataset. Do you have the correct write permissions?"
> exit 1
> fi
> done
delete: s3://test-bucket/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/README.txt
delete: s3://test-bucket/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/dtool
delete: s3://test-bucket/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/structure.json
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/
PRE bbd82391-d21f-4288-b10f-ec3569b8b87f/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket
simple_test_dataset
s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
2021-11-17-hoermann-livmats-retreat-rdm-intro
s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool cp test-dataset s3://test-bucket
Generating manifest [####################################] 100%
Dataset copied to:
s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/
PRE data/
2022-02-08 13:53:40 1301 README.txt
2022-02-08 13:53:40 0 README.yml
2022-02-08 13:53:40 218 dtool
2022-02-08 13:53:40 510 manifest.json
2022-02-08 13:53:40 546 structure.json
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/data
PRE data/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ aws s3 ${AWS_OPTS} ls s3://${BUCKET}/u/testuser/5e3398fc-77d6-43b1-a10e-1d38fc346ac8/data/
2022-02-08 13:53:40 38 24e6dddd88eab8573bf08e5f778ac2b73d2d8f53
2022-02-08 13:53:40 33 c8e410c79a41fec4c7336287b14592e7d9b67b76
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket
simple_test_dataset
s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
test-dataset
s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
2021-11-17-hoermann-livmats-retreat-rdm-intro
s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket/test-dataset
simple_test_dataset
s3://test-bucket/1a1f9fad-8589-413e-9602-5bbd66bfe675
test-dataset
s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
2021-11-17-hoermann-livmats-retreat-rdm-intro
s3://test-bucket/bbd82391-d21f-4288-b10f-ec3569b8b87f
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ dtool ls s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8
24e6dddd88eab8573bf08e5f778ac2b73d2d8f53 2022-02-08-hörmann-test
c8e410c79a41fec4c7336287b14592e7d9b67b76 test-file
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ mkdir another_place
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3$ cd another_place/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ dtool cp s3://test-bucket/5e3398fc-77d6-43b1-a10e-1d38fc346ac8 .
Generating manifest [####################################] 100%
Dataset copied to:
file://jotelha-fujitsu-ubuntu-20/home/jotelha/sandbox/20220208_dtool_s3/another_place/test-dataset
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ ll
total 12
drwxrwxr-x 3 jotelha jotelha 4096 Feb 8 13:55 ./
drwxrwxr-x 4 jotelha jotelha 4096 Feb 8 13:55 ../
drwxrwxr-x 4 jotelha jotelha 4096 Feb 8 13:55 test-dataset/
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ ll test-dataset/
total 16
drwxrwxr-x 4 jotelha jotelha 4096 Feb 8 13:55 ./
drwxrwxr-x 3 jotelha jotelha 4096 Feb 8 13:55 ../
drwxrwxr-x 2 jotelha jotelha 4096 Feb 8 13:55 data/
drwxrwxr-x 5 jotelha jotelha 4096 Feb 8 13:55 .dtool/
-rw-rw-r-- 1 jotelha jotelha 0 Feb 8 13:55 README.yml
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ ll test-dataset/data/
total 16
drwxrwxr-x 2 jotelha jotelha 4096 Feb 8 13:55 ./
drwxrwxr-x 4 jotelha jotelha 4096 Feb 8 13:55 ../
-rw-rw-r-- 1 jotelha jotelha 38 Feb 8 13:55 2022-02-08-hörmann-test
-rw-rw-r-- 1 jotelha jotelha 33 Feb 8 13:55 test-file
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ cat test-dataset/data/2022-02-08-hörmann-test
Noch eine Datei von Johannes Hörmann
(jlh-imtek-python-3.8) jotelha@jotelha-fujitsu-ubuntu-20:~/sandbox/20220208_dtool_s3/another_place$ exit
I will put that in our server instance and see how the indexing works with it, but I wouldn't have any concerns with merging and releasing those changes now.
Works fine on the server. Have a look at the current lookup server testing setup, https://github.com/livMatS/dtool-lookup-server-container-composition, reachable with the configuration https://github.com/livMatS/RDM-Wiki-public/releases/download/v0.4.0/dtool.json, in particular the dataset s3://test-bucket/d656d394-6a28-4273-bcfb-d050df31f4a3.
On our NetApp StorageGRID I recently got the error message:
<filename> here contained the umlaut ä. Not sure whether AWS has the same restriction, but it may be good to mangle all names to ensure they contain no non-ASCII characters.