okigan / awscurl

curl-like access to AWS resources with AWS Signature Version 4 request signing.
MIT License

awscurl doesn't work for AWS Elasticsearch when query contains CJK multi-byte Unicode characters #106

Open ToshihikoMakita opened 3 years ago

ToshihikoMakita commented 3 years ago

It works fine if domain allows open-access

I'm confident that my query works when the Elasticsearch domain allows open access and I issue the query with the curl command.

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> curl -XGET "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"  -H "Content-Type: application/json" -d "@search-search-ngram-and-kuromoji-2.json"
{
  "took" : 163,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 144.35374,
    "hits" : [
      {
        "_index" : "search-ngram-and-kuromoji",       
        "_type" : "_doc",
        "_id" : "ucNKC3gBNKbjtYmSAtQp",
        "_score" : 144.35374,
        "_source" : {
          "section_url" : "001.html#topic_i5x_lkz_bgb"
        },
        "highlight" : {
          "section_text" : [
            "突や衝突に近い状態(<em>SRSエアバッグの作動および路上障害物との接触</em>など)が発生した時に車"
          ],
          "section" : [
            "イベントデータレコーダー"
          ]
        }
      }
    ]
  }
}

I attached the search query, search-search-ngram-and-kuromoji-2.json, as a zip file: search-search-ngram-and-kuromoji-2.zip

If I launch awscurl without specifying --data-binary, no search result is returned.

I have changed the domain access policy to prohibit open access and allow only one IAM role, named ESFullAccess. ESFullAccess also trusts an IAM user called ESProgram.

The command line is wrapped in a Windows PowerShell script, awsescurl.ps1 (zipped):

awsescurl.zip

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZ2PAQ6B74',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': False,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': True,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'IpYWCqt3OScsBxA0/dOmVrSFWN7NfEbX1VyQwye9',
 'security_token': 'FwoGZXIvYXdzEMr//////////wEaDKOZeknTfG/fc5Ng+iKwAXTv+jazLkF0NMNGPiSYtytG3WqA1U1cUCU4ElfcHNixm+LFTOphsYQh9iY7xFO9cBh+iRrvF6qB10IeG7Ta+PJtcLZnzOUfOGE8w6a94YqpWciIRQ5CEAL3UDeYNru0IGeulJxVSzHaTRs8crJ7d3DOqSRDVGSKXfNpCQjzOKXwr/nam3JAkPGyyd4u2B8iWhOmPl9lhxORchF5fBb84Npw8YlSGDFPoLSsBM+NjX8jKJLTuoIGMi0eGpiRN7Wo1OKjCQy2C1V5UNVr6u66Q/cY7r0RveYwNeZhaz4DI/ThXpMPD4A=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
('\n'
 'CANONICAL REQUEST = GET\n'
 '/search-ngram-and-kuromoji/_search\n'
 'pretty=\n'
 'host:search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com\n'
 'x-amz-date:20210315T002554Z\n'
 'x-amz-security-token:FwoGZXIvYXdzEMr//////////wEaDKOZeknTfG/fc5Ng+iKwAXTv+jazLkF0NMNGPiSYtytG3WqA1U1cUCU4ElfcHNixm+LFTOphsYQh9iY7xFO9cBh+iRrvF6qB10IeG7Ta+PJtcLZnzOUfOGE8w6a94YqpWciIRQ5CEAL3UDeYNru0IGeulJxVSzHaTRs8crJ7d3DOqSRDVGSKXfNpCQjzOKXwr/nam3JAkPGyyd4u2B8iWhOmPl9lhxORchF5fBb84Npw8YlSGDFPoLSsBM+NjX8jKJLTuoIGMi0eGpiRN7Wo1OKjCQy2C1V5UNVr6u66Q/cY7r0RveYwNeZhaz4DI/ThXpMPD4A=\n'
 '\n'
 'host;x-amz-date;x-amz-security-token\n'
 '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20210315T002554Z\n'
 '20210315/yyyyyyyyy/es/aws4_request\n'
 '09176940c4429ffeab8230044aa84447ff1f6d91a76b84e0b0069495e2538a75')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=ASIAUPPPMJZZ2PAQ6B74/20210315/yyyyyyyyy/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date;x-amz-security-token, '
                  'Signature=66a1ec2e970b6d5d76628f4e5493d3c6d8b6edc3a6cabd1f54b868290ae81418',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211',
 'x-amz-date': '20210315T002554Z',
 'x-amz-security-token': 'FwoGZXIvYXdzEMr//////////wEaDKOZeknTfG/fc5Ng+iKwAXTv+jazLkF0NMNGPiSYtytG3WqA1U1cUCU4ElfcHNixm+LFTOphsYQh9iY7xFO9cBh+iRrvF6qB10IeG7Ta+PJtcLZnzOUfOGE8w6a94YqpWciIRQ5CEAL3UDeYNru0IGeulJxVSzHaTRs8crJ7d3DOqSRDVGSKXfNpCQjzOKXwr/nam3JAkPGyyd4u2B8iWhOmPl9lhxORchF5fBb84Npw8YlSGDFPoLSsBM+NjX8jKJLTuoIGMi0eGpiRN7Wo1OKjCQy2C1V5UNVr6u66Q/cY7r0RveYwNeZhaz4DI/ThXpMPD4A='}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty')
'\nRESPONSE++++++++++++++++++++++++++++++++++++'
'Response code: 200\n'
{'Date': 'Mon, 15 Mar 2021 00:25:55 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'close', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding, User-Agent'}

{
  "took" : 47,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

If I launch awscurl with --data-binary specified, the following error occurs.

The command line is wrapped in a Windows PowerShell script, awsescurl-db.ps1 (zipped):

awsescurl-db.zip

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl-db.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZUXODBZUO',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': True,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': True,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'kfGOQpb/p4Ckm4vYMocotYu56K15BI+NIQq3VZei',
 'security_token': 'FwoGZXIvYXdzEMr//////////wEaDJnWC1Gw7ePjDa6+fCKwASSe7Ro/gxR1BN7sts/Kn+RfFBJmYmjRZS+mvLClqjvg2OU/MdnZxJQeCsvxIP9Vzk5Ogdmq0tvp3uib3qVDQoEFHAfIf/AIPB/dRyp7KGLQpEXjjw5/v4/KRx3KDipAbH4aMFGjW0KrGymKsGiH1VMsuqfagWAl34tgclAtxoTQO+XGuxJTBafv1MEM9yAnMgN8S2zHtEEwNLOdJyb5CAQgTAAAJEgjSpjUgTpDsa4mKNLZuoIGMi0E3Ok7VjMeYV8WcpmwSwtMetlgSMIwvRRPArr8AYQVoEPINIzz6ufDE6IIquo=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
Traceback (most recent call last):
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\toshi\AppData\Local\Programs\Python\Python39\Scripts\awscurl.exe\__main__.py", line 7, in <module>
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 500, in main
    inner_main(sys.argv[1:])
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 478, in inner_main
    response = make_request(args.request,
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 100, in make_request
    canonical_request, payload_hash, signed_headers = task_1_create_a_canonical_request(
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 200, in task_1_create_a_canonical_request
    payload_hash = sha256_hash_for_binary_data(data) if data_binary else sha256_hash(data)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\utils.py", line 20, in sha256_hash_for_binary_data
    return hashlib.sha256(val).hexdigest()
TypeError: Unicode-objects must be encoded before hashing

Fixing TypeError

I barely know Python, but based on the error message "TypeError: Unicode-objects must be encoded before hashing", I changed the following code from:

https://github.com/okigan/awscurl/blob/master/awscurl/awscurl.py line 200

payload_hash = sha256_hash_for_binary_data(data) if data_binary else sha256_hash(data)

to

payload_hash = sha256_hash(data)

because the payload should be hashed as UTF-8 whether or not --data-binary is specified.
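
The TypeError can be reproduced outside awscurl: hashlib.sha256 accepts only bytes, so a str payload has to be encoded before it is hashed. The sketch below is only an illustration under assumptions. `payload_sha256` is a hypothetical helper (not a function from awscurl), and the query string is made up for the example.

```python
import hashlib


def payload_sha256(data):
    """Hash a request payload. Hypothetical helper, not awscurl code."""
    # hashlib only accepts bytes-like objects, so encode text as UTF-8 first.
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Made-up query containing multi-byte characters.
query = '{"query": {"match": {"section": "イベントデータレコーダー"}}}'

try:
    hashlib.sha256(query)  # passing a str raises TypeError on Python 3
except TypeError as exc:
    print("TypeError:", exc)

# The same text hashes identically whether passed as str or as UTF-8 bytes.
print(payload_sha256(query) == payload_sha256(query.encode("utf-8")))
```

Accepting both str and bytes in one helper avoids branching on --data-binary at the call site.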

However, the modified awscurl still reports the following error:

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl-db.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZ2XEUBLF7',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': True,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': False,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'd9Fd2UQbWWuQSo1YzJeGU74pmi+ERNvI4gN7RDre',
 'security_token': 'FwoGZXIvYXdzEMr//////////wEaDIGnfgGrBvi2QJemgiKwAWOZo7D/VAIvgtVf5gL+z+yF610K45iNCG7q6HHf+vIpxFJfiji+uIEJZXQMWOHTkVONfMvm5dBz8g3Ss8aTVQxjEkXTP3tw1MUPiq15qLiYW6ZeRvv9+kw6gkM2r2TIZm1k3oGOknzz8GTwQQoHySjj+zaDqNdHxN1l/rXMcyCdsaghuH12FvNsAmZV0TelhGJ3ceo9X6omS8BqRHCO5YYhs4DV2ApRI80yCsCOAofxKPTguoIGMi1nDJqy/cmXak2pL0VHU2puG5pbnjSV5kgaVD3oerY2bcbCzDu+t8ml+TplHXI=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
('\n'
 'CANONICAL REQUEST = GET\n'
 '/search-ngram-and-kuromoji/_search\n'
 'pretty=\n'
 'host:search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com\n'
 'x-amz-date:20210315T005516Z\n'
 'x-amz-security-token:FwoGZXIvYXdzEMr//////////wEaDIGnfgGrBvi2QJemgiKwAWOZo7D/VAIvgtVf5gL+z+yF610K45iNCG7q6HHf+vIpxFJfiji+uIEJZXQMWOHTkVONfMvm5dBz8g3Ss8aTVQxjEkXTP3tw1MUPiq15qLiYW6ZeRvv9+kw6gkM2r2TIZm1k3oGOknzz8GTwQQoHySjj+zaDqNdHxN1l/rXMcyCdsaghuH12FvNsAmZV0TelhGJ3ceo9X6omS8BqRHCO5YYhs4DV2ApRI80yCsCOAofxKPTguoIGMi1nDJqy/cmXak2pL0VHU2puG5pbnjSV5kgaVD3oerY2bcbCzDu+t8ml+TplHXI=\n'
 '\n'
 'host;x-amz-date;x-amz-security-token\n'
 '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20210315T005516Z\n'
 '20210315/yyyyyyyyy/es/aws4_request\n'
 '21e16f8efede645709fa4ad0d9b0ca272cf17e4679ab9e75048a315d01ba1d45')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=ASIAUPPPMJZZ2XEUBLF7/20210315/yyyyyyyyy/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date;x-amz-security-token, '
                  'Signature=30d36e27a4962fd248cd58052403e1da72d077c214acd5d67ab984d425927c63',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211',
 'x-amz-date': '20210315T005516Z',
 'x-amz-security-token': 'FwoGZXIvYXdzEMr//////////wEaDIGnfgGrBvi2QJemgiKwAWOZo7D/VAIvgtVf5gL+z+yF610K45iNCG7q6HHf+vIpxFJfiji+uIEJZXQMWOHTkVONfMvm5dBz8g3Ss8aTVQxjEkXTP3tw1MUPiq15qLiYW6ZeRvv9+kw6gkM2r2TIZm1k3oGOknzz8GTwQQoHySjj+zaDqNdHxN1l/rXMcyCdsaghuH12FvNsAmZV0TelhGJ3ceo9X6omS8BqRHCO5YYhs4DV2ApRI80yCsCOAofxKPTguoIGMi1nDJqy/cmXak2pL0VHU2puG5pbnjSV5kgaVD3oerY2bcbCzDu+t8ml+TplHXI='}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty')
Traceback (most recent call last):
  File "C:\Users\toshi\AppData\Local\Programs\Python\Python39\Scripts\awscurl-script.py", line 33, in <module>
    sys.exit(load_entry_point('awscurl==0.21', 'console_scripts', 'awscurl')())
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 499, in main
    inner_main(sys.argv[1:])
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 477, in inner_main
    response = make_request(args.request,
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 135, in make_request
    return __send_request(uri, data, headers, method, verify)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\awscurl\awscurl.py", line 330, in __send_request
    response = requests.request(method, uri, headers=headers, data=data, verify=verify)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\urllib3\connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\site-packages\urllib3\connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\http\client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\http\client.py", line 1300, in _send_request
    body = _encode(body, 'body')
  File "c:\users\toshi\appdata\local\programs\python\python39\lib\http\client.py", line 164, in _encode
    raise UnicodeEncodeError(
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 322-357: Body ('繧ィ繧「繝舌ャ繧ー縺ョ菴懷虚縺翫h縺ウ) is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
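
The traceback ends in http.client, which encodes a str body as Latin-1 and therefore rejects Japanese text. Because Signature Version 4 hashes the request payload, the bytes that are hashed should also be the bytes that are sent. A minimal sketch of encoding once and reusing the result follows; `prepare_body` is a hypothetical helper, not part of awscurl, and the sample text is chosen only for illustration.

```python
import hashlib


def prepare_body(data):
    """Return (body_bytes, sha256_hex). Hypothetical helper, not awscurl code."""
    # Encode text exactly once so the bytes hashed for the signature
    # are the same bytes that go on the wire.
    body = data.encode("utf-8") if isinstance(data, str) else data
    return body, hashlib.sha256(body).hexdigest()


text = "エアバッグの作動"

try:
    text.encode("latin-1")  # what http.client attempts for a str body
except UnicodeEncodeError as exc:
    print("latin-1 failed:", exc.reason)

body, digest = prepare_body(text)
print(len(text), "characters ->", len(body), "bytes")
```

Passing `body` (bytes) to the HTTP layer sidesteps the implicit Latin-1 step entirely.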

Last challenge: fixing "UnicodeEncodeError"

According to the error message, I modified the following code:

https://github.com/okigan/awscurl/blob/master/awscurl/awscurl.py line 135

    if data_binary:
        return __send_request(uri, data, headers, method, verify)
    else:
        return __send_request(uri, data.encode('utf-8'), headers, method, verify)

to:

    if data_binary:
        return __send_request(uri, data.encode('utf-8'), headers, method, verify)
    else:
        return __send_request(uri, data.encode('utf-8'), headers, method, verify)

The error message has vanished, but the query returns nothing.

PS C:\Users\toshi\OneDrive\Documents\ElasticSearch\command-2021> ./awsescurl-db.ps1 -X GET -d "@search-search-ngram-and-kuromoji-2.json" "https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty"

{'access_key': 'ASIAUPPPMJZZ5EN6WTOX',
 'data': '@search-search-ngram-and-kuromoji-2.json',
 'data_binary': True,
 'header': ['Content-Type: application/json'],
 'include': False,
 'insecure': False,
 'profile': 'default',
 'region': 'yyyyyyyyy',
 'request': 'GET',
 'secret_key': 'F3kemHCI+/UNHAE19ms59sBi9XHcMaMN4vpb14as',
 'security_token': 'FwoGZXIvYXdzEMv//////////wEaDPJUgKt3g2RxHCHWGiKwAaY60g/Lm1Hp48nED39tUP/34Ia2tqUT/Ljgqe2Rg2SBOGhAlTvQKpkypyNyAS8+vLFEmRGfw11UM6UOZvFmx3NeWi8g6zpV7QpeSCPFRKbwZLnSxTEn2r7n9p3QRXNNkNWlJSPCLDzTZecORr46FGYmdnX6ZkKL97p6dWpTiQ53O62FarMbUID90zReTEDKMEbM5n2oaS9hcfZCz8M7Zr7+zXC8C+5Pw01fi0TKGnYUKL7muoIGMi3hmp5Jw1uw3fZK4azfs7WUg+/EV2N5qN81sQyrF/sxH+X59xJ2cBjfeAYJvjQ=',
 'service': 'es',
 'session_token': None,
 'uri': 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty',
 'verbose': True}
'pretty='
('\n'
 'CANONICAL REQUEST = GET\n'
 '/search-ngram-and-kuromoji/_search\n'
 'pretty=\n'
 'host:search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com\n'
 'x-amz-date:20210315T010710Z\n'
 'x-amz-security-token:FwoGZXIvYXdzEMv//////////wEaDPJUgKt3g2RxHCHWGiKwAaY60g/Lm1Hp48nED39tUP/34Ia2tqUT/Ljgqe2Rg2SBOGhAlTvQKpkypyNyAS8+vLFEmRGfw11UM6UOZvFmx3NeWi8g6zpV7QpeSCPFRKbwZLnSxTEn2r7n9p3QRXNNkNWlJSPCLDzTZecORr46FGYmdnX6ZkKL97p6dWpTiQ53O62FarMbUID90zReTEDKMEbM5n2oaS9hcfZCz8M7Zr7+zXC8C+5Pw01fi0TKGnYUKL7muoIGMi3hmp5Jw1uw3fZK4azfs7WUg+/EV2N5qN81sQyrF/sxH+X59xJ2cBjfeAYJvjQ=\n'
 '\n'
 'host;x-amz-date;x-amz-security-token\n'
 '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211')
('\n'
 'STRING_TO_SIGN = AWS4-HMAC-SHA256\n'
 '20210315T010710Z\n'
 '20210315/yyyyyyyyy/es/aws4_request\n'
 '0a5cea16f924e59ed1d26254b5239ff26a8dee5ee5583c6b51d6922f0eefb46e')
'\nHEADERS++++++++++++++++++++++++++++++++++++'
{'Authorization': 'AWS4-HMAC-SHA256 '
                  'Credential=ASIAUPPPMJZZ5EN6WTOX/20210315/yyyyyyyyy/es/aws4_request, '
                  'SignedHeaders=host;x-amz-date;x-amz-security-token, '
                  'Signature=d4243cc6a1264c77bc90743bca2e03cebe5a408b25b1bf238d8a638bea7bda9b',
 'Content-Type': 'application/json',
 'x-amz-content-sha256': '41bb10889ba70ce26a2cda05a6d33b4d057c9caef53d6986f252093450167211',
 'x-amz-date': '20210315T010710Z',
 'x-amz-security-token': 'FwoGZXIvYXdzEMv//////////wEaDPJUgKt3g2RxHCHWGiKwAaY60g/Lm1Hp48nED39tUP/34Ia2tqUT/Ljgqe2Rg2SBOGhAlTvQKpkypyNyAS8+vLFEmRGfw11UM6UOZvFmx3NeWi8g6zpV7QpeSCPFRKbwZLnSxTEn2r7n9p3QRXNNkNWlJSPCLDzTZecORr46FGYmdnX6ZkKL97p6dWpTiQ53O62FarMbUID90zReTEDKMEbM5n2oaS9hcfZCz8M7Zr7+zXC8C+5Pw01fi0TKGnYUKL7muoIGMi3hmp5Jw1uw3fZK4azfs7WUg+/EV2N5qN81sQyrF/sxH+X59xJ2cBjfeAYJvjQ='}
'\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++'
('Request URL = '
 'https://search-tmtest-xxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyy.es.amazonaws.com/search-ngram-and-kuromoji/_search?pretty')
'\nRESPONSE++++++++++++++++++++++++++++++++++++'
'Response code: 200\n'
{'Date': 'Mon, 15 Mar 2021 01:07:12 GMT', 'Content-Type': 'application/json; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'close', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding, User-Agent'}

{
  "took" : 68,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Do you have any ideas to fix this problem?

Regards,

okigan commented 3 years ago

Wow, what an issue report -- 5⭐.

Unicode and Python seem to be a unique kind of Pandora's box.

As for the query coming back empty: can you audit what query ES actually receives, and check whether it matches the one it receives from curl (when using open access)?

ToshihikoMakita commented 3 years ago

OK. I will investigate.

ToshihikoMakita commented 3 years ago

Hi, when the Elasticsearch domain allows open access, I was able to capture the JSON data by using curl on Ubuntu together with Wireshark. The key point is to set the SSLKEYLOGFILE environment variable before launching curl.

https://everything.curl.dev/usingcurl/tls/sslkeylogfile

See attached query-json.txt.

query-json.txt

Does awscurl support the SSLKEYLOGFILE environment variable? If it does, I can send you the JSON dump file.
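
For context on this question: Python's standard ssl module (3.8 and later, built against OpenSSL 1.1.1 or later) can write TLS keys in the NSS key log format that Wireshark reads, through the SSLContext.keylog_filename attribute. Whether the HTTP stack underneath awscurl honors SSLKEYLOGFILE is not established in this thread. A minimal stdlib-only sketch, with a temporary path chosen purely for illustration:

```python
import os
import ssl
import tempfile

# Hypothetical location for the key log; any writable path works.
keylog_path = os.path.join(tempfile.mkdtemp(), "tls-keys.log")

context = ssl.create_default_context()
# Explicit opt-in to key logging for connections made with this context.
context.keylog_filename = keylog_path

# TLS connections that use `context` append their session keys to
# keylog_path, which Wireshark can load to decrypt a capture.
print(context.keylog_filename)
```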

okigan commented 3 years ago

I think the verbose flag "-v" will print the request before it's sent.

ToshihikoMakita commented 3 years ago

Unfortunately, the "-v" option does not display the contents of the JSON file specified with the "-d" option. AWS technical support suggested using CloudWatch to debug the query that is sent to Elasticsearch Service. Here I attach test results for several patterns.

Results using curl when the ES domain is open

Command-line: search-ngram-and-kuromoji-open-access-curl-cmd.txt

CloudWatch log: search-ngram-and-kuromoji-open-access-curl-cloud-watch.txt

Results using awscurl when the ES domain requires an IAM role (without specifying --data-binary)

Command-line: search-ngram-and-kuromoji-iam-role-access-awscurl-no-data-binary-cmd.txt

CloudWatch log: search-ngram-and-kuromoji-iam-role-access-awscurl-no-data-binary-cloud-watch.txt

Results using awscurl when the ES domain requires an IAM role (specifying --data-binary, with the changes from "Last challenge: fixing "UnicodeEncodeError"" above)

Command-line: search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cmd-fail.txt

CloudWatch log: search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cloud-watch-fail.txt

From these results, I found that the JSON file given with the -d parameter must be treated as UTF-8 encoded when --data-binary is specified. I made several changes to awscurl.py and now it works fine.

Command-line: search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cmd-success.txt

CloudWatch log: search-ngram-and-kuromoji-iam-role-access-awscurl-data-binary-cloud-watch-success.txt
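
The effect of reading a UTF-8 file with the wrong encoding can be illustrated without touching the network. This sketch assumes the Windows machine's default text encoding is cp932 (the Japanese code page), which is consistent with the garbled characters in the UnicodeEncodeError message earlier in the thread but is not stated anywhere in it. The file name and query are made up for the example.

```python
import os
import tempfile

# Made-up query containing multi-byte characters.
query = '{"query": {"match": {"section_text": "エアバッグの作動"}}}'

# Store the query the way the JSON file is stored on disk: UTF-8 bytes.
path = os.path.join(tempfile.mkdtemp(), "query.json")
with open(path, "w", encoding="utf-8") as f:
    f.write(query)

# Simulate open(path, "r") on a system whose default encoding is cp932
# (an assumption about the reporter's machine).
with open(path, "r", encoding="cp932", errors="replace") as f:
    misread = f.read()

# Naming the encoding explicitly reads the text back intact.
with open(path, "r", encoding="utf-8") as f:
    correct = f.read()

print(misread == query)  # the mis-decoded text no longer matches the query
print(correct == query)
```

A query mis-decoded this way is still valid JSON and still signs correctly, which would explain a 200 response with zero hits rather than an error.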

I will submit a pull request with this fix. However, it will not be compatible with #90, because that one appears to send true binary data that is meant to be uploaded to S3.

Please take a look at my pull request and consider how awscurl's -d parameter should handle both UTF-8 encoded JSON and real binary data.

Regards,

rdegraaf commented 8 months ago

I had a related issue (trying to send a request that contains 0xff, which is an invalid byte in any UTF-8 sequence). I think that I fixed it by making this change to awscurl.py (line 490 in the current head):

Original:

with open(filename, "r") as post_data_file:

New:

with open(filename, "rb" if args.data_binary else "r") as post_data_file:
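
A standalone sketch of the idea behind this change follows. `read_post_data` is a hypothetical stand-in for the file-reading step, not awscurl's actual function, and decoding text mode as UTF-8 is an assumption added here rather than part of the change above.

```python
import os
import tempfile


def read_post_data(filename, data_binary):
    """Hypothetical stand-in for the file-reading step; not awscurl's code."""
    if data_binary:
        # Binary mode returns the bytes untouched, so 0xff survives.
        with open(filename, "rb") as post_data_file:
            return post_data_file.read()
    # Text mode decodes; UTF-8 is named explicitly (an assumption here)
    # instead of relying on the platform default.
    with open(filename, "r", encoding="utf-8") as post_data_file:
        return post_data_file.read()


path = os.path.join(tempfile.mkdtemp(), "payload.bin")
with open(path, "wb") as f:
    f.write(b"\xff\x00binary")

print(read_post_data(path, data_binary=True))

try:
    read_post_data(path, data_binary=False)
except UnicodeDecodeError as exc:
    print("text mode cannot decode the byte at offset", exc.start)
```

With this split, --data-binary yields bytes and the default mode yields str, so later hashing and sending code has to accept both types.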