watson-developer-cloud / python-sdk

Client library to use the IBM Watson services in Python and available in pip as watson-developer-cloud
https://pypi.org/project/ibm-watson/
Apache License 2.0

Garbled characters when uploading a file with Japanese characters in its name to Discovery #571

Closed thetime1102 closed 3 years ago

thetime1102 commented 6 years ago

Expected behavior

Uploading a file with Japanese characters in its name should preserve the name. Example: before upload: 日本語.json; after upload: 日本語.json

Actual behavior

The uploaded file name has garbled characters. Example: before upload: 日本語.json; after upload: ����.json

Steps to reproduce the problem

Upload a document using the Discovery add_document method.

Code snippet (Note: Do not paste your credentials)

import os
import json
from watson_developer_cloud import DiscoveryV1

discovery = DiscoveryV1(
    version="2018-03-05",
    username="username",
    password="password"
)

with open(os.path.join(os.getcwd(), 'path_element', '日本語.json')) as fileinfo:
    add_doc = discovery.add_document('environment_id', 'collection_id', file=fileinfo)
print(json.dumps(add_doc, indent=2))

python sdk version

watson-developer-cloud==1.7.0

python version

3.6.6

germanattanasio commented 6 years ago

Thanks a lot for filing this issue! We'll triage and take a look at it as soon as possible! Do you have a file that we can use for testing?

thetime1102 commented 6 years ago

Here is a file you can use for testing. Thanks. Attached: word ファイル サンプル_doc.zip

germanattanasio commented 6 years ago

@thetime1102 Can you try updating the SDK to 2.0.1?

thetime1102 commented 6 years ago

@germanattanasio I tried it, and nothing changed.

When I debug the request function in the SDK and edit the file ../Programs/Python/Python36/Lib/site-packages/urllib3/fields.py as follows:

line 38  (before edit): result.encode('ascii')
line 38  (after edit): result.encode('utf8')

the issue is fixed.
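
For readers following along, the edit described above can be illustrated with a minimal sketch that uses only the standard library. It is not urllib3's actual code: the helper name format_filename_param and its charset parameter are hypothetical, and the sketch assumes the edited line is a check that decides whether a multipart file name can be sent as a plain quoted string or has to fall back to the RFC 2231 extended form.

```python
import email.utils


def format_filename_param(filename, charset="ascii"):
    """Hypothetical helper: choose the header form for a multipart file name.

    Simplified illustration only, not the library's implementation.
    """
    plain = 'filename="%s"' % filename
    try:
        # The check under discussion: can the plain form be encoded?
        plain.encode(charset)
    except UnicodeEncodeError:
        # It cannot, so fall back to the RFC 2231 extended form.
        return "filename*=" + email.utils.encode_rfc2231(filename, "utf-8")
    return plain


ascii_check = format_filename_param("日本語.json", charset="ascii")
utf8_check = format_filename_param("日本語.json", charset="utf8")
print(ascii_check)  # percent-encoded, extended form
print(utf8_check)   # raw characters inside the quoted form
```

Under these assumptions, the ASCII check sends the name percent-encoded in a filename* parameter, while the UTF-8 check sends the raw characters in the ordinary filename parameter. That difference may be why the local edit changed what the service displayed.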

germanattanasio commented 6 years ago

@ehdsouza It looks like we are encoding something with ascii rather than utf8 and that's creating problems with some symbols. Can you please take a look?

germanattanasio commented 6 years ago

This could be related: https://github.com/requests/requests/issues/4218#issuecomment-392316627

thetime1102 commented 6 years ago

Thanks for the help. I'm looking forward to the fixed version of the SDK.

ehdsouza commented 6 years ago

After looking into the details, it looks like the file name is encoded properly according to RFC 2231.
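
For reference, an RFC 2231 extended parameter value has the shape charset'language'percent-encoded-text. The sketch below shows one way a receiver could turn such a value back into the original file name. The function decode_extended_value is hypothetical and says nothing about how the Discovery service actually parses the header.

```python
from urllib.parse import unquote


def decode_extended_value(value):
    """Hypothetical decoder for an RFC 2231 extended parameter value."""
    # Split into charset, (usually empty) language tag, and encoded text.
    charset, _language, encoded = value.split("'", 2)
    # Undo the percent-encoding using the declared charset.
    return unquote(encoded, encoding=charset, errors="strict")


decoded = decode_extended_value("utf-8''%E6%97%A5%E6%9C%AC%E8%AA%9E.json")
print(decoded)
```

A receiver that skips this step, or that assumes a different charset, would not recover the Japanese characters, which is consistent with the problem being on the server side.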

I created an issue internally for the discovery team to look into the server side.

germanattanasio commented 5 years ago

@ehdsouza can you follow up with the discovery team to see if they fixed this issue?

ehdsouza commented 5 years ago

This is still not fixed on the service side.

germanattanasio commented 4 years ago

We are still waiting for @watson-developer-cloud/watson-discovery to fix the issue @thetime1102. Sorry for the delay.

apaparazzi0329 commented 3 years ago

This issue may still be present; new testing is needed to confirm whether it is resolved. I will investigate next week and report the findings here.

apaparazzi0329 commented 3 years ago

Testing confirms that this issue is resolved in the latest version of the python-sdk (5.2.0) with the latest API version of Discovery V1 (2019-04-30). Closing as resolved.

thetime1102 commented 3 years ago

OMG! Thank you so much!!!!