rustyoldrake / WatsonR

An R Package to process unstructured data with IBM Watson Developer Cloud Services
https://www.ibm.com/watson/developercloud/services-catalog.html

Cannot decode getURL response in Japanese #9

Open na-k-cs opened 7 years ago

na-k-cs commented 7 years ago

Hi, my classifier seems to be working, but because the query and training data are all in Japanese, the output comes back garbled in RStudio, as shown below. I am not sure how to decode it, so any insight or help would be appreciated. How can I work around this?

> getURL(paste(base_url,classifier_id,"/classify","?text=", URLencode("虫刺され"),sep=""),userpwd = username_password)
[1] "{\n \"classifier_id\" : \"90e7b4x199-nlc-51402\",\n \"url\" : \"https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/90e7b4x199-nlc-51402\",\n \"text\" : \"è\u0099«å\u0088ºã\u0081\u0095ã\u0082\u008c\",\n \"top_class\" : \"足é\u0083¨ç\u0097\u009b\",\n \"classes\" : [ {\n \"class_name\" : \"足é\u0083¨ç\u0097\u009b\",\n \"confidence\" : 0.1536393452103029\n }, {\n \"class_name\" : \"å¿\u0083è\u0087\u0093å\u0086\u0085è¡\u0080æ \u0093\",\n \"confidence\" : 0.11660534985189555\n }, {\n \"class_name\" : \"è¿\u0091è¦\u0096æ\u0080§è\u0084\u0088絡è\u0086\u009cè¡\u0080管æ\u0096°ç\u0094\u009f\",\n \"confidence\" : 0.055206512518980205\n }, {\n \"class_name\" : \"STä¸\u008aæ\u0098\u0087\",\n \"confidence\" : 0.047005338759615996\n }, {\n \"class_name\" : \"骨æ\u0089\u008bè¡\u0093\",\n \"confidence\" : 0.04218856452795845\n }, {\n \"class_name\" : \"è\u0085¹æ°´\",\n \"confidence\" : 0.0344572998536417\n }, {\n ... <truncated>

Thanks,
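The `text` field in that output reads `è\u0099«å\u0088º...`, which is what the bytes E8 99 AB E5 88 BA look like when read as Latin-1; those same bytes are the UTF-8 encoding of the first two characters of the query. A minimal sketch of one possible workaround follows, assuming the response string holds correct UTF-8 bytes that R has merely labelled as Latin-1 (this has not been checked against the real service). The helper name `relabel_utf8` is hypothetical and not part of WatsonR or RCurl.

```r
# Bytes of the query text from the report, as UTF-8 (E8 99 AB E5 88 BA ...).
utf8_bytes <- as.raw(c(0xe8, 0x99, 0xab, 0xe5, 0x88, 0xba,
                       0xe3, 0x81, 0x95, 0xe3, 0x82, 0x8c))

# Simulate the symptom: correct UTF-8 bytes carrying a Latin-1 label.
mislabelled <- rawToChar(utf8_bytes)
Encoding(mislabelled) <- "latin1"

# Hypothetical helper: relabel strings whose bytes are already valid UTF-8.
# Only the encoding mark changes; the bytes themselves are left untouched.
relabel_utf8 <- function(x) {
  fixable <- !is.na(x) & validUTF8(x) & Encoding(x) != "UTF-8"
  Encoding(x[fixable]) <- "UTF-8"
  x
}

fixed <- relabel_utf8(mislabelled)
Encoding(fixed)  # "UTF-8"
nchar(fixed)     # 4 characters, where the mislabelled string counts 12
```

If the assumption holds, `relabel_utf8(getURL(...))` should print the Japanese class names as intended. If the string has already been converted rather than only mislabelled, relabelling alone will not be enough.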

na-k-cs commented 7 years ago

To add: when I use curl from the Terminal on Mac, this encoding issue never happens. I still want to use WatsonR, though, because I have a test.csv with hundreds of prepared test cases to classify.
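For the batch case, a rough sketch of looping over such a file is shown here. It assumes the CSV has a column named `text` (the real layout of test.csv is not given in the thread), and the helper names `build_classify_url` and `classify_csv` are hypothetical. `base_url`, `classifier_id` and `username_password` are the variables from the original call. `.encoding` is an existing argument of `RCurl::getURL`, but whether setting it resolves this particular case is untested here.

```r
# Hypothetical helper: build the classify URL for one piece of text.
build_classify_url <- function(base_url, classifier_id, text) {
  # Percent-encode the UTF-8 bytes of the text, reserved characters included.
  encoded <- utils::URLencode(enc2utf8(text), reserved = TRUE)
  paste0(base_url, classifier_id, "/classify?text=", encoded)
}

# Hypothetical helper: classify every row of a CSV that has a "text" column.
# Returns one raw JSON response string per row.
classify_csv <- function(path, base_url, classifier_id, username_password) {
  # Mark the strings read from the file as UTF-8.
  cases <- utils::read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
  vapply(cases$text, function(txt) {
    RCurl::getURL(build_classify_url(base_url, classifier_id, txt),
                  userpwd = username_password,
                  .encoding = "UTF-8")  # ask RCurl to treat the body as UTF-8
  }, character(1), USE.NAMES = FALSE)
}

# The query from the report, written with escapes so the file stays ASCII.
build_classify_url("https://example.invalid/api/v1/classifiers/",
                   "abc-nlc-1", "\u866b\u523a\u3055\u308c")
```

Keeping the URL construction in its own function makes the encoding step easy to check without calling the service.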

JoeDumoulin commented 7 years ago

This is likely an encoding issue. Can you tell me what the encoding of your input files is? (Sent in reply to na-k-cs's message of May 9, 2017, 12:25 PM.)

na-k-cs commented 7 years ago

Hi Joe, it's UTF-8.

JoeDumoulin commented 7 years ago

Do you happen to have an example that I can use to repro this? I will take a look this evening and see what's up. (Sent in reply to na-k-cs's message of May 9, 2017, 1:07 PM.)

na-k-cs commented 7 years ago

I just sent access to a file to your email address. Thanks for taking a look.

JoeDumoulin commented 7 years ago

I think I have fixed this issue. I will package up a pull request, but I would appreciate it if you could test against my fork of this project.

I've added a language selection to the watson.nlc.createnewclassifier call and I have added a bunch of explicit encoding to the various REST calls in the system.

If you test, I would recommend that you create a new classifier first. I'm not positive that it is necessary, but I think it's better to start clean.
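To illustrate what a language selection at classifier creation can amount to: the Natural Language Classifier create call takes a `training_metadata` JSON document with `language` and `name` fields. The sketch below only builds that JSON. It is not the code in the fork, and the helper name `nlc_training_metadata` and its arguments are hypothetical; the actual signature of `watson.nlc.createnewclassifier` in the fork may differ.

```r
# Hypothetical helper: JSON for the training_metadata field of a create call.
nlc_training_metadata <- function(name, language = "en") {
  stopifnot(is.character(name), length(name) == 1L,
            # Restrict the name so this sketch needs no JSON escaping.
            grepl("^[A-Za-z0-9 _-]+$", name),
            # Two-letter language code, for example "ja" for Japanese.
            is.character(language), length(language) == 1L,
            grepl("^[a-z]{2}$", language))
  paste0('{"language":"', language, '","name":"', name, '"}')
}

# Metadata for a Japanese classifier with an illustrative name.
nlc_training_metadata("symptoms", language = "ja")
```

Leaving the language at an English default for Japanese training data is one plausible reason to start from a freshly created classifier.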

Tell me how it goes!

Best, Joe D

(Sent in reply to na-k-cs's message of Tue, May 9, 2017, 1:24 PM.)

JoeDumoulin commented 7 years ago

Were you able to make this work? I want to know so I can close this issue or fix any remaining problems. Thank you! (Following up on my message of May 10, 2017, 2:09 PM.)