oracle / oci-cli

Command Line Interface for Oracle Cloud Infrastructure
https://cloud.oracle.com/cloud-infrastructure

OCI CLI json output is not unicode as required by spec #871

Open triatic opened 3 days ago

triatic commented 3 days ago

When executing commands such as oci compute instance list, json_decode() in PHP can fail when decoding the JSON output. This is because the output can contain non-ASCII characters that are not encoded as Unicode (UTF-8), as required by the JSON specification.

OCI version: 3.50.0 (msi package)
Windows 10 version: 10.0.19045.5011

NupurGupta3101 commented 3 days ago

CLI outputs do not contain non-Unicode characters. If you have such an example, please share it and we will investigate.

triatic commented 3 days ago

ASCII encoding from oci compute instance list. Note, this json output contained non-ascii characters which were not unicode.

C:\>php -r "var_dump(mb_detect_encoding(shell_exec('oci compute instance list --compartment-id ocid1.tenancy.oc1..removed')));"
string(5) "ASCII"

adizohar commented 3 days ago

Can you please share the output of the oci command, or at least the start of it where it shows data? Are there any errors or warnings?

{
  "data": {
    "items": [
      {

Which Python version do you use?

triatic commented 3 days ago

I'm using the newest Windows oci msi package downloaded from GitHub, which bundles Python. The JSON is formatted correctly, other than the non-Unicode characters.

The line that breaks things is this:

"processor-description": "3.0 GHz Ampere® Altra™",

Start of output:

C:\>oci compute instance list --compartment-id ocid1.tenancy.oc1..removed
{
  "data": [
    {
      "agent-config": {
... etc

adizohar commented 3 days ago

I asked Python version :) I tried to run and didn't see any non ascii. I will wait for the OCI CLI team to respond.

triatic commented 3 days ago

> I asked Python version :)

Whatever the MSI package installs? I can see python38.dll in the installation directory, and I do not have Python globally installed in Windows.

adizohar commented 3 days ago

Thank you for that

triatic commented 3 days ago

> I tried to run and didn't see any non ascii

"3.0 GHz Ampere® Altra™" contains non ASCII characters, the ® and ™ characters. The problem for me is that they are also not produced in unicode by oci as required by json spec.

adizohar commented 3 days ago

Understood, it is the processor description. Nupur, please take this up with the OCI CLI team: "processor-description": "3.0 GHz Ampere® Altra™"

triatic commented 3 days ago

@adizohar just to clarify, are you saying only ASCII characters should be returned by oci's JSON output, and the expected fix is to remove ® and ™ from the JSON output?

adizohar commented 3 days ago

No, I don't believe this is a bug or an issue that needs to be fixed. I have asked the OCI CLI team to take a look. In the meantime, you can filter out the non-ASCII characters before ingesting the JSON, or use the OCI Python SDK to read and handle these characters.
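The "filter out the non-ASCII characters" suggestion could look roughly like the sketch below. It is written in Python rather than PHP, the function name `parse_json_ascii_only` and the sample bytes are made up for illustration, and it assumes that losing the ® and ™ characters is acceptable.

```python
import json


def parse_json_ascii_only(raw: bytes):
    """Drop every byte outside the ASCII range, then parse the rest as JSON."""
    # errors="ignore" silently discards any byte >= 0x80
    text = raw.decode("ascii", errors="ignore")
    return json.loads(text)


# Hypothetical sample: the reported line, encoded as Windows-1252 bytes
sample = '{"processor-description": "3.0 GHz Ampere® Altra™"}'.encode("cp1252")
print(parse_json_ascii_only(sample))
```

Note that this is lossy: the symbols disappear from the value, and if the input were valid UTF-8 every multi-byte character would be dropped as well.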

triatic commented 3 days ago

Ok. At the moment I am converting oci's output from ASCII to UTF-8 where the ® and ™ characters are present, which prevents json_decode() from failing.
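A minimal sketch of this kind of workaround, in Python rather than PHP. It assumes the problematic bytes are Windows-1252, as reported elsewhere in this thread; the helper names `decode_cli_output` and `load_cli_json` are hypothetical and not part of the OCI CLI or SDK.

```python
import json


def decode_cli_output(raw: bytes) -> str:
    """Decode CLI output as UTF-8, falling back to Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Windows-1252.
        # errors="replace" covers the few byte values cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")


def load_cli_json(raw: bytes):
    """Parse CLI output as JSON after normalising its encoding."""
    return json.loads(decode_cli_output(raw))


line = '{"processor-description": "3.0 GHz Ampere® Altra™"}'
print(load_cli_json(line.encode("cp1252")))  # Windows-1252 input
print(load_cli_json(line.encode("utf-8")))   # already valid UTF-8
```

Trying UTF-8 first means the helper keeps working unchanged if the CLI output is, or later becomes, valid UTF-8.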

NupurGupta3101 commented 2 days ago

According to https://thesmsworks.co.uk/unicode-detector ® and ™ are unicode characters.

triatic commented 1 day ago

> According to https://thesmsworks.co.uk/unicode-detector ® and ™ are unicode characters.

They can be encoded in unicode. But OCI CLI encodes them in Windows-1252 which is not valid for json: https://en.wikipedia.org/wiki/Windows-1252
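To make the distinction concrete, here is a small Python sketch comparing the two byte encodings of ® and ™. RFC 8259 requires JSON exchanged between systems to be UTF-8; the helper name `is_valid_utf8` is made up for this example.

```python
import json

text = "®™"

cp1252_bytes = text.encode("cp1252")  # one byte per character
utf8_bytes = text.encode("utf-8")     # two bytes for ®, three for ™


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(cp1252_bytes.hex(" "), is_valid_utf8(cp1252_bytes))  # ae 99 False
print(utf8_bytes.hex(" "), is_valid_utf8(utf8_bytes))      # c2 ae e2 84 a2 True

# JSON can also carry these characters as pure-ASCII \u escapes
print(json.dumps(text))  # "\u00ae\u2122"
```

The Windows-1252 bytes 0xAE and 0x99 are not valid UTF-8 on their own, which is why a strict JSON decoder rejects them even though both characters exist in Unicode.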

NupurGupta3101 commented 1 day ago

Can you please share the output received (without any further parsing) from oci-cli when you trigger this command (or via a script)? It will be clearer then.
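One way to capture the output without any parsing or re-encoding is to read the child process's stdout as raw bytes and list the bytes outside the ASCII range. This is only a sketch: the helper names `capture_raw` and `non_ascii_bytes` are hypothetical, and a small Python child process stands in for the real oci command.

```python
import subprocess
import sys


def capture_raw(cmd):
    """Run cmd and return its stdout as raw, undecoded bytes."""
    return subprocess.run(cmd, capture_output=True, check=True).stdout


def non_ascii_bytes(raw: bytes):
    """Return (offset, hex value) for every byte outside the ASCII range."""
    return [(i, f"0x{b:02x}") for i, b in enumerate(raw) if b >= 0x80]


# Stand-in child process that writes 'A' followed by two Windows-1252 bytes.
# For the real case, cmd would be the oci command line from this thread.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'A\\xae\\x99')",
]
print(non_ascii_bytes(capture_raw(demo_cmd)))
```

Sharing the offsets and hex values shows which encoding was used without exposing identifiers from the surrounding JSON.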

triatic commented 1 day ago

Are you happy for me to edit out unique identifiers from the output?