zowe / zowe-explorer-vscode

Visual Studio Code Extension for Zowe, which lets users interact with z/OS Data Sets, Unix System Services, and Jobs on a remote mainframe instance. Powered by Zowe SDKs.
Eclipse Public License 2.0

National (Danish/Nordic) characters are not translated correctly when viewed in VS Code #260

Closed: bearou closed this issue 4 years ago

bearou commented 5 years ago

The three Danish national characters display correctly on a 3270 terminal, but in VS Code they appear as #, @ and $. What information should I provide to help dig into this issue? Best regards, Claus Bjørno

Colin-Stone commented 5 years ago

Hi Claus. Thanks for raising this issue with us. I think a sample file would help us investigate. Is this a dataset member or a USS file? If it is a USS file, is there any information, such as tag information or codepages, that I should make sure we include? A screenshot of how it should look may also be helpful. Many thanks for trying this extension out and helping us make it better.

bearou commented 5 years ago

This is a "normal" z/OS dataset (not USS); codepage 277. Where should I post sample screenshots?

Colin-Stone commented 5 years ago

Thanks, Claus. Can you drop a screenshot into a comment box, please? Tak (thank you)!

Colin-Stone commented 5 years ago

Another thought, and apologies if you already have this covered. I notice that VS Code shows the local encoding in the status bar at the bottom of the window (screenshot). I think the default is UTF-8, but if you click on it and then choose the Reopen with Encoding option, a dialog appears that lets you choose a different encoding, including two Nordic options.

bearou commented 5 years ago

I've tried changing the encoding, with the following results (screenshots): the original in 3270 ISPF, DL_3270_cp277; the default UTF-8, DL_VSC_UTF8; encoded as Nordic 865, DL_VSC_865; encoded as Nordic ISO 8859-10, DL_VSC_8859-10.

Colin-Stone commented 5 years ago

Does this mean that the default UTF-8 representation displays the characters mostly correctly (apart from the # and [ being the other way around)?

bearou commented 5 years ago

Hi, no. The member shown is the same in 3270/ISPF and in VS Code; the lines are not switched around. Sorry if my example is a bit confusing. The first 'say' line contains the correct Danish letters in lowercase and uppercase. So the Danish letters are translated to $ and similar characters, and $ and similar characters are translated into Danish letters.
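To make this symptom concrete, here is a minimal Python sketch of what happens at the byte level. It assumes the CCSID 277 (EBCDIC Denmark/Norway) positions listed in the code for the six Danish letters; these come from our reading of IBM's code page table, not from this thread, and only those six letters are modelled. Python ships no IBM-277 or IBM-1047 codec, so the sketch decodes with `cp037` (US EBCDIC), which as far as we can tell agrees with IBM-1047 on every byte used here. `encode_danish_cp277` is a hypothetical helper.

```python
# Assumption: byte positions of the six Danish letters in CCSID 277
# (EBCDIC Denmark/Norway), from our reading of IBM's table. Partial only.
CP277_DANISH_LETTERS = {
    "Æ": 0x7B, "Ø": 0x7C, "Å": 0x5B,
    "æ": 0xC0, "ø": 0x6A, "å": 0xD0,
}


def encode_danish_cp277(text: str) -> bytes:
    """Encode Danish letters plus invariant characters (A-Z, a-z, 0-9, space)
    the way a CCSID 277 system stores them. Anything else is rejected,
    because characters such as $, # and @ sit at different positions in 277."""
    out = bytearray()
    for ch in text:
        if ch in CP277_DANISH_LETTERS:
            out.append(CP277_DANISH_LETTERS[ch])
        elif ch.isascii() and (ch.isalnum() or ch == " "):
            # Invariant characters occupy the same byte in 277 and 037.
            out += ch.encode("cp037")
        else:
            raise ValueError(f"not modelled in this sketch: {ch!r}")
    return bytes(out)


stored = encode_danish_cp277("ÆØÅ æøå")
print(stored.hex(" "))         # 7b 7c 5b 40 c0 6a d0  (bytes on disk)
print(stored.decode("cp037"))  # #@$ {¦}  (what a US EBCDIC reader shows)
```

The same bytes are simply read through the wrong table: the uppercase letters come out as #, @ and $, matching the original report, and the reverse happens for characters that CCSID 277 stores where US EBCDIC keeps its Danish letters.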

Colin-Stone commented 5 years ago

Hi Claus. Can you please try retrieving the member directly from z/OSMF to see what it returns? For example: (screenshot)

bearou commented 4 years ago

(screenshot of the z/OSMF response)

And the original:

(screenshot of the original)

bearou commented 4 years ago

Where in this extension can you actually change the code page?

dkelosky commented 4 years ago

I think we'll be working towards this solution via: https://github.com/zowe/vscode-extension-for-zowe/issues/445

Colin-Stone commented 4 years ago

Hi Claus.

Zowe Explorer is based on a z/OSMF profile in Zowe CLI. So my intention with the exercise above was to find out whether z/OSMF also shows the same issue you are seeing. In my example I see Nordic characters correctly in both Zowe Explorer and z/OSMF. In your example they are corrupted in z/OSMF, so further investigation should focus there.
If it were a USS file, I would point you to the chtag command, which can change the codepage tag of an individual file. But since you are looking at a dataset, it must be a codepage associated with your user ID or with the system defaults. Unfortunately, this is not an area I am familiar with, so I will need to find out more before I can diagnose further.

bearou commented 4 years ago

I have learned that IBM has released an APAR; I hope it helps. However, we do not IPL until late March :( PH15263: DATASET AND FILE REST SERVICE SUPPORT REMOTE FUNCTION CROSS SYSPLEX AND ALTERNATE HOST CODE PAGE https://www-01.ibm.com/support/docview.wss?rs=63&uid=isg1PH15263
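A minimal sketch of what a client request using the "alternate host code page" support might look like against the z/OSMF REST files interface (`/zosmf/restfiles/ds/...`). The `X-IBM-Data-Type: text;fileEncoding=...` header form and the `IBM-277` value are assumptions based on our understanding of the z/OSMF documentation for this APAR; check the exact syntax for your z/OSMF release. The helper name, host and dataset names are hypothetical, and the sketch only builds the request without sending it.

```python
from urllib.parse import quote


def build_member_read_request(host: str, dsname: str, member: str, codepage: str):
    """Build (url, headers) for reading a PDS member as text through the
    z/OSMF REST files interface, telling z/OSMF which host codepage the
    member is stored in. Nothing is sent; the caller adds auth and transport."""
    # The member name goes in parentheses after the dataset name.
    resource = quote(f"{dsname}({member})", safe="()")
    url = f"https://{host}/zosmf/restfiles/ds/{resource}"
    headers = {
        # Assumption: fileEncoding option enabled by APAR PH15263; verify
        # the exact syntax in the z/OSMF documentation for your release.
        "X-IBM-Data-Type": f"text;fileEncoding={codepage}",
    }
    return url, headers


# Hypothetical host and dataset names, for illustration only.
url, headers = build_member_read_request(
    "mainframe.example.com", "USER.REXX", "DANISH", "IBM-277"
)
print(url)
print(headers)
```

The design point is that the client names the codepage the data is stored in on the host; the Zowe CLI `--encoding` (`--ec`) option that Claus later uses with `--ec 277` addresses the same need from the command line.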

dkelosky commented 4 years ago

Hi @bearou, sorry I wasn't clear in my message above. If you trace through #445, you'll get to https://github.com/zowe/zowe-cli/issues/632. We have the APAR on our systems and plan to fit this work into the CLI first and then expose it through Zowe Explorer.

FALLAI-Denis commented 4 years ago

Hi, I think that the APAR PH15263 is not the right solution to deal with codepage management in MVS files (datasets). It is not up to the Client (ZOWE CLI, ZOWE Explorer) to decide on the codepage used by the z/OS host system to store a file. It is a property that is specific to the z/OS host. The codepage of the z/OS host system should be declared in the z/OSMF Server configuration and the Client (ZOWE CLI, ZOWE Explorer) should only have to declare its own codepage for the conversions to be correctly carried out by z/OSMF.

The case of USS files is different. Either the USS file declares its codepage, and z/OSMF must use it to carry out the necessary conversions, or the USS file does not declare a specific codepage, and in that case the EBCDIC IBM-1047 codepage must be applied.

The codepage of the z/OS host system must apply to the content of the files, but also to the lists of files and members. In ZOWE Explorer (and ZOWE CLI), the file and member lists should be sorted according to the collating sequence of the z/OS host system codepage, as they would be under ISPF (for example, the IBM-1147 collating sequence in our case), and not according to the client's encoding (Windows-1252, UTF-8 or ISO-8859-x). For example, in EBCDIC the digits 0 to 9 sort after the letters A to Z.
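The sort-order difference can be shown with a short Python sketch: sorting by the EBCDIC-encoded bytes reproduces host order. Python's standard library has no IBM-1147 codec, so `cp037` (US EBCDIC) stands in for it here; uppercase letters and digits occupy the same positions in both. `ebcdic_sort` is a hypothetical helper, not part of Zowe.

```python
def ebcdic_sort(names, codepage="cp037"):
    """Sort names by their EBCDIC byte values, i.e. host (ISPF-style) order.
    cp037 is a stand-in for the site's real host codepage."""
    return sorted(names, key=lambda name: name.encode(codepage))


members = ["MEM1", "MEMA", "ABC", "A2"]
print(sorted(members))       # ['A2', 'ABC', 'MEM1', 'MEMA']  client (ASCII/UTF-8) order
print(ebcdic_sort(members))  # ['ABC', 'A2', 'MEMA', 'MEM1']  EBCDIC order
```

In EBCDIC, letters A to Z are in the 0xC1 to 0xE9 range and digits are 0xF0 to 0xF9, which is why MEMA sorts before MEM1 on the host but after it on the client.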

dkelosky commented 4 years ago

Hi @FALLAI-Denis, thanks for the feedback. Our initial plan is to implement this in a way similar to 3270 emulators; that is, to allow a client-side code page setting.

Would you get any value from this if implemented that way? Do you configure code page in your apps that connect to z/OS today?

Thanks,

FALLAI-Denis commented 4 years ago

Hi,

Reproducing the way 3270 emulators work could provide an answer, but I think it remains an unsatisfactory solution. This mode of operation causes problems and data corruption when the 3270 emulator is not configured identically by all users of the same z/OS host system. For example, in France we use codepage IBM-1147 to handle the Euro character (in the past we used codepage 297, with a sometimes painful transition). If a user configures their 3270 emulator incorrectly, for example with the IBM-1140 (US) codepage, this will alter certain values in the files, and a user with the IBM-1147 codepage will not see the same thing as the user with the IBM-1140 codepage.

For MVS files ("datasets"), the codepage used for disk storage is implicit, set by convention. The mechanism introduced by z/OSMF APAR PH15263 should be kept and used by clients (ZOWE Explorer, ZOWE CLI) to handle exceptions (for example, UTF-8 encoded data stored in an MVS file).

For USS files, the Client cannot know the codepage associated with the file for storage on disk, and cannot decide it, because it is a property carried by the USS file system. By default the USS files are stored in IBM-1047, and by exception and explicitly they can be stored in ASCII, or in UTF-8, or any other encoding.
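The rule described in this comment (and in the earlier one on USS files) can be summarized as a small decision function. This is a minimal sketch of the proposal, not how z/OSMF or Zowe actually behave; `effective_codepage` and its parameters are hypothetical names, and `dataset_override` models the PH15263-style exception a client would flag for an MVS dataset.

```python
from typing import Optional

# Default encoding of untagged USS files, as stated in the comment above.
USS_DEFAULT_CODEPAGE = "IBM-1047"


def effective_codepage(
    is_uss_file: bool,
    host_codepage: str,
    uss_tag: Optional[str] = None,
    dataset_override: Optional[str] = None,
) -> str:
    """Pick the codepage the server should convert from/to (hypothetical helper)."""
    if is_uss_file:
        # The tag set on the USS file wins; untagged files fall back to IBM-1047.
        return uss_tag or USS_DEFAULT_CODEPAGE
    # MVS datasets carry no tag: use the site's implicit codepage unless the
    # client explicitly flags an exception (e.g. UTF-8 data in a dataset).
    return dataset_override or host_codepage


print(effective_codepage(False, "IBM-1147"))                            # IBM-1147
print(effective_codepage(False, "IBM-1147", dataset_override="UTF-8"))  # UTF-8
print(effective_codepage(True, "IBM-1147"))                             # IBM-1047
print(effective_codepage(True, "IBM-1147", uss_tag="ISO8859-1"))        # ISO8859-1
```

Note that in this model the client never chooses the codepage for the normal case; it only flags exceptions, which is the distinction drawn here from the 3270-style client setting.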

With VS Code / ZOWE Explorer / ZOWE CLI / z/OSMF, we are in a "client/server" architecture, not in a "passive terminal" mode. The owner of the data is the z/OS host system; the data server is z/OSMF. z/OSMF must guarantee the validity of the data for the z/OS host system. Data exchanges between clients and the server should always be independent of any client or server configuration, and UTF-8 seems to me a good solution for this because it can represent every character. The z/OSMF server should convert to and from UTF-8 using the implicit encoding of the z/OS system for MVS files ("datasets"), and the explicit encoding, or the default IBM-1047, for USS files.

If the client wishes to use a local encoding other than UTF-8, it can apply an additional local conversion between UTF-8 and the local codepage.

The resulting chain would be: client codepage (e.g. ISO-8859-15) <--> UTF-8 transport <--> server codepage (e.g. IBM-1147). Translation to UTF-8 is always possible. Translation from UTF-8 to a single-byte codepage may not be possible; in that case, a substitution character must be provided, for example "?", or else an error must be raised and the incoming data refused.
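A minimal Python sketch of that chain, showing both options for unmappable characters. Python's standard library has no IBM-1147 codec, so `cp1140` (US EBCDIC with the Euro sign) stands in for the server codepage; the function names are hypothetical and both "sides" run in one process purely for illustration.

```python
def server_to_client(server_bytes: bytes, server_cp: str, client_cp: str) -> bytes:
    """Host codepage -> UTF-8 transport -> client codepage."""
    # Server side (z/OSMF in the proposal): decode host bytes, send UTF-8.
    wire = server_bytes.decode(server_cp).encode("utf-8")
    # Client side: characters missing from the local codepage become "?".
    return wire.decode("utf-8").encode(client_cp, errors="replace")


def client_to_server(client_bytes: bytes, client_cp: str, server_cp: str,
                     strict: bool = False) -> bytes:
    """Client codepage -> UTF-8 transport -> host codepage."""
    wire = client_bytes.decode(client_cp).encode("utf-8")
    # Either substitute "?" for unmappable characters or refuse the data.
    return wire.decode("utf-8").encode(
        server_cp, errors="strict" if strict else "replace")


# The Euro sign stored on the host (0x9F in cp1140) reaches an
# ISO-8859-15 client as 0xA4, that codepage's Euro sign.
print(server_to_client(b"\x9f", "cp1140", "iso8859_15"))

# "Ω" has no place in cp1140: substituted with EBCDIC "?" ...
print(client_to_server("Ω".encode("utf-8"), "utf-8", "cp1140"))
# ... or refused outright.
try:
    client_to_server("Ω".encode("utf-8"), "utf-8", "cp1140", strict=True)
except UnicodeEncodeError as exc:
    print("refused:", exc.reason)
```

With `errors="replace"`, Python encodes the substitute "?" through the target codepage itself, so the host receives its own EBCDIC question mark rather than an ASCII byte.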

For reference, the z/OSMF "MVS Explorer" service encounters the same problem as ZOWE Explorer and ZOWE CLI: it assumes that the data on the z/OS host is encoded in EBCDIC IBM-1047, which is very rarely the case for MVS files (IBM-1047 is the default encoding for USS files). To me this is proof that codepage translation is the responsibility of the server (z/OSMF), not only of the client.

dkelosky commented 4 years ago

Thanks for providing more info.

It seems to me that client / server applies to 3270 connections, FTP connections, and other connections, not just to the new Zowe tooling. If z/OSMF handled code page settings through a server side configuration, users would still have the option to connect via 3270 or FTP and alter data apart from z/OSMF and its code page setting.

Allowing the client to set a code page accepted by z/OSMF is our first step. For development projects, clients don't each need to sync settings manually; they can share them in "project" settings that are kept in git, for example. That way, every developer working on a project inherits the code page configuration for that project.

For a broader z/OS or z/OSMF default code page setting, we could always raise an RFE.

dkelosky commented 4 years ago

@venkatzhub do you have any insight to add here?

venkatzhub commented 4 years ago

@dkelosky - I agree with @FALLAI-Denis that the code page is a property/setting on the host, and that the right solution would be for z/OSMF to allow that property to be set and to make it queryable by clients; as you suggested, an RFE should be opened.

With that said, allowing the client to set a code page is also required, as it serves the purpose of providing an "interim" solution. In addition to that, I have also seen the client side code page setting being used for testing purposes. So there is a legit use case to support that as well.

FALLAI-Denis commented 4 years ago

@dkelosky Hi, while the majority of sources are accessed through Git, some items are still accessed through Zowe Explorer. Example: IBM Z Open Editor for the Cobol language uses Zowe Explorer (or RSE) to access COPYBOOKs, which most often reside in MVS PDS files on the z/OS system and not in the Git repository of the Cobol sources.

dkelosky commented 4 years ago

Hi @FALLAI-Denis - does RSE have a server-side code page configuration?

FALLAI-Denis commented 4 years ago

I don’t know. We don’t use it. I will investigate.

venkatzhub commented 4 years ago

@dkelosky @FALLAI-Denis - from what I remember, the RSE server inherits the host code page, i.e. it detects it.

bearou commented 4 years ago

@venkatzhub and others: I don't quite see why this issue was closed. We still have the problem both with z/OSMF and now also with the RSE connection. There is no difference between using one or the other, and selecting a different encoding in VS Code does not show the correct result. I know this is not an issue for US users, but it is for many others. Are there more settings I could try, or documentation I could provide, to shed more light on this (show-stopping) issue? Best regards, Claus

FALLAI-Denis commented 4 years ago

Hi, for us (France, IBM-1147) the problem is solved by updating several components:

bearou commented 4 years ago

After upgrading the CLI and using --ec 277, it seems to work with Danish characters too. Thanks.

bearou commented 4 years ago

However, the same --ec option does not apply to the RSE connection, which is what IBM recommends. How do we manage the encoding on RSE connections? I'm very interested in how the "RSE server inherits the host code page", if that is the case.