joyfullservice / msaccess-vcs-addin

Synchronize your Access Forms, Macros, Modules, Queries, Reports, and more with a version control system.
Other
203 stars 40 forks source link

Non-ASCII characters get distorted after ex- and import #217

Closed cwuensch closed 3 years ago

cwuensch commented 3 years ago

BrokenChars Hi there, thank you for your great add-in! It is extremely more user-friendly than the original version. :)

But there is still a bug with UTF-8 conversion in your add-in (which is not present in the original version):

When I export all objects from a database, and import them into a new DB, all non-ASCII characters get distorted. Even worse: They are all mapped to the same byte-code, which makes it impossible to restore them afterwards!!

The exported files in queries/.bas forms/.bas already contain the damaged chars.

Files in modules/*.bas seem okay.

The (original) branch in https://github.com/msaccess-vcs-integration/msaccess-vcs-integration does preserve special chars correctly.

In the attached screenshots, you can see a comparison of an exported form, left is original repo with correct chars (ANSI, not UTF-8) - right is the distorted export from this fork. Can you help with this, please?

joyfullservice commented 3 years ago

Hi Christian,

Thank you for reporting this! Can you provide some more details on your environment and the versions that you are using? That will help in better understanding the problem.

Also, would you be able to attach a small database example that demonstrates the problem? That will help me to reproduce the issue on my system.

Thank you!

cwuensch commented 3 years ago

Hi Adam,

thank you for the reply! I will shortly answer your questions:

Of course, I can provide a demo database. But I will need a bit of time for this...

I think, the best idea would be to do no conversion at all. Just output the chars as provided by MSAccess. Also, I would rather prefer to omit the Unicode BOM in the resulting files.

joyfullservice commented 3 years ago

Hi Christian,

I am also using Microsoft Access 2010 (32-bit) so that will make it easier for me to test. I do not have Windows 7, but we can probably still test the issue on my Windows 10 computer.

The sample database does not have to be very complicated. Just create a blank database file, and add a form that includes some extended characters that demonstrate the issue. Export to source, then build from source to see if the characters are different from the original.

Regarding the UTF-8 conversion, we have had some pretty involved discussions on this and the clear consensus of the users was that UTF-8 is the universal standard for storing and representing multi-lingual content, and worked best across different version control systems and development tools. (Pretty much everything supports UTF-8) All source files are now saved in UTF-8 BOM format. See #180, #187, #154, #187 for additional reading on this.

The original database should be reconstructed using the original encoding. If it isn't, then this is something we should probably fix.

cwuensch commented 3 years ago

Well, the problem definitely occurs already during export, since the resulting textfiles do replace all special characters with the same byte code. Attached is a little demo database, which shows the problem on my system. Demo.zip

hecon5 commented 3 years ago

Do you happen to have the code page you use on your machine? https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

cwuensch commented 3 years ago

The current codepage is windows-1252 (which is default for nearly all languages except for Arabic and Asian ones). It is nearly identical with iso-8859-15 resp. iso-8859-1 (latin-1)

joyfullservice commented 3 years ago

Thank you for the sample database. I can reproduce the issue on my system.

image

joyfullservice commented 3 years ago

Thanks to your sample file, I think I have found the issue... The Sanitize routine expects the source file to be exported from Microsoft Access in UTF-8 format. The sample MDB file was using file version 4.0, which is encoded with the system codepage. If I create a new blank database, and import the table and form, they export just fine. (The database version for that file is 14.0)

If I read the export file using the system encoding, it exports and converts the file just fine.

image

This is correctly converted to UTF-8

image

I think the solution here is to use the system encoding when reading the file if the file version is below a certain point. Obviously version 4.0 needs this, but I am not sure technically where the cutoff happens where the export changes to Unicode... I will see if I can find something on this...

joyfullservice commented 3 years ago

I have made an update that I think may resolve this issue when exporting source from older file formats. Can you try the attached version and see if it works correctly in your environment? Version_Control_v3.3.32.zip

cwuensch commented 3 years ago

Thank you very much for addressing this issue! I installed and tested v3.3.32. (The GUI also displayed the new version number) But the bug persists with no difference, here.

Can I do anything to help you with debugging this issue?

PS: Found out, there is not only the "Access 2000" format, but also "Access 2002-2003", "Access 2007" and I think, also "Access 97". Do they have to be treted separately?

joyfullservice commented 3 years ago

@cwuensch - The nice thing about this being a Microsoft Access add-in is that you can actually debug it live. After opening the add-in for the first time in that instance of Microsoft Access, you can see the project loaded in the VBA editor. If you open the modSanitize module, you can add a breakpoint in the SanitizeFile function. This function reads the natively exported form object source file, and where the codepage is determined. If you step through the code (Shift+F8) it should reach the highlighted line. If you then press F8, it should step into the GetSystemEncoding function which is what determines how to read the file contents.

image

I would also be curious to know what the GetSystemEncoding function returns on your system. You can test this by pressing ctl+G to jump to the immediate window, and type ?GetSystemEncoding and hit enter.

image

On my system it returns windows-1252 which reads the file correctly in the ReadFile function.

cwuensch commented 3 years ago

?GetSystemEncoding returns "windows-1252" on my system as well. (As I said, this is the default setting in Windows for nearly all languages)

But... The problem here is: CurrentDb.Version is a String, not a number! The comparison CurrentDb.Version <= 4 returns False. It should be replaced by (Left$(CurrentDb.Version,1) <= 4).

Why? Here are some examples: ?CDbl("4.0") 40 ?CDbl("4,0") 4 ?(CurrentDb.Version <= 4.0) Falsch (=false) ?(CurrentDb.Version <= 4) Falsch ?(CDbl(CurrentDb.Version)<= 4.0) Falsch ?(CDbl(CurrentDb.Version)<= 4) Falsch ?(CurrentDb.Version = "4.0") Wahr (=true) ?(Left$(CurrentDb.Version,1) <= 4) Wahr

Problem is here, that the number <-> String conversins depend on local settings. And in German locale, a decimal point is a comma, instead of a full stop. That's why "4.0" is converted into the number "4" on your system, but into "40" on mine.

joyfullservice commented 3 years ago

Nice catch!

cwuensch commented 3 years ago

Oh no! Sorry, this was a bad idea... My suggestion would now ALWAYS read with system encoding, because "12.0" would be converted to 1...

This way it should work:

Public Function MainVer(ver As String) As Integer
  pos = InStr(ver, ".")
  If pos > 0 Then
    MainVer = CInt(Left$(ver, pos - 1))
  Else
    MainVer = CInt(ver)
  End If
End Function
joyfullservice commented 3 years ago

@cwuensch - Give this version a try... Version_Control_v3.3.33.zip

cwuensch commented 3 years ago

Thank you very much! I can confirm, that v3.3.33 now exports all elements into working UTF-8. Is it necessary to consider the different handling of Unicode in Access DB version 4.0 vs. 12.0 regarding the import as well? (I think you changed this only for export. But maybe this is sufficient?)

May I ask a further question? After you told me, how to debug the Add-In, I am quite curious: Can I also make changes to the code? When I do what you told me, the changes will not be saved after closing MS Access. When I open the add-in-file, it just starts an automatic installation process and closes itself afterwards.

hecon5 commented 3 years ago

@cwuensch, when Access loads an Add-in (such as this one), if you edit the addin within the "opening" Access file, changes will not be saved. This is a double edged sword: it allows easy debugging, and to try things out that might otherwise ruin files. Downside is that once you close the session, it will discard any settings.

This is a really easy way to load "extras" for users and ensure they don't break things for everyone else.

Anyway, to actually answer your question, I suggest the following;

  1. Fork this MS Access Repository into your GitHub account repos. image

  2. Clone your fork to a local repository alongside your other Access dev repos on your machine.

  3. I put mine alongside some other Access repositories I utilize. image

  4. Switch to "Dev" Branch: I actually suggest renaming YOUR "dev" branch to something local to you, especially if you still use some of the legacyVCS tools. I named my local fork of this branch to "dev-addin".

  5. Connect a second remote to the joyfullservice/msaccess-vcs-integration repository. This way you can track progress on the upstream ( joyfullservice/msaccess-vcs-integration) repository. There may be changes you don't want to pull into yours, or you may want customization not present on the upstream (in my environment, we have some particular changes required to ensure proper integration in our security environment and configurations that shouldn't be the default for everyone). image

  6. Go into your new local repo, and launch the Addin directly.

  7. Make changes, use the add-in to export the add-in code, and commit/push/pull request just like any other repository. image

These should probably be made into a wiki page, but they're here for now.

hecon5 commented 3 years ago

Updated the wiki; it's here: Editing and Contributing

cwuensch commented 3 years ago

Thank you @hecon5 for your detailled instructions. But in fact, my question was slightly different: How can I - just on my computer - edit (and save!) the MS Access Add-In file? If I open the Add-In file, it just runs the installation and closes itself. How can I make changes to the file and save them?

hecon5 commented 3 years ago

Aha! image

image

joyfullservice commented 3 years ago

@cwuensch - To make changes to the add-in, you simply close the install form by pressing escape or clicking the X in the top right corner as @hecon5 shows in the screenshot. Just remember that the add-in that you are editing is not the same as the add-in that runs when you click the menu item. After making changes to the add-in, you will want to install it by running the Autoexec macro, or by reopening the database. After reinstalling, the new changes will be available in the installed add-in, where you can test it with other databases.

What I will often do is test small changes in the loaded add-in, just like I described in the debugging process. When I am satisfied that it is working the way I want it to, then I copy those changes over to the working copy of the add-in. Also note that if the application-level add-in is loaded, you will not be able to install the update. The application-level add-in (the installed version) opens when it is first called from the menu item, and stays loaded until that instance of Microsoft Access is closed. The add-in will warn you that it cannot install if the installed add-in is already loaded. This probably sounds confusing, but it makes a lot of sense once you understand what is going on. 😄

As to your question on whether the files need to be converted back to the system encoding before import, this would depend on which file version was created when the database is built from source. Right now, I think the add-in just uses the current default database version, so this would work fine if the default is set to 2007 or newer. If the new database was created in a legacy format, you might have problems importing the UTF-8 encoded characters.

Here is what the option looks like in Microsoft Access 2010:

image

In the Build function, this is where you would specify the database format when creating a new database:

image

If we wanted the rebuilt database version to match the original, we should probably have the code specify the version to match the exported database. This hasn't come up as a need yet, but if someone would find this helpful or important, we could create a new issue to add this functionality.

cwuensch commented 3 years ago

Thank you for all your explanations!

1.) I think, creating a blank database with the user's "Default file format" seems reasonable. Especially, converting an existing database to a new version might be a reasonable use-case for the add-in. So I would not change that.

2.) But, IF the import has to be different, depending on the version of the newly created database, maybe the user's configuration should be considered? (In my case, for example, the default file format is configured as "Access 2000", because there are customers with legacy software. But in a first quick test, the imported seemed to work nevertheless... that feels strange to me)

hecon5 commented 3 years ago

@cwuensch, is this working? If so, we should probably update the released version to include this fix.

joyfullservice commented 3 years ago

I have just pushed some additional changes to the dev branch relating to the file format of the rebuilt database. The FileFormat of the old database is now stored in project.json and the new database is created using this same version. In other words, if you use Access 2010 to export source from a database in the Access 2000 file format, you can now build that database from source and it will be built in the legacy Access 2000 file format, just like the original database.

Although not directly related to the original issue, this was a side issue I was able to work through for the benefit of those that are working with files in older formats. Attached is a copy of the latest dev version in case you are not familiar with building it from source. Version_Control_v3.3.36.zip

cwuensch commented 3 years ago

Hi Adam, to my mind, it still makes more sense, if a newly created database would use the default format configured in MS Access. This way, you could convert an "old" .mdb-file into new .accdb format. Also, you could stick with legacy formats, when you configure Access respectively. But this is your choice, of course.

But could you please do me the favor and compile the previous version v3.3.35 for me? (I did not manage to build this Access add-in file unfortunately...)

hecon5 commented 3 years ago

@cwuensch, I took the liberty of making a new issue for that, as it's an interesting idea, and I'm not sure it would be out of scope, but I think it could use some discussion before we do anything with it.

cwuensch commented 3 years ago

Thank you @hecon5! But I think this would be a very low priority. For me, personally, it would be perfectly fine, if someone could just provide a compiled 3.3.35 add-in version?

joyfullservice commented 3 years ago

@cwuensch - Did the version 3.3.36 have problems building your database? If so, I would love to be able to resolve the issue for the benefit of everyone.

If you need to convert a database from one version to another, you can use the Save & Publish menu item, and select which version to convert the database to.

image

cwuensch commented 3 years ago

I don't think so. (Could not yet try 3.3.36, but will do!) But I would just personally prefer the old logic: That a database will be created in Access' default format.