woodruffr1973 / ardb

Automatically exported from code.google.com/p/ardb

Saved decks with card text containing "ö" characters can not be reopened #33

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Save a deck with cards like "Zöe" (crypt) or "Rötschreck" (library).
2. Reopen the deck with "Open deck (xml)"

What is the expected output? What do you see instead?
Expected = File opened
Instead = Error message: "Ardb Error/An error occured while opening..."

What version of the product are you using? On what operating system?
2007-11-01 version (the one in "downloads" here), Windows XP SP2

Please provide any additional information below.
If "ö" characters are manually replaced in the XML file by "o" characters,
the saved file can be opened and "ö" cards are correctly displayed.
So it looks like a charset problem when reading the XML file, as writing
is correct (i.e. with "ö" characters).
The saved file's header has the attribute 'encoding="UTF-8"'. I guess
ISO-8859-1 would be more appropriate (as it is in the XSL files used by
the program).

I have tried to change it. With ISO-8859-1, the error message changes:
"Ardb Error/Couldn't find card Z<#incorrect characters#>e"
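Both error messages are consistent with a mismatch between the bytes on disk and the encoding the reader assumes. The Python sketch below only illustrates the two general failure modes for the name "Zöe". Which one actually applies inside Ardb is not established in this thread, so treat the mapping to the two messages as an assumption.

```python
name = "Zöe"

latin1_bytes = name.encode("iso-8859-1")  # b'Z\xf6e': "ö" is the single byte 0xF6
utf8_bytes = name.encode("utf-8")         # b'Z\xc3\xb6e': "ö" is the bytes 0xC3 0xB6

# Failure mode 1: ISO-8859-1 bytes read as UTF-8.
# 0xF6 is not a valid UTF-8 start byte, so a strict reader rejects the input.
try:
    latin1_bytes.decode("utf-8")
    utf8_read_ok = True
except UnicodeDecodeError:
    utf8_read_ok = False

# Failure mode 2: UTF-8 bytes read as ISO-8859-1.
# Decoding succeeds, but "ö" turns into two wrong characters,
# so a lookup by card name would no longer match.
garbled = utf8_bytes.decode("iso-8859-1")

print(utf8_read_ok)
print(garbled)
```

The first mode would produce a hard "cannot open" style failure, the second a file that opens but contains a card name that cannot be found.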

Original issue reported on code.google.com by jcbonne...@free.fr on 25 Nov 2007 at 10:09

GoogleCodeExporter commented 9 years ago
This is a duplicate issue. See issue 30.

Original comment by graham.r...@gmail.com on 27 Nov 2007 at 8:50

GoogleCodeExporter commented 9 years ago
Sorry to hassle you, but it is not a duplicate.
As I read it, issue 30 is about old deck files saved with an older version.
Issue 33 (mine) is about a deck file created and then opened with the same
version (i.e. the latest version, downloadable here).
The two named cards ("Zöe" and "Rötschreck") are spelled exactly like this
(i.e. with a "¨") in the Crypt and Library Browsers of THIS version (of 1 Nov,
called Ardb2.0).

Or am I wrong because "old deck lists" in issue 30 stands for the whole card
DB of ARDB?
Can you confirm?

Original comment by jcbonne...@free.fr on 28 Nov 2007 at 12:55

GoogleCodeExporter commented 9 years ago
Oops.  It's actually a duplicate of issue 25.
We have solved this by replacing non-English characters in card texts with the
closest English version, i.e. Zöe becomes Zoe.  The new open-deck code then
correctly does the same replacement for old deck lists.
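The replacement described above can be sketched in Python as follows. This is not the Ardb code (which is not shown in this thread). It is a minimal illustration that assumes "closest English version" means dropping diacritics, and `to_closest_ascii` is a hypothetical name.

```python
import unicodedata


def to_closest_ascii(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters with no ASCII decomposition (for example "ß") become "?".
    return stripped.encode("ascii", errors="replace").decode("ascii")


print(to_closest_ascii("Zöe"))
print(to_closest_ascii("Rötschreck"))
```

Applying the same function to both the card database and the names read from a deck file keeps lookups consistent, at the cost of losing the original spelling.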

I'm not 100% happy with this fix, but it's better than not having a new
version of Ardb.
I've spent a good couple of weeks trying to solve the handling of non-English
characters, but it seems very difficult to solve on Windows.  I'm not sure how
the original Ardb got this right when it was cross-compiled from Linux!

Also, issue 30 really does stand for the cards in the DB as well as the cards
in the deck lists.  I have just not updated it.  (Been very busy recently.)

Any ideas on how to fix this correctly are appreciated.

Original comment by graham.r...@gmail.com on 28 Nov 2007 at 8:50

GoogleCodeExporter commented 9 years ago
OK, issue 25 indeed.
Concerning the problem, the encoding of the saved files has changed.

On my old version of ARDB, I can only read a deck file saved by the latest
version if I first convert it from ASCII to UTF-8 (conversion done with
UltraEdit).

So I suspect that files saved by the new version are not actually encoded in
UTF-8.
I have found this tool that can help to test (and convert) text file encoding:
http://www.chilkatsoft.com/ChilkatCharset.asp
The function "VerifyFile(charset As String, filename As String) As Long" might
be useful:
http://www.chilkatsoft.com/refdoc/xChilkatCharsetRef.html#method016
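The same kind of check can be done without an external tool. The Python sketch below is not the Chilkat API. `is_valid_encoding` and `verify_file` are hypothetical helpers that simply report whether a file's bytes decode cleanly under a given charset.

```python
def is_valid_encoding(data: bytes, charset: str) -> bool:
    """Return True if every byte sequence in data is valid in charset."""
    try:
        data.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


def verify_file(path: str, charset: str) -> bool:
    """Read the file as raw bytes and check it against charset."""
    with open(path, "rb") as handle:
        return is_valid_encoding(handle.read(), charset)


# "ö" stored as one ISO-8859-1 byte is not valid UTF-8.
print(is_valid_encoding(b"Z\xf6e", "utf-8"))
# "ö" stored as two UTF-8 bytes is.
print(is_valid_encoding(b"Z\xc3\xb6e", "utf-8"))
```

Note that the check is only informative for charsets such as UTF-8 or ASCII. Every byte sequence is valid ISO-8859-1, so that check always passes.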

Original comment by jcbonne...@free.fr on 1 Dec 2007 at 7:46

GoogleCodeExporter commented 9 years ago
Another tool to help testing file encoding: Replace Pioneer
http://www.mind-pioneer.com/services/index.html

Original comment by jcbonne...@free.fr on 1 Dec 2007 at 8:00

GoogleCodeExporter commented 9 years ago
My new code should bypass any encoding issues: it reads the file in, converts
all characters to ASCII (< 127), then feeds the result into the XML parser.
Can you attach some of your files that fail so I can test against them?
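The read, convert, then parse pipeline described above might look like the following Python sketch. It is an illustration under stated assumptions, not the Ardb implementation: the fallback from UTF-8 to ISO-8859-1 is a guess at what the file may contain, and the `deck`/`card`/`name` XML names are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET


def load_deck_as_ascii(raw: bytes) -> ET.Element:
    """Decode raw bytes, reduce the text to ASCII, then parse it as XML."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
        text = raw.decode("iso-8859-1")
    # Split accented letters into base letter + combining mark,
    # then drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_bytes = decomposed.encode("ascii", errors="ignore")
    # Pure ASCII bytes are valid under both a UTF-8 and an ISO-8859-1 declaration.
    return ET.fromstring(ascii_bytes)


# Hypothetical deck file: ISO-8859-1 bytes mislabelled as UTF-8.
sample = '<?xml version="1.0" encoding="UTF-8"?><deck><card name="Zöe"/></deck>'
root = load_deck_as_ascii(sample.encode("iso-8859-1"))
print(root.find("card").get("name"))
```

Because the parser only ever sees ASCII, the encoding declared in the header stops mattering, which is why this approach sidesteps the problem rather than solving it.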

Original comment by graham.r...@gmail.com on 1 Dec 2007 at 8:08

GoogleCodeExporter commented 9 years ago
I think a better solution is to encode files in UTF-8.
You should be able to find a library that encodes the file stream as you wish.
ASCII alone sounds a bit out of date, and the lack of UTF support will not
help to improve the program with DB updates.
Moreover, what about the code that was working in previous versions? Do you
not have it?

Here are the test files. Both contain the Zöe crypt card and the Rötschreck
library card.
A=Test_ARDBoldVersion.xml saved with old version of ardb-20060912.
B=Test_ARDBnewVersion.xml saved with new version of ardb-20071101.

Results:
ardb-20060912 opens A: OK
ardb-20060912 opens B: error, cannot open file
ardb-20071101 opens A: error, file opens but cards cannot be found
ardb-20071101 opens B: error, cannot open file

Original comment by jcbonne...@free.fr on 1 Dec 2007 at 10:16

Attachments:

GoogleCodeExporter commented 9 years ago
Please find here a document from BigBlue that explains UTF-8 programming:
http://ploug.eu.org/doc/l-linuni.pdf

Original comment by jcbonne...@free.fr on 1 Dec 2007 at 10:26

GoogleCodeExporter commented 9 years ago
I understand how UTF-8 works.  The problem I have is trying to get all the
open source libs ardb uses to build on Windows so they work correctly together.
The old Ardb was cross-compiled from Linux to Windows.  If someone can work out
how to cross-compile the source or get it to build on Windows correctly, I will
add the support for the 100% correct card text.  But for now we are left with
this hack.

We will try and fix this in V3.

Original comment by graham.r...@gmail.com on 3 Dec 2007 at 10:03

GoogleCodeExporter commented 9 years ago
I am not an expert in C programming, but have you tried compiling with
Cygwin's gcc?

Original comment by jcbonne...@free.fr on 3 Dec 2007 at 1:28

GoogleCodeExporter commented 9 years ago
This is still an issue in RC1.
Does anyone have any ideas on how to solve this?

Original comment by graham.r...@gmail.com on 14 Dec 2007 at 8:54

GoogleCodeExporter commented 9 years ago
I have a solution; see: http://www1.tip.nl/~t876506/utf8tbl.html#algo
Just need to write the code now.
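The linked page is not reproduced in this thread. Assuming it describes the standard UTF-8 bit layout, a hand-written encoder might look like the Python sketch below. `encode_utf8` and `latin1_to_utf8` are hypothetical names, and the actual fix that shipped in Ardb is not shown here.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point using the standard UTF-8 bit layout."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point!r}")
    if code_point < 0x80:          # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8; each byte value is its code point."""
    return b"".join(encode_utf8(byte) for byte in data)


print(latin1_to_utf8(b"Z\xf6e"))
```

The conversion works byte by byte because the 256 ISO-8859-1 values coincide with the first 256 Unicode code points, so no lookup table is needed for that direction.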

Original comment by graham.r...@gmail.com on 14 Dec 2007 at 10:49

GoogleCodeExporter commented 9 years ago
Fixed in V2 RC 3

Original comment by graham.r...@gmail.com on 5 Jan 2008 at 4:13