Closed valera-rozuvan closed 8 years ago
I agree that the most important thing is that it is all consistent, so either Option 1 or Option 2. My preference is Option 1, but Option 2 is OK too.
Yes, ideally I would also like Option 1! I will try to use UTF-8 and \n from now on. However, since I am on Windows, I cannot be sure that my local Eclipse or Xemacs editor will not add a \r by mistake. The XML files must also be ported to UTF-8, but that will not be so easy, because the current XML parser supports only ISO-8859-1 and not UTF-8.
So I think it will take some time to have everything in UTF-8. But it's definitely the right direction!
Any decent editor like Eclipse or Xemacs will have a setting to choose the line-ending mode. But yes, you raise a good point: if the XML parser can't handle UTF-8, then it makes things a little harder!
@juliena82 Can you please briefly outline what needs to be done to make the current XML parser support UTF-8?
From an e-mail by: @juliena82 :
- (1) Add UTF-8 support to the XML parser (or use a real XML parser, and not the “hack” I have used for 8 years) so as to support Cyrillic characters
- (2) Convert the model.xml, lang.xml and schema.xml files from ISO-8859-1 to UTF-8 (can be done automatically)
- (3) Update these files to add Russian translations.
If I understand correctly, we need to replace the following files:
with some XML parser library that supports UTF-8?
@juliena82 I suggest we use pugixml! What do you think?
@valera-rozuvan Why not, it seems quite lightweight and (apparently) has no dependencies, which is good. Give me some time to study the question and the impact on the code.
@juliena82 I think the best way to handle this would be to touch only libcutil. What I mean is to keep the same interface (so that we don't also have to modify ocvdemo, for example), but change the XML parsing mechanism to use the pugixml library.
So, to reiterate, right now we have for example in file ocvdemo/src/ocvdemo.cc on line 716:
fs_racine = new utils::model::FileSchema("./data/schema.xml");
I am talking about updating libcutil in such a way that line 716 doesn't have to be changed. We keep the interface the same and only replace the XML parser.
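As a rough illustration of "same interface, different parser", here is a minimal sketch. It is not the actual libcutil code: the `demo` namespace, the `XmlBackend` interface and the `PlaceholderBackend` class are all hypothetical names invented for this example, and the backend does no real parsing. A real version would presumably wrap the chosen XML library behind the same hidden interface.

```cpp
#include <memory>
#include <string>
#include <utility>

namespace demo {  // hypothetical namespace, NOT the real utils::model

// Hypothetical parser backend. Callers of FileSchema never see this type,
// so it can be swapped without touching code such as ocvdemo.cc.
class XmlBackend {
public:
    virtual ~XmlBackend() = default;
    virtual std::string name() const = 0;
};

// Stand-in for a backend built on a new XML library (assumption: the real
// one would delegate to that library; this one only reports a name).
class PlaceholderBackend : public XmlBackend {
public:
    std::string name() const override { return "placeholder"; }
};

// The public class keeps its constructor signature; only the private
// backend member changes when the parser is replaced.
class FileSchema {
public:
    explicit FileSchema(std::string path)
        : path_(std::move(path)), backend_(new PlaceholderBackend()) {}

    const std::string& path() const { return path_; }
    std::string backend_name() const { return backend_->name(); }

private:
    std::string path_;
    std::unique_ptr<XmlBackend> backend_;
};

}  // namespace demo
```

With this shape, a call such as `new demo::FileSchema("./data/schema.xml")` compiles unchanged no matter which backend is constructed internally.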
@juliena82 Do you understand what I am trying to explain?
@valera-rozuvan I am not sure I understand. Yes, of course, the change is localized in libcutil.
I had already started the pugixml integration. It does not work yet however.
@valera-rozuvan By any chance, do you know a good tool to convert an ISO-8859-1 XML file to a UTF-8 file? I don't want to code it...
@juliena82 Yes. On Linux you can do:
$ iconv -f iso-8859-1 -t utf8 [filename] > [newfilename]
We can use this command to write a small script that converts all files in a directory recursively. See also Determine and change file character encoding | mindspill.net.
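For reference, the byte-level transformation behind this particular conversion is small. The sketch below is a hypothetical helper (`latin1_to_utf8` is a name made up here, not part of iconv or libcutil) and only covers the ISO-8859-1 to UTF-8 direction. It relies on the fact that every ISO-8859-1 byte value is also its Unicode code point. Walking a directory and reading or writing the files is left out.

```cpp
#include <string>

// Convert a buffer of ISO-8859-1 (Latin-1) bytes to UTF-8.
// Bytes below 0x80 are plain ASCII and are copied unchanged.
// Bytes 0x80..0xFF map to code points U+0080..U+00FF, which UTF-8
// encodes as two bytes: 110xxxxx 10xxxxxx.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte doubles
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // top 2 bits
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // low 6 bits
        }
    }
    return out;
}
```

For example, the Latin-1 byte 0xE9 ("é") becomes the two bytes 0xC3 0xA9. Running the function on text that is already UTF-8 would double-encode it, so each file should be converted exactly once.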
You can leave this task to me, if you don't have a Linux machine near you... But do create an issue about this conversion, when you are done with inclusion of the pugixml library, and assign the issue to me.
Thanks! It seems to work very well (I don't have Linux, but I have MinGW/MSYS, which provides this command as well). Now I have other problems...
@valera-rozuvan Ok, the UTF-8 support almost works. There remain only small things to fix. I will finish tomorrow. Soon, you will have a lot of Russian translation to do ;)
Ok, now the data files are in UTF-8. See the screenshot below with Cyrillic characters (but it's probably nonsense, since I used Google's automatic French -> Russian translation).
There is still a problem with the title of the result window (not with the title of the main window). The special thing about it is that the result window is an OpenCV window, not a GTK window. Maybe OpenCV windows do not support Cyrillic characters? Or, more probably, there is still a bug...
@valera-rozuvan Now it's up to you to do the Russian translations. The files involved are data/schema.xml, data/lang.xml and data/model.xml.
@valera-rozuvan Of course, the translation is a lot of work, and not a priority. But, if you have the time, it would be great to have the demonstrator in the original language of the OpenCV authors!
@juliena82 We can create a normal Gtk window, and draw images to that. Why did you decide to use the OpenCV windows for image output?
Nice work @valera-rozuvan !
No, it's nice work @juliena82 !!! @shervinemami = )
We need to decide at an early stage of this project what to do with character encoding (of source files), and the line endings. I see 3 options:
Option 1
Convert all files to utf-8, and Unix style line endings. Unix uses just line feed ("\n").
Option 2
Convert all files to utf-8, and Windows style line endings. DOS/Windows uses carriage return and line feed ("\r\n") as a line ending.
Option 3
We don't care about this, and have files with different combination of encoding and line endings. If we take this option, we need to be careful about commits. It is very important to not save a file in a different encoding/line ending style, and commit it back to GitHub. If we are not careful, the commit will introduce changes to every line of the file!
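To make the line-ending part concrete, here is a minimal sketch of detecting DOS/Windows endings and normalizing them to Unix endings, as Option 1 would require. The function names `has_crlf` and `crlf_to_lf` are hypothetical helpers invented for this example. They work on a buffer already read into memory, which is assumed to have been read in binary mode so that the platform does not translate line endings on the way in.

```cpp
#include <cstddef>
#include <string>

// True if the text contains at least one "\r\n" pair.
bool has_crlf(const std::string& text) {
    return text.find("\r\n") != std::string::npos;
}

// Return a copy of `text` with every "\r\n" replaced by "\n".
// A lone '\r' that is not followed by '\n' is left untouched.
std::string crlf_to_lf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;  // drop the '\r'; the '\n' is copied on the next pass
        }
        out.push_back(text[i]);
    }
    return out;
}
```

A check like `has_crlf` could be run before committing, so that a file is never saved back with a different line-ending style by accident.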
I have already bumped into this problem when editing the Makefiles. I am on a Linux system, and some of the Makefiles were not in UTF-8 and had DOS line endings...
@shervinemami , @juliena82 Please provide your insight on this issue!
When we decide on the direction to take, we must clearly document it, so that other contributors adhere to our guidelines. Sort of a coding style = )