tsdconseil / opencv-demonstrator

OpenCV demonstrator (GUI)
http://www.tsdconseil.fr/log/opencv/demo/index-en.html
GNU Lesser General Public License v3.0

Source code file character encoding and line endings #5

Closed valera-rozuvan closed 8 years ago

valera-rozuvan commented 8 years ago

We need to decide at an early stage of this project what to do about the character encoding and line endings of the source files. I see 3 options:

Option 1

Convert all files to UTF-8 with Unix-style line endings. Unix uses just a line feed ("\n").

Option 2

Convert all files to UTF-8 with Windows-style line endings. DOS/Windows uses a carriage return followed by a line feed ("\r\n") as a line ending.

Option 3

We don't care about this, and keep files with different combinations of encoding and line endings. If we take this option, we need to be careful about commits. It is very important not to save a file with a different encoding or line-ending style and commit it back to GitHub. If we are not careful, the commit will introduce changes to every line of the file!


I have already bumped into this problem when editing the Makefiles. I am on a Linux system, and some of the Makefiles were not in UTF-8 and had DOS line endings...
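To make the difference between the two line-ending styles concrete, here is a minimal C++ sketch that counts each style in a buffer and rewrites "\r\n" as "\n". The names `LineEndingCounts`, `count_line_endings` and `to_unix_line_endings` are hypothetical helpers made up for illustration; they are not part of this project.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the project): how many line endings
// of each style a buffer contains.
struct LineEndingCounts {
    std::size_t lf = 0;    // bare "\n" (Unix)
    std::size_t crlf = 0;  // "\r\n" (DOS/Windows)
};

LineEndingCounts count_line_endings(const std::string& text) {
    LineEndingCounts counts;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '\n') continue;
        if (i > 0 && text[i - 1] == '\r') {
            ++counts.crlf;
        } else {
            ++counts.lf;
        }
    }
    return counts;
}

// Rewrites every "\r\n" as "\n". A lone '\r' is left untouched.
std::string to_unix_line_endings(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool cr_before_lf =
            text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n';
        if (cr_before_lf) continue;  // drop the '\r', keep the '\n'
        out.push_back(text[i]);
    }
    return out;
}
```

A file where both counters are non-zero has mixed line endings, which is the case most likely to produce a commit that touches every line.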

@shervinemami , @juliena82 Please provide your insight on this issue!

When we decide on the direction to take, we must clearly document it, so that other contributors adhere to our guidelines. Sort of a coding style = )

shervinemami commented 8 years ago

I agree that the most important thing is that it is all consistent, so either Option 1 or Option 2. My preference is Option 1, but Option 2 is OK too.

juliena82 commented 8 years ago

Yes, ideally I would also like option 1! I will try to use UTF-8 and \n from now on. However, since I am on Windows, I am not sure that my local Eclipse or Xemacs editor will not add a \r by mistake. Also, the XML files must be ported to UTF-8 as well, but that will not be so easy, as the current XML parser supports only ISO-8859-1 and not UTF-8.

So I think it will take some time to have everything in UTF-8. But it's definitely the right direction!

shervinemami commented 8 years ago

Any decent editor like Eclipse or Xemacs will have a setting to choose the line-ending mode. But yes, you raise a good point: if the XML parser can't handle UTF-8, that makes things a little harder!

valera-rozuvan commented 8 years ago

@juliena82 Can you please briefly outline what needs to be done to make the current XML parser support UTF-8?

valera-rozuvan commented 8 years ago

From an e-mail by: @juliena82 :

- (1) Add UTF-8 support in the XML parser (or use a real XML parser, and not the “hack” I have been using for 8 years) so as to support Cyrillic characters
- (2) Convert the model.xml, lang.xml and schema.xml files from ISO-8859-1 to UTF-8 (can be done automatically)
- (3) Update these files to add Russian translations.

valera-rozuvan commented 8 years ago

If I understand correctly, we need to replace the following files:

with some XML parser library that supports UTF-8?

valera-rozuvan commented 8 years ago

@juliena82 I suggest we use pugixml! What do you think?

juliena82 commented 8 years ago

@valera-rozuvan Why not, it seems quite lightweight and (apparently) has no dependencies, which is good. Give me some time to study the question and the impact on the code.

valera-rozuvan commented 8 years ago

@juliena82 I think the best way to handle this would be to touch only libcutil. What I mean by this is to keep the same interface (so that we don't also have to modify ocvdemo, for example), but change the XML parsing mechanism to use the pugixml library.

So, to reiterate, right now we have for example in file ocvdemo/src/ocvdemo.cc on line 716:

fs_racine = new utils::model::FileSchema("./data/schema.xml");

I am talking about updating libcutil in such a way that line 716 doesn't have to be changed. We keep the interface the same and replace only the XML parser.
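As a rough illustration of "same interface, different parser", here is a minimal C++ sketch using the pimpl idiom. Everything in it is hypothetical: `sketch::Schema` and `root_name()` are made-up names, not the real `utils::model::FileSchema` API, and the toy parsing code only stands in for whatever backend (the current parser or pugixml) sits behind the interface.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

namespace sketch {

// Public interface: the only thing calling code depends on.
// (Hypothetical class; the real libcutil interface is not shown here.)
class Schema {
public:
    explicit Schema(const std::string& xml_text);
    ~Schema();
    std::string root_name() const;

private:
    struct Impl;                  // parser details are hidden behind this
    std::unique_ptr<Impl> impl_;
};

// Implementation side: only this part changes when the parser is swapped.
struct Schema::Impl {
    std::string root;
};

Schema::Schema(const std::string& xml_text) : impl_(new Impl) {
    // Toy stand-in for a parser: take the name of the first element,
    // skipping "<?xml ...?>" declarations and "<!-- ... -->" comments.
    std::size_t pos = 0;
    while ((pos = xml_text.find('<', pos)) != std::string::npos) {
        ++pos;  // step past '<'
        if (pos < xml_text.size() &&
            std::isalpha(static_cast<unsigned char>(xml_text[pos]))) {
            const std::size_t end = xml_text.find_first_of(" \t\r\n/>", pos);
            impl_->root = xml_text.substr(
                pos, end == std::string::npos ? end : end - pos);
            break;
        }
    }
}

Schema::~Schema() = default;

std::string Schema::root_name() const { return impl_->root; }

}  // namespace sketch
```

Code that only constructs a `sketch::Schema` and calls `root_name()` does not need to change when the body of the constructor is reimplemented on top of another library, which is the property wanted for line 716.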

@juliena82 Do you understand what I am trying to explain?

juliena82 commented 8 years ago

@valera-rozuvan I am not sure I understand. Yes, of course, the change is localized in libcutil.

juliena82 commented 8 years ago

I have already started the pugixml integration. It does not work yet, however.

juliena82 commented 8 years ago

@valera-rozuvan By any chance, do you know a good tool to convert an ISO-8859-1 XML file to a UTF-8 file? I don't want to code it...

valera-rozuvan commented 8 years ago

@juliena82 Yes. On Linux you can do:

$ iconv -f iso-8859-1 -t utf8 [filename] > [newfilename]

We can use this command to write a small script that converts all files in a directory recursively. See also Determine and change file character encoding | mindspill.net.
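For reference, the byte-level mapping behind this particular conversion is simple, because every ISO-8859-1 byte value is also the Unicode code point of the same character. Below is a minimal C++ sketch of it; `latin1_to_utf8` is a hypothetical helper written for illustration, not something taken from iconv or from this project.

```cpp
#include <string>

// Hypothetical helper: transcodes a buffer of ISO-8859-1 bytes to UTF-8.
// Bytes below 0x80 are ASCII and are copied as-is; bytes 0x80..0xFF map to
// code points U+0080..U+00FF, which UTF-8 encodes as two bytes.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8.push_back(static_cast<char>(byte));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));    // lead byte
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));  // continuation
        }
    }
    return utf8;
}
```

For example, the ISO-8859-1 byte 0xE9 ("é") becomes the two bytes 0xC3 0xA9. Note that if an XML file carries an encoding declaration such as encoding="ISO-8859-1", that declaration also has to be updated after the conversion; neither iconv nor this sketch does that.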

You can leave this task to me, if you don't have a Linux machine near you... But do create an issue about this conversion, when you are done with inclusion of the pugixml library, and assign the issue to me.

juliena82 commented 8 years ago

Thanks! It seems to work very well (I don't have Linux, but I have MinGW/MSYS, which provides this command as well). Now I have other problems...

juliena82 commented 8 years ago

@valera-rozuvan Ok, the UTF-8 support almost works. Only small things remain to be fixed. I will finish tomorrow. Soon, you will have a lot of Russian translation to do ;)

juliena82 commented 8 years ago

Ok, now the data files are in UTF-8. See below a screenshot with Cyrillic characters (but it's probably nonsense, since I used Google automatic French -> Russian translations).

There is still a problem with the title of the result window (not with the title of the main window). What is special about it is that the result window is an OpenCV window, not a GTK window. Maybe OpenCV windows do not support Cyrillic characters? Or, more probably, there is still a bug...

[Screenshot: capture]

@valera-rozuvan Now it's up to you to do the Russian translations. The files involved are data/schema.xml, data/lang.xml and data/model.xml.

juliena82 commented 8 years ago

@valera-rozuvan Of course, the translation is a lot of work, and not a priority. But, if you have the time, it would be great to have the demonstrator in the original language of the OpenCV authors!

valera-rozuvan commented 8 years ago

@juliena82 We can create a normal Gtk window, and draw images to that. Why did you decide to use the OpenCV windows for image output?

shervinemami commented 8 years ago

Nice work @valera-rozuvan !

valera-rozuvan commented 8 years ago

No, it's nice work @juliena82 !!! @shervinemami = )