wsharba / opendatakit

Automatically exported from code.google.com/p/opendatakit

Unable to submit non-latin utf8 text to ODK Aggregate #357

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Enter non-Latin text into a form field in ODK Collect; I'm submitting 
Ethiopian (Amharic) text.
2. When I try to view what has been submitted in ODK Aggregate, I only see 
question marks.

What is the expected output? What do you see instead?
I should be able to view the text that was submitted.

What version of the product are you using? On what operating system?
I'm using ODK Collect (1.1.7) and ODK Aggregate RC1, running self-hosted on 
Ubuntu 10.04 with a MySQL database.

Please provide any additional information below.
This was previously working in ODK Aggregate 1 alpha v3, though I'm not sure about 
beta. I'm also sure it's not an issue with my browser being unable to display 
the font, as I can view Amharic fonts on other websites. 
If I download a dump of the database to view in the MySQL browser tool, I 
also see question marks there, so it seems the problem may be with the text being 
saved to the database rather than with how it is displayed on the ODK Aggregate 
page after it's selected from the db. However, I can't tell what is being sent 
from ODK Collect.
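Question marks (rather than garbled accented letters) are what a lossy conversion into a single-byte character set produces. As a minimal sketch, assuming the text passes through a latin1 connection or column somewhere between Aggregate and MySQL (an assumption at this point in the thread), Python's codecs with `errors="replace"` reproduce the same symptom; `lossy_roundtrip` is a hypothetical helper name:

```python
def lossy_roundtrip(text: str, charset: str = "latin-1") -> str:
    """Encode text into a narrower charset and decode it back.

    Characters the target charset cannot represent are replaced with '?',
    which matches the symptom seen in Aggregate and in the MySQL dump.
    """
    return text.encode(charset, errors="replace").decode(charset)


print(lossy_roundtrip("ሰላም"))     # ???
print(lossy_roundtrip("Привет"))   # ??????
print(lossy_roundtrip("hello"))    # hello
```

ASCII text survives unchanged, which is why a problem like this typically only shows up with non-Latin input.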

Original issue reported on code.google.com by a...@alexlittle.net on 5 Oct 2011 at 5:24

GoogleCodeExporter commented 9 years ago
The MySQL layer hasn't changed its UTF-8 handling in quite some time; all 
string fields are created with CHARACTER SET utf8.
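A side note on `CHARACTER SET utf8`: MySQL's legacy `utf8` (also known as utf8mb3) stores at most three bytes per character, which covers only Basic Multilingual Plane code points. Amharic (Ethiopic block, U+1200 to U+137F) and Cyrillic both fall inside the BMP, so the column character set by itself should be able to hold this text. A small check, with a hypothetical function name:

```python
def fits_mysql_utf8(text: str) -> bool:
    """True if every character fits MySQL's 3-byte `utf8` (utf8mb3) charset.

    Code points up to U+FFFF (the Basic Multilingual Plane) need at most
    three bytes in UTF-8; anything above needs four and requires utf8mb4.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_mysql_utf8("ሰላም"))        # True  (Ethiopic)
print(fits_mysql_utf8("Привет"))      # True  (Cyrillic)
print(fits_mysql_utf8("\U0001F600"))  # False (emoji, 4 bytes in UTF-8)
```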

If you are able, please use a browser (the one on the phone should work) and 
browse to ODK Aggregate's data upload page (in RC1, it is on the FormsList 
subtab; in RC2, it is on the SubmissionAdmin subtab) to upload a submission. 

This would help identify whether it is a regression in ODK Collect 1.1.7 (are 
you using RC2 or RC1?) or a regression in Aggregate. 

Original comment by mitchellsundt@gmail.com on 6 Oct 2011 at 6:15

GoogleCodeExporter commented 9 years ago

Original comment by mitchellsundt@gmail.com on 6 Oct 2011 at 6:16

GoogleCodeExporter commented 9 years ago
I confirmed that uploading using the website's upload seems to preserve UTF-8 
characters.  Looks like an ODK Collect regression.

Also, when I tried to upload a file to my ODK Aggregate, I got this exception 
and a completely incorrect error code (500) reported back to the user:

/System.err( 2319): java.lang.NullPointerException
/System.err( 2319):    at org.odk.collect.android.tasks.InstanceUploaderTask.doInBackground(InstanceUploaderTask.java:212)
/System.err( 2319):    at org.odk.collect.android.tasks.InstanceUploaderTask.doInBackground(InstanceUploaderTask.java:1)
/System.err( 2319):    at android.os.AsyncTask$2.call(AsyncTask.java:185)
/System.err( 2319):    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:306)

Original comment by mitchellsundt@gmail.com on 6 Oct 2011 at 6:55

GoogleCodeExporter commented 9 years ago
Adding the Aggregate tag, as it is still unclear how to reproduce the actual issue. 
Uploading to opendatakit.appspot.com works.

Original comment by mitchellsundt@gmail.com on 6 Oct 2011 at 6:58

GoogleCodeExporter commented 9 years ago
Alex -- please also confirm that the file saved on the Android shows UTF-8 
characters when you view it in a UTF-8 savvy editor.
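The same check can be done without an editor by trying to decode the raw bytes of the saved instance file. A minimal sketch, with hypothetical function names (the location of the instance file on the device isn't given here, so it is taken as a parameter): if the file decodes as UTF-8 and still contains the Amharic characters, the text left the phone intact; if it is plain ASCII full of `?`, it was already lost on the device.

```python
from pathlib import Path


def describe_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8' (with non-ASCII), or 'not utf-8'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"


def describe_file(path: str) -> str:
    """Read a file copied off the device and classify its encoding."""
    return describe_encoding(Path(path).read_bytes())


print(describe_encoding("<name>ሰላም</name>".encode("utf-8")))  # utf-8
print(describe_encoding(b"<name>???</name>"))                  # ascii
```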

Original comment by mitchellsundt@gmail.com on 6 Oct 2011 at 6:59

GoogleCodeExporter commented 9 years ago
Hi,
Just been having another look at this. I created fresh installs of Aggregate 
RC2 on both appspot and Tomcat/MySQL.
When I submit a form with either Amharic or Cyrillic data in the form fields, 
these display fine when I view them in the appspot version of ODK Aggregate. 
However, with the Tomcat/MySQL version I still only see question marks.
I made a dump of the MySQL database, and it does show that the fields, tables, 
etc. are created with utf8 character encoding.
But it seems that it's not an issue with ODK Collect, as I was using the same 
install of ODK Collect to submit to each of the Aggregate instances.
Hope that helps you recreate the issue. Please let me know if there is 
something else I should check.
Cheers,
Alex

Original comment by AlextLit...@gmail.com on 10 Oct 2011 at 2:39

GoogleCodeExporter commented 9 years ago
I also tried a manual submission (see the attached form and two submission 
files, one Amharic and one Cyrillic). On my appspot instance both upload fine, 
but I still get question marks with the MySQL/Tomcat version.
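A manual submission can also be scripted, which takes the browser out of the picture. This is a sketch under assumptions: it follows the OpenRosa form submission convention that ODK Aggregate uses (a multipart POST to `<server>/submission` with the instance in a part named `xml_submission_file`), the server URL below is hypothetical, and authentication is ignored. It only builds the request; the instance XML would come from one of the attached submission files.

```python
import urllib.request
import uuid


def build_submission_request(server_url: str, xml_text: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one XML instance."""
    boundary = uuid.uuid4().hex
    xml_bytes = xml_text.encode("utf-8")  # send the instance explicitly as UTF-8
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="xml_submission_file"; '
        b'filename="submission.xml"\r\n',
        b"Content-Type: text/xml; charset=utf-8\r\n\r\n",
        xml_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "X-OpenRosa-Version": "1.0",
    }
    return urllib.request.Request(
        server_url.rstrip("/") + "/submission", data=body, headers=headers, method="POST"
    )


# Hypothetical local Tomcat deployment; send with urllib.request.urlopen(req).
req = build_submission_request("http://localhost:8080/ODKAggregate", "<data><name>ሰላም</name></data>")
print(req.full_url)  # http://localhost:8080/ODKAggregate/submission
```

Sending the same file to both the appspot and the MySQL/Tomcat instance isolates the server side, since the client bytes are identical.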

Cheers,
Alex

Original comment by AlextLit...@gmail.com on 10 Oct 2011 at 2:46

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by carlhart...@gmail.com on 10 Oct 2011 at 4:01

GoogleCodeExporter commented 9 years ago
I don't have trouble displaying these on a local instance of RC2 with MySQL 
(I'm seeing somewhat Cyrillic text in one post, and in the second a username 
beginning with an upward-trending w in a script-like font).

Can you try other browsers (Firefox, Safari, IE) and see if it occurs on only 
some browsers? 

If it does occur in Firefox, can you install the HttpFox add-on, start it, 
and browse to the Submissions page?

Then look at the .../aggregateui/submissionservice POST request displayed in 
HttpFox and verify that:
1. it has a header Content-Type: text/x-gwt-rpc; charset=utf-8
2. the POST Data response type is text/x-gwt-rpc; charset=utf-8
3. the Content shows the Amharic characters (or not).
At least one of these should be wrong.
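These header and content checks can also be run on values copied out of HttpFox. A sketch using the standard library's `email.message.Message` to parse the `charset` parameter, plus a hypothetical helper that tests the response body for Ethiopic characters:

```python
from email.message import Message
from typing import Optional


def content_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def contains_ethiopic(text: str) -> bool:
    """True if text has at least one character from the Ethiopic block (U+1200-U+137F)."""
    return any("\u1200" <= ch <= "\u137f" for ch in text)


print(content_charset("text/x-gwt-rpc; charset=utf-8"))  # utf-8
print(content_charset("text/html"))                      # None
print(contains_ethiopic("ሰላም"), contains_ethiopic("???"))  # True False
```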

Original comment by mitchellsundt@gmail.com on 10 Oct 2011 at 5:53

GoogleCodeExporter commented 9 years ago
Also, in your second update you said you looked inside the MySQL database 
table and saw it was created with the UTF-8 character set. Does that mean you saw 
the Amharic characters in that table, or is it still an issue of the upload not 
inserting the Amharic into the database?

Also please let me know what browser version you're using; I'm using Firefox 
5.0.

Original comment by mitchellsundt@gmail.com on 10 Oct 2011 at 6:17

GoogleCodeExporter commented 9 years ago
Thanks for all your help; I think I've got this sorted out now. The cause was 
that my.cnf didn't have the correct settings. I needed to add the 
following to /etc/mysql/my.cnf (under the [mysqld] section):
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server = utf8
collation-server = utf8_general_ci 

and under [mysql]: 
default-character-set=utf8 

After I'd added these settings and restarted MySQL, the submissions are now 
displaying the Amharic and Cyrillic characters fine.
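The settings above can be checked mechanically. The sketch below reads my.cnf-style text with Python's `configparser` and reports which of the charset settings from this comment are missing; the helper names are hypothetical. Two assumptions about MySQL option files are built in: `-` and `_` in option names are treated as equivalent, and a repeated option takes its last value, so of the two `init_connect` lines only `SET NAMES utf8` should take effect (which also sets `collation_connection` to utf8's default collation, `utf8_general_ci`). Note too that `init_connect` is not executed for accounts with the SUPER privilege, and MySQL 5.5 and later reportedly reject `default-character-set` in `[mysqld]`, so that line is left out of the check.

```python
import configparser
from typing import List

# Settings from the comment above that this sketch checks for.
REQUIRED = {
    "mysqld": {"character-set-server": "utf8", "collation-server": "utf8_general_ci"},
    "mysql": {"default-character-set": "utf8"},
}


def load_mycnf(cnf_text: str) -> configparser.ConfigParser:
    """Parse my.cnf-style text, approximating how MySQL reads option files."""
    parser = configparser.ConfigParser(
        strict=False,                    # repeated options allowed; the last one wins
        allow_no_value=True,             # bare flags such as skip-external-locking
        interpolation=None,              # '%' has no special meaning in my.cnf
        inline_comment_prefixes=("#",),
    )
    # Treat '-' and '_' in option names as the same character.
    parser.optionxform = lambda name: name.strip().replace("_", "-")
    parser.read_string(cnf_text)
    return parser


def missing_charset_settings(cnf_text: str) -> List[str]:
    """List the REQUIRED settings that are absent or set to another value."""
    parser = load_mycnf(cnf_text)
    missing = []
    for section, options in REQUIRED.items():
        for key, expected in options.items():
            actual = parser.get(section, key, fallback=None)
            if actual is None or actual.strip("'\" ") != expected:
                missing.append(f"[{section}] {key}={expected}")
    return missing


SAMPLE = """[mysqld]
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
character-set-server = utf8
collation-server = utf8_general_ci

[mysql]
default-character-set=utf8
"""

print(missing_charset_settings(SAMPLE))                  # []
print(load_mycnf(SAMPLE).get("mysqld", "init_connect"))  # 'SET NAMES utf8'
```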

For info, I'm running this on the default install of MySQL on Ubuntu 10.10, so I'm 
not quite sure why MySQL on this OS doesn't have utf8 as the default. Anyway, 
thanks for the help and comments; I hope this message will help anyone else 
having similar issues.
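Upstream MySQL 5.x uses `latin1` as its default server character set, which would explain why a stock install doesn't default to utf8. After a restart, the effective values can be listed with `SHOW VARIABLES LIKE 'character_set%'`. The sketch below (a hypothetical helper that works on rows from any DB-API cursor) flags entries that are still not UTF-8; the sample rows are made up for illustration:

```python
from typing import Dict, Iterable, Tuple

# Entries matched by the LIKE pattern that are not text character sets.
NOT_TEXT_CHARSETS = {"character_set_filesystem", "character_sets_dir"}
UTF8_NAMES = {"utf8", "utf8mb3", "utf8mb4"}


def non_utf8_charset_vars(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return character_set_* variables whose value is not a UTF-8 charset.

    `rows` are (Variable_name, Value) pairs, e.g. cursor.fetchall() after
    cursor.execute("SHOW VARIABLES LIKE 'character_set%'").
    """
    return {
        name: value
        for name, value in rows
        if name not in NOT_TEXT_CHARSETS and value.lower() not in UTF8_NAMES
    }


# Hypothetical output from a server still using latin1 defaults.
sample_rows = [
    ("character_set_client", "utf8"),
    ("character_set_connection", "utf8"),
    ("character_set_database", "latin1"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8"),
    ("character_set_server", "latin1"),
    ("character_set_system", "utf8"),
    ("character_sets_dir", "/usr/share/mysql/charsets/"),
]
print(non_utf8_charset_vars(sample_rows))
# {'character_set_database': 'latin1', 'character_set_server': 'latin1'}
```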
The issue isn't an ODK Aggregate or Collect error, so it can be closed now.
Cheers,
Alex

Original comment by a...@alexlittle.net on 10 Oct 2011 at 10:11

GoogleCodeExporter commented 9 years ago
Should document this as a potential issue on Linux deployments.

Original comment by mitchellsundt@gmail.com on 19 Oct 2011 at 10:20

GoogleCodeExporter commented 9 years ago

Original comment by mitchellsundt@gmail.com on 19 Oct 2011 at 10:31

GoogleCodeExporter commented 9 years ago
Updated the Tomcat install documentation to mention setting the default 
character set and collation of the database to UTF-8.

Original comment by mitchellsundt@gmail.com on 19 Apr 2012 at 10:55