purepennons / gss

Automatically exported from code.google.com/p/gss

Croatian letters "ščđćž" don't work with folders #58

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a new folder named "ščćđž".
2. Refresh: the folder is not shown.
3. Create a new folder named "ščćđž" again. This fails with the message: You dont 
have necessary permissions or a folder with the same name already exists

What is the expected output? What do you see instead?
I expect to see the folder with Croatian letters, since files with Croatian 
letters seem to work. Instead, I don't see any folder with Croatian letters, 
yet gss claims that the directory already exists.

What version of the product are you using? On what operating system?
I am using the latest gss source on CentOS 5, 64-bit.

Please provide any additional information below.
I even tried setting requestAttributeEncoding=UTF-8 in gss.properties, but I 
am using an already existing gssdb database. Do I have to update something 
manually in the database?

Regards,
Nikola

Original issue reported on code.google.com by ngara...@gmail.com on 17 Nov 2010 at 2:18

GoogleCodeExporter commented 8 years ago
Do I have to change everything to utf-8, including character encoding in gss 
source? And follow this guide for jboss?

1. Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP 
Connector, AJP Connector.
2. Use a character encoding filter with the default encoding set to UTF-8.
3. Change all your JSPs to include charset name in their contentType. 
For example, use <%@page contentType="text/html; charset=UTF-8" %> for the 
usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" 
/> for the pages in XML syntax (aka JSP Documents).
4. Change all your servlets to set the content type for responses and to include 
charset name in the content type to be UTF-8. 
Use response.setContentType("text/html; charset=UTF-8") or 
response.setCharacterEncoding("UTF-8").
5. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to 
use UTF-8 and to specify UTF-8 in the content type of the responses that they 
generate.
6. Disable any valves or filters that may read request parameters before your 
character encoding filter or JSP page has a chance to set the encoding to 
UTF-8. For more information see 
http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.
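
To illustrate why the URI decoding charset in the guide above matters, here is 
a minimal Python sketch (not Tomcat or gss code; the variable names are made up 
for the illustration). It percent-encodes two of the Croatian letters as UTF-8 
and then decodes the result once as UTF-8 and once as ISO-8859-1, which is 
roughly what happens when a server decodes request URIs with the wrong charset.

```python
from urllib.parse import quote, unquote

# Two of the letters from the folder name: U+0161 (s with caron), U+010D (c with caron).
name = "\u0161\u010d"

# A browser typically sends the path percent-encoded as UTF-8 bytes.
encoded = quote(name)

# Decoding with the matching charset restores the original name.
as_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 yields one wrong character per byte.
as_latin1 = unquote(encoded, encoding="latin-1")

print(encoded)
print(as_utf8 == name)
print(len(as_latin1))
```

With a mismatched charset the two letters become four unrelated characters, so 
a lookup by name would not find the stored folder.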

Regards,
Nikola

Original comment by ngara...@gmail.com on 17 Nov 2010 at 3:01

GoogleCodeExporter commented 8 years ago
Does the <Connector> in your server.xml have a URIEncoding="UTF-8" attribute? 
If not, you should add it and restart JBoss. I cannot reproduce the problem in 
our installations (yes, Croatian letters work perfectly here :-)), so make the 
change and let us know. 

Original comment by chstath on 17 Nov 2010 at 3:26

GoogleCodeExporter commented 8 years ago
Yes, both the AJP and HTTP connectors in JBoss are already set to UTF-8 (out 
of the box, I think, since I don't remember changing it). So, you can create a 
folder with Croatian letters? I can create one, but I am not able to fetch the 
folder later on. 
I get a 502 Bad Gateway error when trying to list it.

I don't understand how files with Croatian letters work without problems 
while folders don't.

Regards,
Nikola

Original comment by ngara...@gmail.com on 17 Nov 2010 at 9:20

GoogleCodeExporter commented 8 years ago
The 502 response probably comes from Apache. Try accessing one of the 
JBoss servers directly to see what error is returned.

Original comment by chstath on 19 Nov 2010 at 2:58

GoogleCodeExporter commented 8 years ago
After closer inspection, here is what I got from the haproxy discussion group:

> > # echo "show errors" | socat stdio unix-connect:/var/run/haproxy.sock
> > 
> > [19/Nov/2010:15:01:56.646] backend www (#1) : invalid response
> >   src aaa.bbb.ccc.ddd, session #645, frontend www (#1), server
> > backend-srv1 (#1)
> >   response length 857 bytes, error at position 268:
> > 
> >   00000  HTTP/1.1 200 OK\r\n
> >   00017  Date: Fri, 19 Nov 2010 14:01:56 GMT\r\n
> >   00054  Server: Apache/2.2.3 (CentOS)\r\n
> >   00085  X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1\r\n
> >   00136  Expires: -1\r\n
> >   00149  X-GSS-Metadata:
> > {"creationDate":1290002859579,"createdBy":"ngarafol@sr
> >   00219+
> > ce.hr","modifiedBy":"username@domain","name":"a\r\x07\x11~","owner":"
> >   00282+
> > username@domain","modificationDate":1290002859579,"deleted":false}\r
> >   00350+ \n
> >   00351  Content-Length: 418\r\n
> >   00372  Connection: close\r\n
> >   00391  Content-Type: application/json;charset=UTF-8\r\n
> >   00437  \r\n
> >   00439
> > {"files":[],"creationDate":1290002859579,"createdBy":"username@domain
> >   00509+
> > ","modifiedBy":"username@domain","readForAll":false,"name":"\xC5\xA1
> >   00572+
> > \xC4\x8D\xC4\x87\xC4\x91\xC5\xBE","permissions":[{"modifyACL":true,"wr
> >   00618+
> > ite":true,"read":true,"user":"username@domain"}],"owner":"username@domain
> >   00688+ ce.hr","parent":{"name":"User User","uri":"http://server/p
> >   00758+
> > ithos/rest/username@domain/files/"},"folders":[],"modificationDate":1
> >   00828+ 290002859579,"deleted":false}
Excellent, we have it now.

> >   00149  X-GSS-Metadata: 
{"creationDate":1290002859579,"createdBy":"ngarafol@sr
> >   00219+ 
ce.hr","modifiedBy":"username@domain","name":"a\r\x07\x11~","owner":"
> >   00282+ 
username@domain","modificationDate":1290002859579,"deleted":false}\r
> >   00350+ \n
See position 268 above? It's the \x07 just after the \r on the second
line. The issue is not related to UTF-8 at all; those are just forbidden
characters, possibly resulting from corrupted memory. The "\r" starts the
end-of-header sequence and may only be followed by a "\n".

From RFC2616:

       message-header = field-name ":" [ field-value ]
       field-name     = token
       field-value    = *( field-content | LWS )
       field-content  = <the OCTETs making up the field-value
                        and consisting of either *TEXT or combinations
                        of token, separators, and quoted-string>

       token          = 1*<any CHAR except CTLs or separators>
       quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
       qdtext         = <any TEXT except <">>
       quoted-pair    = "\" CHAR
       TEXT           = <any OCTET except CTLs,
                        but including LWS>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT

       CHAR           = <any US-ASCII character (octets 0 - 127)>
       CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>

So as you can see, CTL characters cannot appear anywhere unescaped
(an HTTPBIS spec refines that further by clearly insisting on the
fact that those chars may not even be escaped). So clearly those
0x0D 0x07 0x11 characters at position 268 are forbidden here and
break the parsing of the line.
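
As a rough illustration of the rule quoted above, the following Python sketch 
scans a header value for CTL octets. It is a simplification and not haproxy's 
parser: the function name is hypothetical, and it rejects every bare CR or LF, 
so it does not model header folding (LWS containing CRLF).

```python
def first_forbidden_octet(value: bytes) -> int:
    """Return the index of the first CTL octet in a header value, or -1.

    CTL means octets 0-31 and 127 (RFC 2616). Horizontal tab (0x09) is
    allowed here because it is part of LWS.
    """
    for i, b in enumerate(value):
        if (b < 0x20 and b != 0x09) or b == 0x7F:
            return i
    return -1


# A value shaped like the broken X-GSS-Metadata field from the dump.
bad = b'{"name":"a\r\x07\x11~"}'
# A value containing only printable ASCII.
good = b'{"name":"folder"}'

print(first_forbidden_octet(bad))
print(first_forbidden_octet(good))
```

A server-side check of this kind, run before a header is written, would have 
caught the bad value before it reached the proxy.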

What I suspect is that the characters were UTF-8 encoded in the
database, but the application server stripped the 8th bit before
putting them on the wire, which resulted in what you have. That's
just a pure guess, of course. Another possibility is that those bytes
represent an integer value that was accidentally output with a "%c"
format instead of "%d".
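
One related possibility can be checked directly. The Python sketch below keeps 
only the low 8 bits of each character's code point in the folder name and 
compares the result with the bytes in the dump. It reproduces the observed 
"a\r\x07\x11~" exactly, but that only shows the bytes are consistent with such 
a truncation; the thread does not confirm that the application did this.

```python
# The folder name from the report, written with escapes to avoid ambiguity:
# U+0161, U+010D, U+0107, U+0111, U+017E.
name = "\u0161\u010d\u0107\u0111\u017e"

# Assumption for illustration: each character is cut down to its low byte,
# as happens when a 16-bit char is written out as a single byte.
truncated = bytes(ord(ch) & 0xFF for ch in name)

# The bytes haproxy showed inside the X-GSS-Metadata header.
observed_in_header = b"a\r\x07\x11~"

# The bytes haproxy showed in the JSON body, which are plain UTF-8.
utf8 = name.encode("utf-8")

print(truncated == observed_in_header)
print(utf8)
```

The low bytes of these five code points are 0x61, 0x0D, 0x07, 0x11 and 0x7E, 
which is why the header contains a carriage return and two other control 
characters.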

We can't even let that pass with "option accept-invalid-http-response",
because the issue would be even worse for characters that are returned
as 0x0D 0x0A: they would end the line and start a new header with the
remaining data.

The only solution right here is to try to see where it breaks in the
application (maybe it's a memory corruption issue after all) and to
fix it ASAP.

Original comment by ngara...@gmail.com on 20 Nov 2010 at 12:36

GoogleCodeExporter commented 8 years ago
I'm not sure I understand. Is it a problem with haproxy or with gss? Or 
something between the two?

Original comment by chstath on 23 Nov 2010 at 9:49

GoogleCodeExporter commented 8 years ago
The problem is not with haproxy but with the X-GSS-Metadata HTTP header, which 
does not conform to the RFC when a folder or file name contains non-ASCII 
characters, in my case the Croatian letters I mentioned earlier. X-GSS-Metadata 
(like any other header field) is not allowed to carry raw UTF-8 characters, 
only ISO-8859-1. This is defined in the RFC.

Take a look at the "name" field in X-GSS-Metadata:
"name":"a\r\x07\x11~"
and later in the response body (sent with Content-Type: 
application/json;charset=UTF-8):
"name":"\xC5\xA1\xC4\x8D\xC4\x87\xC4\x91\xC5\xBE"

It's obvious that whatever puts X-GSS-Metadata in the headers does it wrong, 
because the same name in the response body is encoded as it should be. All you 
have to do is eliminate the non-ASCII characters from the header or encode them 
in a way that is safe for a header.
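
For illustration only, here are two common ways to make such a value safe for 
a header, sketched in Python. The thread does not say which approach the actual 
fix used; the `meta` dictionary below is a made-up stand-in for the real 
metadata.

```python
import json
from urllib.parse import quote, unquote

# Hypothetical metadata with the folder name from the report.
meta = {"name": "\u0161\u010d\u0107\u0111\u017e", "deleted": False}

# Option 1: JSON with \uXXXX escapes. json.dumps escapes all non-ASCII
# characters by default (ensure_ascii=True), so the result is pure ASCII.
header_value = json.dumps(meta)

# Option 2: percent-encode the UTF-8 bytes of the name.
quoted_name = quote(meta["name"])

# Both forms are ASCII-only and can be decoded back without loss.
print(header_value.isascii(), json.loads(header_value) == meta)
print(quoted_name, unquote(quoted_name) == meta["name"])
```

Either way, every client that reads the header has to decode it the same way, 
so a change like this needs to be checked against all existing clients.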

Do you understand?

Original comment by ngara...@gmail.com on 23 Nov 2010 at 10:48

GoogleCodeExporter commented 8 years ago
OK. I found the problem in the code. First I have to check whether the fix 
breaks anything in the other clients (e.g. the desktop client), and if not, 
I'll do the patch. I believe that tomorrow we'll have something to test.

Original comment by chstath on 23 Nov 2010 at 12:43

GoogleCodeExporter commented 8 years ago

Original comment by chstath on 29 Nov 2010 at 10:00

GoogleCodeExporter commented 8 years ago
I have committed a fix for this. It is in both the default branch and the 
solr1.4 branch. Please check whether it works and let me know so that I can 
close the issue.

Original comment by chstath on 29 Nov 2010 at 3:41

GoogleCodeExporter commented 8 years ago
I tested the updated source and it seems to work. No errors are visible and 
everything looks OK. Thanks for fixing it.

Regards,
Nikola

Original comment by ngara...@gmail.com on 30 Nov 2010 at 11:10

GoogleCodeExporter commented 8 years ago

Original comment by chstath on 30 Nov 2010 at 11:19