loklak / loklak_server

Distributed Open Source twitter and social media message search server that anonymously collects, shares, dumps and indexes data http://api.loklak.org
GNU Lesser General Public License v2.1

Git clone on windows Ant fails to build. "encoding Cp1252" #785

Closed smokingwheels closed 8 years ago

smokingwheels commented 8 years ago

Cloned a copy on 4 Aug 2016 on Windows as per the wiki; ant fails on the Weibo file. Will try again with the zip download. Confirmed: same error with the zip file.

It used to work fine; I wrote this. Also, YaCy is a service now.

Command prompt output:

    init:

    build:
    [delete] Deleting directory f:\loklak_server\classes
    [mkdir] Created dir: f:\loklak_server\classes
    [echo] loklak: f:\loklak_server\build.xml
    [javac] Compiling 196 source files to f:\loklak_server\classes
    [javac] f:\loklak_server\src\org\loklak\api\search\WeiboUserInfo.java:88: error: unmappable character for encoding Cp1252
    [javac] case "µÇºÕ?ûÕ?æ´?Ü":
    [javac] ^
    [javac] f:\loklak_server\src\org\loklak\api\search\WeiboUserInfo.java:88: error: unmappable character for encoding Cp1252
    [javac] case "µÇºÕ?ûÕ?æ´?Ü":
    [javac] ^
    [javac] 2 errors

    BUILD FAILED
    f:\loklak_server\build.xml:99: Compile failed; see the compiler error output for details.

    Total time: 3 seconds

    f:\loklak_server>

xml output file <?xml version="1.0" encoding="UTF-8" standalone="no"?>

sudheesh001 commented 8 years ago

@smokingwheels This looks like a character-encoding issue: the file contains Chinese characters that aren't recognized in the default charset (Cp1252, per the error message). Can you tell javac to compile with the UTF-8 charset instead of the default? That should hopefully fix the problem; otherwise you need to have the charset installed. I had no problems installing it on Windows with the charset.
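
To illustrate the diagnosis, here is a minimal sketch of what happens when UTF-8 bytes are read with the wrong charset. `EncodingDemo` and `decodeAs` are hypothetical names made up for this example, not loklak code, and the sample label is an assumption chosen for illustration. The string is written with Unicode escapes so the example itself compiles under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: EncodingDemo and decodeAs are hypothetical, not part of loklak.
class EncodingDemo {

    // Encode the text as UTF-8 (how the source file is stored on disk), then
    // decode those bytes with readCharset (how a tool might read them back).
    static String decodeAs(String text, Charset readCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readCharset);
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by a fullwidth colon (U+FF1A).
        String label = "\u5927\u5b66\uff1a";
        Charset cp1252 = Charset.forName("windows-1252");

        // Reading the bytes as UTF-8 round-trips the label.
        System.out.println(label.equals(decodeAs(label, StandardCharsets.UTF_8))); // true
        // Reading the same bytes as Cp1252 yields a different, garbled string.
        System.out.println(label.equals(decodeAs(label, cp1252)));                 // false
    }
}
```

Each Chinese character occupies three bytes in UTF-8, and a single-byte charset such as Cp1252 turns every byte into its own character, which is why the case label in the compiler output looks garbled.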

smokingwheels commented 8 years ago

See dos2unix.sourceforge.net. I ran the tool "unix2dos.exe" with no options in the loklak_server folder, and it compiles OK. I used ant on YaCy about a month ago and had no problems.

sudheesh001 commented 8 years ago

@smokingwheels I would need more time to look into the unix2dos tool, which might be causing these problems. Without it, if you run ant from the CLI, does it build fine?

smokingwheels commented 8 years ago

@sudheesh001 What is CLI?

jigyasa-grover commented 8 years ago

@smokingwheels CLI is the Command Line Interface, i.e. the terminal.

smokingwheels commented 8 years ago

@jig08 Maybe I should rephrase the error. Many thanks.

jigyasa-grover commented 8 years ago

@smokingwheels Yes, it might help.

sudheesh001 commented 8 years ago

So the encoding issue seems to be fixable with file.encoding. Can you try setting JAVA_OPTS='-Xmx384m -Xss512k -XX:+UseCompressedOops -Dfile.encoding=UTF-8' or JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF-8'?
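
A quick way to check whether such a setting took effect is to print what the JVM itself reports. This is only a sketch; `DefaultCharsetCheck` and `describeDefaults` are hypothetical names, and the values printed depend on the machine and JDK.

```java
import java.nio.charset.Charset;

// Illustrative only: DefaultCharsetCheck is a hypothetical helper, not part of loklak.
class DefaultCharsetCheck {

    // Report the file.encoding system property and the charset the JVM uses by default.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Running it once before and once after setting JAVA_TOOL_OPTIONS shows whether the JVM picked up the option.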

smokingwheels commented 8 years ago

@sudheesh001 OK, so I will delete and clone again and try to reproduce the error, then add the changes as you suggest.

sudheesh001 commented 8 years ago

Any luck?

smokingwheels commented 8 years ago

@sudheesh001 It's something about uppercase strings in Java. I added a system setting to the system properties like you suggested and it compiled. See:

    [javac] f:\loklak_server\src\org\loklak\api\search\WeiboUserInfo.java:88: error: unmappable character for encoding Cp1252
    [javac] case "µÇºÕ?ûÕ?æ´?Ü":

    f:\loklak_server\build.xml:99: Compile failed; see the compiler error output for details.

This could be the problem: non-text characters in "WeiboUserInfo.java":

            if (info.getElementsByAttributeValueContaining("href", "loc=infblog").size()==0) {
                profile=info.getElementsByAttributeValue("class","pt_detail").first().text().trim();
                obj.put("pro", profile);
                switch(info.getElementsByAttributeValue("class", "pt_title S_txt2").first().text()){
                    case "Nicknameï¼?": obj.put("username", profile); break;
                    case "Locationï¼?": obj.put("Address", profile); break;
                    case "Genderï¼?": obj.put("Gender", profile); break;
                    case "æ?§å?å?ï¼?": obj.put("Sexuality", profile.replace("\t", "").replace("\r\n", "")); break;
                    case "æ??æ??ç?¶å?µï¼?": obj.put("Relationship", profile.replace("\t", "").replace("\r\n", "")); break;
                    case "Birthdayï¼?": obj.put("Birthday", profile); break;
                    case "è¡?å??ï¼?": obj.put("Blood", profile); break;
                    case "Domain Nameï¼?":
                        if(info.getElementsByAttributeValueContaining("href", "loc=infdomain").size()!=0)
                            profile=info.select("a").text();
                        obj.put("Personaldomain", profile);
                        break;
                    case "ç®?ä»?ï¼?": obj.put("Profile", profile); break;
                    case "Registrationï¼?": obj.put("Registertime", profile.replace("\t", "").replace("\r\n", "")); break;
                    case "Emailï¼?": obj.put("Email", profile); break;
                    case "QQï¼?": obj.put("Qq", profile); break;
                    case "大学ï¼?": obj.put("College", profile.replace("\t", "").replace("\r\n", "")); break;
                    case "Tagsï¼?": obj.put("Tag", profile.replace("\t", "").replace("\r\n", "")); break;
                }

            }else {
                String blogurl=info.select("a").text();
                obj.put("Blog", blogurl);
smokingwheels commented 8 years ago

Updated the wiki for 32-bit, but loklak won't start.

sudheesh001 commented 8 years ago

Reopening: the loklak server fails to start on a fresh install.

sudheesh001 commented 8 years ago

It still seems that adding -Dfile.encoding=UTF-8 fixes it. Also, we really need to give those Weibo scrapers another rewrite.
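
One possible direction for such a rewrite, offered only as a sketch and not as the project's actual fix: write the non-ASCII case labels as Unicode escapes, so the source file is pure ASCII and compiles the same way under any source encoding. `WeiboLabels` and `fieldFor` are hypothetical names. The two mappings shown are taken from the snippet quoted earlier in this thread, and the trailing character of each label is assumed to be the fullwidth colon (U+FF1A).

```java
// Illustrative only: WeiboLabels and fieldFor are hypothetical, not part of loklak.
class WeiboLabels {

    // Map a profile row title to the JSON key used for it, or null if unknown.
    static String fieldFor(String title) {
        switch (title) {
            case "Nickname\uff1a":          // "Nickname" + fullwidth colon
                return "username";
            case "\u5927\u5b66\uff1a":      // Chinese label for "College" + fullwidth colon
                return "College";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("\u5927\u5b66\uff1a")); // College
        System.out.println(fieldFor("Nickname\uff1a"));     // username
    }
}
```

Unicode escapes are translated before the compiler tokenizes the file, so these labels match the same runtime strings as the literal characters would, without depending on -encoding or file.encoding.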