openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

vqe.execSelect() collapses when results.hasNext() meets emoji or invalid input. #543

Open Seondong opened 8 years ago

Seondong commented 8 years ago

This issue is about the following exception:

Exception in thread "main" com.hp.hpl.jena.shared.JenaException: Convert results are FAILED.:virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : malformed input around byte 34

It looks like the same problem described in this post and this post, but neither of them offers a clear solution.


Here is my problem. I'm using a SPARQL query to extract valid instances. But the same query also returns instances whose names contain an emoji (e.g. http://ko.dbpedia.org/resource/😼), and I get an error while iterating over the query result set. How can I avoid the emojis?

Query

SELECT DISTINCT ?s WHERE {
?s ?p ?o 
FILTER regex(str(?s), "^http://ko.dbpedia.org/resource")
}
ORDER BY DESC(?s)
limit 100

Java code (the error occurs at line 92, marked below, at while(results.hasNext()){):

Query sparql = QueryFactory.create("SELECT DISTINCT ?s FROM <" + TEST_INPUT_IRI
        + "> WHERE { ?s ?p ?o FILTER regex(str(?s), \"http://ko.dbpedia.org/resource/\")"
        + "FILTER (!regex(str(?s), \"http://ko.dbpedia.org/resource/분류\"))}"
        + "ORDER by ASC(?s)");

VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create(sparql, set);
ResultSet results = vqe.execSelect();

int i = 0;
while (results.hasNext()) {               // <------------ LoadTriple.java:92  here.
    i = i + 1;
    try {
        QuerySolution result = results.nextSolution();
        RDFNode s1 = result.get("s");
        String subject = s1.toString();
        System.out.println(subject + "----" + i);
        propertylist.add(s1);
    } catch (com.hp.hpl.jena.shared.JenaException e) {

    }
}

Console (Error message):

http://ko.dbpedia.org/resource/소흥주----293967
http://ko.dbpedia.org/resource/소희----293968
http://ko.dbpedia.org/resource/소희_(가수)----293969
http://ko.dbpedia.org/resource/소히----293970
Exception in thread "main" com.hp.hpl.jena.shared.JenaException: Convert results are FAILED.:virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : malformed input around byte 34
    at virtuoso.jena.driver.VirtuosoQueryExecution$VResultSet.moveForward(VirtuosoQueryExecution.java:498)
    at virtuoso.jena.driver.VirtuosoQueryExecution$VResultSet.hasNext(VirtuosoQueryExecution.java:441)
    at kr.ac.kaist.dm.BBox.TypeInference.LoadTriple.processTriples(LoadTriple.java:92)
    at kr.ac.kaist.dm.BBox.TypeInference.TypeInferenceMain.main(TypeInferenceMain.java:110)

My Virtuoso endpoint returns a result that includes those invalid instances (screenshot: 2016-03-17 22 49 07); however, the code crashes when the iterator reaches the invalid resources (my guess).
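The wording "malformed input around byte N" in the exception matches the message that Java's java.io.DataInputStream.readUTF() produces when it meets a byte sequence it does not accept. That method reads "modified UTF-8", which rejects the 4-byte sequences that standard UTF-8 uses for characters outside the Basic Multilingual Plane, such as emoji. Whether the Virtuoso JDBC driver decodes strings this way is an assumption, not something confirmed in this thread. The sketch below uses only the JDK (the class and method names are hypothetical) to show that a Korean IRI decodes fine while an emoji IRI triggers the same kind of message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

public class EmojiDecodeDemo {

    // Encode the text as standard UTF-8 and prefix it with the 2-byte length
    // that DataInputStream.readUTF() expects.
    static byte[] frame(String text) throws IOException {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeShort(utf8.length);
        out.write(utf8);
        out.flush();
        return buffer.toByteArray();
    }

    // Decode with the JDK's modified UTF-8 reader. Returns the decoded string,
    // or "ERROR: <message>" if the reader rejects the bytes.
    static String decodeModifiedUtf8(byte[] framed) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (UTFDataFormatException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Korean syllables are 3-byte sequences, which modified UTF-8 accepts.
        String korean = "http://ko.dbpedia.org/resource/\uC18C\uD76C";
        // U+1F63C (an emoji) is a 4-byte sequence in standard UTF-8.
        String emoji = "http://ko.dbpedia.org/resource/\uD83D\uDE3C";

        System.out.println(decodeModifiedUtf8(frame(korean)).equals(korean));
        System.out.println(decodeModifiedUtf8(frame(emoji)));
    }
}
```

If the driver does behave like this, the failure would occur while the row is being decoded inside hasNext(), before the application sees the value, which would be consistent with the stack trace above.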


I read this previous issue, whose author struggled with a similar problem. It gave me a hint, but I haven't been able to solve mine yet.

HughWilliams commented 8 years ago

@Seondong: Which versions of the Virtuoso Server, Virtuoso Jena Provider, and Virtuoso JDBC Driver are you using? You can obtain them by running the commands:

virtuoso-t -?
java -jar virt_jena3.jar
java -jar virtjdbc3.jar

Also, when making the connection with the Virtuoso Jena Provider, are you specifying the charset=UTF-8 option in the connect string so that special characters such as the emoticon are handled correctly as strings? See:

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtJenaProvider#Compiling%20Jena%20Sample%20Programs
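As a minimal sketch of what "specifying charset=UTF-8 in the connect string" could look like on the client side, the helper below appends the option to a Virtuoso JDBC URL unless a charset option is already present. The class and method names are hypothetical and not part of the Virtuoso driver; only the /charset=UTF-8 URL segment comes from this thread.

```java
import java.util.Locale;

public class VirtuosoUrlSketch {

    // Hypothetical helper: returns the URL unchanged if it already carries a
    // charset option, otherwise appends "/charset=UTF-8".
    static String withUtf8Charset(String jdbcUrl) {
        if (jdbcUrl.toLowerCase(Locale.ROOT).contains("/charset=")) {
            return jdbcUrl;
        }
        // Avoid a double slash when the URL already ends with "/".
        String base = jdbcUrl.endsWith("/")
                ? jdbcUrl.substring(0, jdbcUrl.length() - 1)
                : jdbcUrl;
        return base + "/charset=UTF-8";
    }

    public static void main(String[] args) {
        // The host and port here are placeholders.
        String url = withUtf8Charset("jdbc:virtuoso://localhost:1111");
        System.out.println(url);
        // The resulting string would then be passed as the URL argument when
        // opening the connection with the Virtuoso Jena Provider.
    }
}
```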

Seondong commented 8 years ago

Currently, I have Virtuoso 7.2.2 set up on my Linux server, and I run the Java program from my Windows PC. In pom.xml I declared dependencies on jena-core 2.12 and jena-arq 2.12, so I added virt_jena2.jar and virtjdbc4.jar as libraries.

Virtuoso Version: 7.2.2

sundong@dmserver6:~/virtuoso-opensource-7.2.2/bin$ ./virtuoso-t -?
Virtuoso Open Source Edition (Column Store) (multi threaded)
Version 7.2.2.3215-pthreads as of Jan 28 2016
Compiled for Linux (x86_64-unknown-linux-gnu)

In which directory should I run the java -jar virt_jena3.jar command? On my Linux server, it fails with Error: Unable to access jarfile virt_jena3.jar

To make a connection, I am using the following code.

VirtGraph set = new VirtGraph(INPUT_IRI, HOST, USERNAME, PASSWORD);
HOST = "jdbc:virtuoso://xxx.xxx.xxx.xxx:1111/charset=UTF-8/log_enable=2";

HughWilliams commented 8 years ago

@Seondong: You said you put the jar files in the pom.xml dependencies location on the Windows PC, so you would run the java -jar ... command from there ...