openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com
Other
863 stars 210 forks source link

Virtuoso 37000 Error SP031: SPARQL: Internal error: The length of generated SQL text has exceeded 10000 lines of code #664

Open mazzogp opened 7 years ago

mazzogp commented 7 years ago

The following error --

Virtuoso 37000 Error SP031: SPARQL: Internal error: The length of generated SQL text has exceeded 10000 lines of code

-- is issued when I execute the SPARQL statement below on a virtual graph in Virtuoso Open Source. The graph has been obtained with the R2RML mapping.

prefix pest:<http://example.com/resource/> 
select * from <http://example.com/resource>
where {
  ?mainProdId    pest:productCode    "0120010" .
  ?prodSynonymId pest:productCode    ?productCode .
  ?prodSynonymId pest:hasProduct     ?mainProdId .
  ?prodSynonymId pest:languageCode   'EN' .
  ?prodSynonymId pest:scientificName ?otherProdScientificName.
  ?prodSynonymId pest:synonymName    ?otherProductName.
  ?prodSynonymId pest:codeNumber     ?otherProductCode .
}

The following are the needed operations to reproduce the issue starting from the scratch. It happens as well leaving the DB tables empty.

  1. download the Virtuoso OS source code with git clone, compile it and install it. I got the version 07.20.3217
  2. modify the virtuoso.ini file in the parameter DefaultHost = ec.europa.eu
  3. install the packages cartridges_dav.vad, rdb2rdf_dav.vad from the isql prompt
  4. add to the user SPARQL the SPARQL_SELECT role and the dba primary role
  5. create a simple virtual graph

    1. download the script create_tables_bug.txt
    2. rename it from .txt to .sql and run it from the isql prompt.
    3. copy the R2RML script R2RML_bug.txt in your_directory
    4. rename the downloaded file from .txt to .sql
    5. finally run the following commands from the isql prompt.

      SPARQL CLEAR GRAPH http://trush.me/1; DB.DBA.TTLP (file_to_string_output ('your_directory/R2RML_bug.sql'), 'http://trush.me/1', 'http://trush.me/1'); EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1'));

This should be enough to run the above SPARQL query

HughWilliams commented 7 years ago

When I load your SQL and R2RML triple mappings script the query runs and returns no results ...

How exactly do you load your R2RML mapping as I load via the Conductor R2RML tab ?

mazzogp commented 7 years ago

I used the following script run from isql prompt.

SPARQL CLEAR GRAPH <http://trush.me/1>;
DB.DBA.TTLP
  (file_to_string_output ('C:/LocalPrograms/dev/virtuoso/scripts/issues/SPARQL-error-generated-SQL-too-long/R2RML_bug.sql'),
  'http://trush.me/1',
  'http://trush.me/1');
EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1'));

Then I executed the SPARQL query from the Conductor Interactive SQL

IvanMikhailov commented 7 years ago

I can't reproduce the error with latest builds of Virtuoso Universal Server. I'd suspect multiple loads of same .TTL file to same graph without CLEAR GRAPH in between but you say your script contains the proper CLEAR GRAPH. Could you please post the outputs of

set blobs on;
select DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1') as x long varchar;

and

EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1'));

?

mazzogp commented 7 years ago

@IvanMikhailov please find below the output of the requested commands execution.

SQL> set blobs on;
SQL> select DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1') as x long varchar;
x
LONG VARCHAR
_______________________________________________________________________________

prefix virtrdf: <http://www.openlinksw.com/schemas/virtrdf#>
prefix ns1: <http://trush.me/>
prefix ns2: <http://example.com/>
create IRI class <r2rml:virt02-0197c7aa15f8d06dbc1788c1bf2cda0c> "http://example.com/resource/productSynonym-%d" (in ID integer) .
create IRI class <r2rml:virt02-f22e577ccc9221d9f09d60311320ee87> "http://example.com/resource/product-%U" (in PRODUCT_CODE varchar) .
create IRI class <r2rml:virt02-9b2d6a967cef75faba6c0da8557f6da0> "http://example.com/resource/product-%U" (in PRODUCT_PARENT_CODE varchar) .
alter quad storage virtrdf:DefaultQuadStorage
from "DB"."TEST"."V_SEM_PRODUCTS1" as tbl1
from "DB"."TEST"."V_SEM_PRODUCT_SYN_MOD1" as tbl0
  {
    create ns1:1 as graph ns2:resource option (soft exclusive) {
            <r2rml:virt02-0197c7aa15f8d06dbc1788c1bf2cda0c> (tbl0."ID")
                    <http://example.com/resource/hasProduct> <r2rml:virt02-f22e577ccc9221d9f09d60311320ee87> (tbl0."PRODUCT_CODE") ;
                    <http://example.com/resource/productCode> tbl0."PRODUCT_CODE" ;
                    <http://example.com/resource/codeNumber> tbl0."CODE_NUMBER" ;
                    <http://example.com/resource/synonymId> tbl0."SYNONYM_ID" ;
                    <http://example.com/resource/productSynonymId> tbl0."ID" ;
                    <http://example.com/resource/synonymName> tbl0."SYNONYM_NAME" ;
                    <http://example.com/resource/languageCode> tbl0."LANGUAGE_CD" ;
                    <http://example.com/resource/scientificName> tbl0."SCIENTIFIC_NAME" ;
                    a <http://example.com/resource/ProductSynonym> .
            <r2rml:virt02-f22e577ccc9221d9f09d60311320ee87> (tbl1."PRODUCT_CODE")
                    <http://example.com/resource/productCode> tbl1."PRODUCT_CODE" ;
                    <http://example.com/resource/hasParentProduct> <r2rml:virt02-9b2d6a967cef75faba6c0da8557f6da0> (tbl1."PRODUCT_PARENT_CODE") ;
                    <http://example.com/resource/productA3B> tbl1."A3B" ;
                    a <http://example.com/resource/Product> }
  }
1 Rows. -- 46 msec.
SQL>
SQL>
SQL>
SQL> EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1'));
STATE    MESSAGE
VARCHAR  VARCHAR
_______________________________________________________________________________

00000    Previous definition of IRI class <r2rml:virt02-0197c7aa15f8d06dbc1788c1bf2cda0c> is identical to the new one, not touched
00000    Previous definition of IRI class <r2rml:virt02-f22e577ccc9221d9f09d60311320ee87> is identical to the new one, not touched
00000    Previous definition of IRI class <r2rml:virt02-9b2d6a967cef75faba6c0da8557f6da0> is identical to the new one, not touched
00000    Quad storage <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage> is flagged as being edited
00000    Quad map <http://trush.me/1> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-3413673d1d69365d66031f3318efc34c> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-04eeb09bf297a28c12abc3bd40bdcdf6> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-d16617e2fffcdc54c69937a4da5c2353> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-597c9291196137b00b52bebf801a864b> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-90917b03820c8bed8f5cdf69ae3c8dd2> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-f12df7687ca0d87f54623123849f6548> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-c4f410092476861b2e2a16275f289e67> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-0c63e1f0376f1d8d0973efd46ec601a2> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-1edeacdd7e640689da496dd984c8467c> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-9d510ba4b53e716b0f9bc24059be563d> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-2056253d11d7249952bca2685588b443> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-28467735582285908c5a19fad5f4abd7> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad map <sys:qm-83ff3034e16d45f3e2b15933a8bf9857> has been created and added to the <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage>
00000    Quad storage <http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage> is unflagged and can be edited by other transactions
00000    Transaction committed, SPARQL compiler re-configured
00000    20 RDF metadata manipulation operations done

21 Rows. -- 733 msec.

I am obliged to delete the .ttl graph <http://trush.me/1> every time I run the script generating the virtual graph <http://example.com/resource/>. If I do not delete it I get the following error

SQL> load ./ttlpExecTrial.sql;
Done. -- 0 msec.
*** Error 37000: [OpenLink][Virtuoso ODBC Driver][Virtuoso Server]SQ074: Line 63: SP031: SPARQL compiler: The statement contains two identical declarations of mappings
at line 16 of load ./ttlpExecTrial.sql:
EXEC ('SPARQL ' || DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1'))

Please find attached as well my virtuoso.inifile.

virtuoso.txt

mazzogp commented 7 years ago

@IvanMikhailov Hi Ivan. Currently I have an environment in which the error occurs and another in which it stopped occurring (with apparent no reason, always on the same example above). Comparing the result of the requested command

set blobs on;
select DB.DBA.R2RML_MAKE_QM_FROM_G ('http://trush.me/1') as x long varchar;

I notice that the working environment has one difference that seems meaningful. It is in the following line

create ns1:1 as graph ns2:resource option (exclusive) {" 

in comparison with the response of the not working environment,

create ns1:1 as graph ns2:resource option (soft exclusive) {

See the attachment below for the full command result of the working environment. command responses-working.txt This made me remember what you answered to my e-mail in Virtuoso users mailing list related to the same problem

Altermatively, the trouble may happen if there is only one RDF View, but its top-level quad map is not exclusive. In that case every triple pattern is a union of data from virtual graph and data from physical triples, the overall result is again a union of 1024 joins.

If this is the case why in one environment it is exclusive and in the other one is not? In both environments the same simple example has been installed.

mazzogp commented 7 years ago

@IvanMikhailov Hi Ivan, I hope you do not feel spammed by me (:-)

I have now obtained the generated SQL of the SPARQL query above from the working environment and from the not working environment. The difference is astonishing!

The working one has 5 INNER JOIN, the not working one 415. Please find them attached below.

The environment currently not showing the problem has the following version

Virtuoso Open Source Edition (Column Store) (multi threaded)
Version 7.2.4.2.3217-pthreads as of Feb  3 2017
Compiled for Linux (x86_64-unknown-linux-gnu)
Copyright (C) 1998-2016 OpenLink Software
TallTed commented 5 years ago

Hi, @beppem --

I'm wondering whether this issue resolved for you?

Assuming you're still having this issue with some build(s) of 7.2.4, it would be helpful to know what your experience is with a current build of VOS 7.2.5.1 (stable/7) (prebuilt binary is an option here -- you can replace just the virtuoso-t in your docker image), VOS 7.2.6-rc1 (develop/7), and/or Enterprise Edition 8.2 (current).

If you still have one environment showing the problem, and one not, it would be helpful to know the full version info (as shown in your last update here) for both of these instances. It would also be helpful to get output of this SPARQL query discussed here.

mazzogp commented 5 years ago

Hi Ted

No, the issue has never been fixed.

What we did not to remain stuck was to switch to physical graphs. With them all the queries worked fine and we never experienced any longer this (and other) strange behaviours.

Cheers

Beppe