ruby-rdf / sparql

Ruby SPARQL library
http://rubygems.org/gems/sparql
The Unlicense

BIND disappears when it is used with VALUES #40

Closed manabuishii closed 2 years ago

manabuishii commented 2 years ago

Hello

I am using rdf/sparql 3.2.0.

I want to use to_sparql.

The problem is that when I call to_sparql on a query that uses BIND together with VALUES, the BIND disappears from the output.

When I test the example code from the sparql source code documentation, BIND works fine.

For example, take this query:

https://togodx.integbio.jp/sparqlist/gene_biotype_ensembl

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX taxon: <http://identifiers.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?parent ?child ?child_label
FROM <http://rdf.integbio.jp/dataset/togosite/ensembl>
WHERE {
  ?enst obo:SO_transcribed_from ?ensg .
  ?ensg a ?parent ;
        obo:RO_0002162 taxon:9606 ;
        faldo:location ?ensg_location ;
        dc:identifier ?child ;
        rdfs:label ?child_label .
  FILTER(CONTAINS(STR(?parent), "terms/ensembl/"))
  BIND(STRBEFORE(STRAFTER(STR(?ensg_location), "GRCh38/"), ":") AS ?chromosome)
  VALUES ?chromosome {
      "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
      "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
      "X" "Y" "MT"
  }
}

I parsed the SPARQL above and called to_sparql on the result. Here is the output; the BIND has disappeared.

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX taxon: <http://identifiers.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
FROM <http://rdf.integbio.jp/dataset/togosite/ensembl>
SELECT DISTINCT ?parent ?child ?child_label
WHERE {
?enst obo:SO_transcribed_from ?ensg .
?ensg a ?parent .
?ensg obo:RO_0002162 taxon:9606 .
?ensg faldo:location ?ensg_location .
?ensg dc:identifier ?child .
?ensg rdfs:label ?child_label .
VALUES (?chromosome) {
("1")
("2")
("3")
("4")
("5")
("6")
("7")
("8")
("9")
("10")
("11")
("12")
("13")
("14")
("15")
("16")
("17")
("18")
("19")
("20")
("21")
("22")
("X")
("Y")
("MT")
}

FILTER contains(str(?parent), "terms/ensembl/") .
}

I also tested the original query without VALUES. That SPARQL query is as follows.

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX taxon: <http://identifiers.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?parent ?child ?child_label
FROM <http://rdf.integbio.jp/dataset/togosite/ensembl>
WHERE {
  ?enst obo:SO_transcribed_from ?ensg .
  ?ensg a ?parent ;
        obo:RO_0002162 taxon:9606 ;
        faldo:location ?ensg_location ;
        dc:identifier ?child ;
        rdfs:label ?child_label .
  FILTER(CONTAINS(STR(?parent), "terms/ensembl/"))
  BIND(STRBEFORE(STRAFTER(STR(?ensg_location), "GRCh38/"), ":") AS ?chromosome)
}

Here is the result after parsing and calling to_sparql. The BIND does not disappear.

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX taxon: <http://identifiers.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
FROM <http://rdf.integbio.jp/dataset/togosite/ensembl>
SELECT DISTINCT ?parent ?child ?child_label
WHERE {
?enst obo:SO_transcribed_from ?ensg .
?ensg a ?parent .
?ensg obo:RO_0002162 taxon:9606 .
?ensg faldo:location ?ensg_location .
?ensg dc:identifier ?child .
?ensg rdfs:label ?child_label .
BIND (STRBEFORE(STRAFTER(str(?ensg_location), "GRCh38/"), ":") AS ?chromosome) .
FILTER contains(str(?parent), "terms/ensembl/") .
}

The source code that reproduces the disappearing BIND is here:

require "sparql"

# SPARQL
endpoint = "https://integbio.jp/togosite/sparql"
rq = <<'SPARQL'.chop
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX taxon: <http://identifiers.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?parent ?child ?child_label
FROM <http://rdf.integbio.jp/dataset/togosite/ensembl>
WHERE {
  ?enst obo:SO_transcribed_from ?ensg .
  ?ensg a ?parent ;
        obo:RO_0002162 taxon:9606 ;
        faldo:location ?ensg_location ;
        dc:identifier ?child ;
        rdfs:label ?child_label .
  FILTER(CONTAINS(STR(?parent), "terms/ensembl/"))
  BIND(STRBEFORE(STRAFTER(STR(?ensg_location), "GRCh38/"), ":") AS ?chromosome)
  VALUES ?chromosome {
      "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
      "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
      "X" "Y" "MT"
  }
}
SPARQL

puts "original SPARQL query has VALUES:\n#{rq}"

# # convert
parsedobject = SPARQL.parse(rq)
rqfromparsedobject = parsedobject.to_sparql()

puts "BIND disappear SPARQL converted from parsedobject:\n#{rqfromparsedobject}"

With VALUES removed, BIND does not disappear. The source code is:

require "sparql"

# SPARQL
endpoint = "https://integbio.jp/togosite/sparql"
rq = <<'SPARQL'.chop
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX taxon: <http://identifiers.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?parent ?child ?child_label
FROM <http://rdf.integbio.jp/dataset/togosite/ensembl>
WHERE {
  ?enst obo:SO_transcribed_from ?ensg .
  ?ensg a ?parent ;
        obo:RO_0002162 taxon:9606 ;
        faldo:location ?ensg_location ;
        dc:identifier ?child ;
        rdfs:label ?child_label .
  FILTER(CONTAINS(STR(?parent), "terms/ensembl/"))
  BIND(STRBEFORE(STRAFTER(STR(?ensg_location), "GRCh38/"), ":") AS ?chromosome)
}
SPARQL

puts "original SPARQL query without VALUES:\n#{rq}"

# # convert
parsedobject = SPARQL.parse(rq)
rqfromparsedobject = parsedobject.to_sparql()

puts "BIND is preserved in SPARQL converted from parsedobject:\n#{rqfromparsedobject}"

gkellogg commented 2 years ago

I'll check it out. The to_sparql feature is pretty experimental, and there are a number of failure modes. I should be able to sort this out and get a fix on the develop branch shortly, however.

Thanks for your patience.

gkellogg commented 2 years ago

It's getting swallowed by the implicit JOIN, and some refactoring is required. Your query is parsed as follows:

(prefix ((obo: <http://purl.obolibrary.org/obo/>)
         (taxon: <http://identifiers.org/taxonomy/>)
         (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
         (faldo: <http://biohackathon.org/resource/faldo#>)
         (dc: <http://purl.org/dc/elements/1.1/>))
 (dataset (<http://rdf.integbio.jp/dataset/togosite/ensembl>)
  (distinct
   (project (?parent ?child ?child_label)
    (filter (contains (str ?parent) "terms/ensembl/")
     (join
      (extend
       ((?chromosome (strbefore (strafter (str ?ensg_location) "GRCh38/") ":")))
       (bgp
        (triple ?enst obo:SO_transcribed_from ?ensg)
        (triple ?ensg a ?parent)
        (triple ?ensg obo:RO_0002162 taxon:9606)
        (triple ?ensg faldo:location ?ensg_location)
        (triple ?ensg dc:identifier ?child)
        (triple ?ensg rdfs:label ?child_label)) )
      (table
       (vars ?chromosome)
       (row (?chromosome "1"))
       (row (?chromosome "2"))
       (row (?chromosome "3"))
       (row (?chromosome "4"))
       (row (?chromosome "5"))
       (row (?chromosome "6"))
       (row (?chromosome "7"))
       (row (?chromosome "8"))
       (row (?chromosome "9"))
       (row (?chromosome "10"))
       (row (?chromosome "11"))
       (row (?chromosome "12"))
       (row (?chromosome "13"))
       (row (?chromosome "14"))
       (row (?chromosome "15"))
       (row (?chromosome "16"))
       (row (?chromosome "17"))
       (row (?chromosome "18"))
       (row (?chromosome "19"))
       (row (?chromosome "20"))
       (row (?chromosome "21"))
       (row (?chromosome "22"))
       (row (?chromosome "X"))
       (row (?chromosome "Y"))
       (row (?chromosome "MT")))))))))

The existing code emits the BIND statements when the SELECT/WHERE is emitted, and that needs to move to the end of the internal JOIN clause.
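To make the failure mode concrete, here is a toy sketch in plain Ruby. It is not the gem's actual serializer: the nested-array node shapes and the names `binds_at_root` and `emit` are hypothetical, chosen only to mirror the S-expression above. It shows why a BIND that is looked for only at the root of the pattern is lost once the `extend` sits inside a `join`, and how recursing into the join operands recovers it.

```ruby
# Toy algebra nodes as nested arrays (hypothetical shapes, not the gem's classes).
BGP    = [:bgp, "?ensg faldo:location ?ensg_location ."].freeze
EXTEND = [:extend, ["?chromosome", 'STRBEFORE(STR(?ensg_location), ":")'], BGP].freeze
TABLE  = [:table, "?chromosome", ['"1"', '"X"']].freeze
QUERY  = [:join, EXTEND, TABLE].freeze

# Naive approach: only notices an extend when it is the root of the pattern.
def binds_at_root(node)
  return [] unless node.first == :extend
  var, expr = node[1]
  ["BIND (#{expr} AS #{var}) ."]
end

# Recursive approach: each operand of a join emits its own lines,
# so the BIND is written right after the patterns it extends.
def emit(node)
  op, *args = node
  case op
  when :bgp
    args
  when :extend
    (var, expr), child = args
    emit(child) + ["BIND (#{expr} AS #{var}) ."]
  when :table
    var, values = args
    ["VALUES #{var} { #{values.join(' ')} }"]
  when :join
    args.flat_map { |operand| emit(operand) }
  else
    raise ArgumentError, "unknown operator: #{op}"
  end
end

puts binds_at_root(QUERY).inspect # the root is a join, so no BIND is found
puts emit(QUERY)
```

In this model the recursive version prints the triple pattern, then the BIND, then the VALUES block, which matches the order of the original query.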

gkellogg commented 2 years ago

@manabuishii This case should work now, and many others have been fixed. At this point, all of the SPARQL 1.0 functionality should work properly. There remain issues with GROUP/HAVING, Property Paths and many other 1.1 expressions, but it's coming along.
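For anyone who wants a quick regression check on their own queries, a crude textual comparison can flag clauses that go missing in a round trip. The helper below is only a sketch: `dropped_keywords` is a hypothetical name, it uses nothing but the Ruby standard library, and it matches keywords as plain words, so a variable such as `?bind` would confuse it. The two strings are abridged versions of the query and of the faulty output reported above.

```ruby
# Returns the keywords that occur in the original query text
# but not in the regenerated one (case-insensitive, whole words only).
def dropped_keywords(original, regenerated, keywords = %w[BIND VALUES FILTER OPTIONAL UNION])
  keywords.select do |keyword|
    pattern = /\b#{keyword}\b/i
    original.match?(pattern) && !regenerated.match?(pattern)
  end
end

# Abridged from the query in this issue.
original = <<~QUERY
  SELECT ?chromosome WHERE {
    ?ensg faldo:location ?ensg_location .
    FILTER(CONTAINS(STR(?ensg_location), "GRCh38/"))
    BIND(STRBEFORE(STRAFTER(STR(?ensg_location), "GRCh38/"), ":") AS ?chromosome)
    VALUES ?chromosome { "1" "2" }
  }
QUERY

# Abridged from the faulty to_sparql output in this issue.
regenerated = <<~QUERY
  SELECT ?chromosome WHERE {
    ?ensg faldo:location ?ensg_location .
    VALUES (?chromosome) { ("1") ("2") }
    FILTER contains(str(?ensg_location), "GRCh38/") .
  }
QUERY

puts dropped_keywords(original, regenerated).inspect # prints ["BIND"]
```

With the gem installed, the same helper can be fed real output, for example `dropped_keywords(rq, SPARQL.parse(rq).to_sparql)`, using the `rq` string from the reproduction script.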

manabuishii commented 2 years ago

@gkellogg Thanks ‼️ I tested it and it works fine.

I'm looking forward to it.

gkellogg commented 2 years ago

Great, I'm finishing up some corner cases and will push a release fairly soon; in the meantime, it's available from the develop branch.
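For readers who want to try the fix before a release, one common way to pull a gem from a branch is a Gemfile entry like the sketch below. It assumes a Bundler-managed project and that the repository lives at ruby-rdf/sparql on GitHub, as the page header suggests; the thread itself does not prescribe this setup.

```ruby
# Gemfile (sketch): track the develop branch until the fix is released.
source "https://rubygems.org"

# Assumption: the repository path is ruby-rdf/sparql on GitHub.
gem "sparql", github: "ruby-rdf/sparql", branch: "develop"
```

After editing the Gemfile, running `bundle install` fetches the branch; once a fixed release is published, the entry can go back to a plain `gem "sparql"` line.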