Open davidshumway opened 2 years ago
Assuming distances are precomputed between every building and weather station, one attempt might look like:
# For every observation from one building, returns observation along with up to 15 associated
# weather observations.
SELECT ?building1 ?buildingValue ?day1 ?minDist ?weatherProperty1 ?weatherValue WHERE {
?weatherObs a ex:weatherObservation ;
ex:hasProperty ?weatherProperty2 ;
ex:hasSimpleResult ?weatherValue ;
ex:obsDay ?day2 ;
ex:hasDistanceToBuilding [
ex:distance ?dist2 ;
ex:building ?building2 ].
FILTER(?dist2 = ?minDist) .
FILTER(?day2 = ?day1) .
FILTER(?building2 = ?building1) .
FILTER(?weatherProperty2 = ?weatherProperty1) .
# Selects minimum distance (WS to building) on given day for every
# distinct weatherProperty.
{SELECT MIN(?dist) AS ?minDist ?day1 ?building1 ?buildingValue ?weatherProperty1 WHERE {
?weatherObs a ex:weatherObservation ;
ex:obsWeatherProperty ?weatherProperty1 ;
ex:obsDay ?day2 ;
ex:hasDistanceToBuilding [
ex:distance ?dist ;
ex:building ?building2 ] .
FILTER(?building2 = ?building1) .
FILTER(?day2 = ?day1) .
# Selects every daily observation for every building.
{SELECT ?day1 ?building1 ?buildingValue WHERE {
?building1 a ex:buildingObservation ;
ex:hasSimpleResult ?buildingValue ;
ex:obsDay ?day1 .
}}
} GROUP BY ?day1 ?building1 ?buildingValue ?weatherProperty1 }
}
So the query is taking too long to finish. Profiling shows the following:
156388 msec 67% cpu, 7.44829e+08 rnd 7.31708e+08 seq 97.9652% same seg 1.79578% same pg
11066 disk reads, 30173 read ahead, 0.048558% wait
Compilation: 1741 msec 0 reads 0% read 0 messages 0% clw
Does that look... good? Bad?
In terms of performance is there any advantage to storing the two (large) datasets in separate graphs?
There would generally be no performance advantage in having separate graphs for each dataset.
More importantly, what is the size of your datasets in total triples? Have you Performance Tuned your Virtuoso instance for hosting that many triples in memory, so as not to have to swap to and from disk? Performance will always be degraded if enough memory is not allocated, and if Virtuoso is not configured to make use of that memory.
Some notes:
Version:
OpenLink Virtuoso Interactive SQL (Virtuoso) Version 07.20.3233 as of Jun 22 2021
I setup memory usage for 8GB of free memory. Thus far I haven't run into any queries requiring use of swap.
I didn't try altering the indexing.
The database is stored on a SSD.
Number of triples:
Total: A little under 50 million.
50 million triples should be quite happy in 8 GB RAM, as we generally recommend a minimum of 10 bytes per triple, or 500 MB for your 50 million triples.
That said, your query may have some surprising optimizations awaiting. Note, for instance, that SPARQL queries are processed "inside-out"; basically, the innermost subquery is processed first, and the outermost subquery last. This is sometimes confusingly called "bottom-up" (confusing because it's not about last-line-first, but the deepest-first). It's not easy to see which subquery is at what depth, with your current query layout. This is one reason I tend to put a lot of syntactically meaningless whitespace into my SPARQL -- because it clarifies the depths.
# For every observation from one building, returns observation along with up to 15 associated
# weather observations.
SELECT ?building1
?buildingValue
?day1
?minDist
?weatherProperty1
?weatherValue
WHERE
{ ?weatherObs a ex:weatherObservation ;
ex:hasProperty ?weatherProperty2 ;
ex:hasSimpleResult ?weatherValue ;
ex:obsDay ?day2 ;
ex:hasDistanceToBuilding [ ex:distance ?dist2 ;
ex:building ?building2
].
filter ( ?dist2 = ?minDist ) .
filter ( ?day2 = ?day1 ) .
filter ( ?building2 = ?building1 ) .
filter ( ?weatherProperty2 = ?weatherProperty1 ) .
# Selects minimum distance (WS to building) on given day for every
# distinct weatherProperty.
{ SELECT ( MIN ( ?dist ) AS ?minDist )
?day1
?building1
?buildingValue
?weatherProperty1
WHERE
{ ?weatherObs a ex:weatherObservation ;
ex:obsWeatherProperty ?weatherProperty1 ;
ex:obsDay ?day2 ;
ex:hasDistanceToBuilding [ ex:distance ?dist ;
ex:building ?building2
] .
filter ( ?building2 = ?building1 ) .
filter ( ?day2 = ?day1 ) .
# Selects every daily observation for every building.
{ SELECT ?day1
?building1
?buildingValue
WHERE
{ ?building1 a ex:buildingObservation ;
ex:hasSimpleResult ?buildingValue ;
ex:obsDay ?day1 .
}
}
}
GROUP BY ?day1 ?building1 ?buildingValue ?weatherProperty1
}
}
I wonder whether what you're trying to achieve wouldn't be more efficiently done with inverted subqueries?
Thanks for the feedback! Not sure what you mean with inverted subqueries?
I've tried performing the query starting with the weather observations as the innermost subquery and then processing the building subquery last but didn't see much difference in terms of performance. I'm not sure what kind of optimizations are available at this point, hence my reaching out for advice here.
Thanks for the feedback! Not sure what you mean with inverted subqueries?
I've tried performing the query starting with the weather observations as the innermost subquery and then processing the building subquery last but didn't see much difference in terms of performance. I'm not sure what kind of optimizations are available at this point, hence my reaching out for advice here.
Do you have a live query service endpoint that we could interact with? Do you have the datasets in question in an http-accessible location so that we could upload to one of our instances?
Randomly generated data but should suffice. weatherObs.n3.gz is the only generally large file. The script to generate the files is here: https://gist.github.com/davidshumway/23c098cbd52bdf0a84c8530b3353f647. Note that in this example the locations are not restricted to CA.
buildingObs.n3.gz https://drive.google.com/file/d/1-6nOTCTlIzINa1LyjIu22q-fzWvrnrk7/view?usp=sharing
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sosa: <http://www.w3.org/ns/sosa/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://www.example3.com/>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
ex:buildingObservation-1835465 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:building-65 ;
sosa:resultTime "2014-1-3T00:00:00"^^xsd:dateTime ;
sosa:hasSimpleResult "9"^^xsd:double ;
ex:observationType "building"^^xsd:string .
ex:buildingObservation-271856 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:building-56 ;
sosa:resultTime "2021-2-21T00:00:00"^^xsd:dateTime ;
sosa:hasSimpleResult "9"^^xsd:double ;
ex:observationType "building"^^xsd:string .
weatherObs.n3.gz https://drive.google.com/file/d/1-Cggjteq_gwzY3QJlsFGpGstzgt5IHjP/view?usp=sharing 248MB compressed, 4.3GB in raw format
# prefixes ...
ex:weatherObservation-14879854 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:weatherStation-154 ;
sosa:resultTime "2013-5-1T00:00:00"^^xsd:dateTime ;
sosa:hasProperty "9"^^xsd:string ;
sosa:hasSimpleResult "6"^^xsd:double ;
ex:observationType "weather"^^xsd:string .
ex:weatherObservation-10873163 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:weatherStation-263 ;
sosa:resultTime "2015-10-8T00:00:00"^^xsd:dateTime ;
sosa:hasProperty "3"^^xsd:string ;
sosa:hasSimpleResult "3"^^xsd:double ;
ex:observationType "weather"^^xsd:string .
# ...
foi.n3.gz https://drive.google.com/file/d/1-J9My5RDjUYMvYancElRwtEtdiImbQ9F/view?usp=sharing
# prefixes ...
ex:weatherStation-0 a sosa:Sensor ;
ex:geoType "weather"^^xsd:string ;
geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(81.6799480003782 29.058143278701607)"^^geo:wktLiteral .
ex:building-0 a sosa:Sensor ;
ex:geoType "building"^^xsd:string ;
geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(-45.66468600537044 -89.41492295285423)"^^geo:wktLiteral .
# ...
dist.n3.gz https://drive.google.com/file/d/1-JWzlRY8ZfjrawSXoLYPtwMLad2SPLMV/view?usp=sharing
# prefixes ...
ex:weatherStation-0 ex:hasDistanceToBuilding [
ex:distance "17567"^^xsd:decimal ;
ex:building ex:building-0 ] .
ex:building-0 ex:hasDistanceToStation [
ex:distance "17567"^^xsd:decimal ;
ex:weatherStation ex:weatherStation-0 ] .
ex:weatherStation-0 ex:hasDistanceToBuilding [
ex:distance "449"^^xsd:decimal ;
ex:building ex:building-1 ] .
ex:building-1 ex:hasDistanceToStation [
ex:distance "449"^^xsd:decimal ;
ex:weatherStation ex:weatherStation-0 ] .
# ...
And updating the query a bit to match:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sosa: <http://www.w3.org/ns/sosa/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://www.example3.com/>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?building ?buildingValue ?buildingFoi ?day
?minDist ?weatherProperty ?weatherValue ?weatherStationFoi
WHERE {
?weatherObs a sosa:Observation ;
sosa:hasFeatureOfInterest ?weatherStationFoi ;
sosa:resultTime ?day ;
sosa:hasProperty ?weatherProperty ;
ex:observationType ?weatherObsType ;
sosa:hasSimpleResult ?weatherValue .
?weatherStationFoi ex:hasDistanceToBuilding [
ex:distance ?minDist ;
ex:building ?buildingFoi ] .
{SELECT MIN(?distance) AS ?minDist ?building ?buildingValue ?buildingFoi
?day ?weatherProperty
WHERE {
?weatherObs a sosa:Observation ;
sosa:hasFeatureOfInterest ?weatherStationFoi ;
sosa:resultTime ?day ;
sosa:hasProperty ?weatherProperty ;
ex:observationType ?weatherObsType .
?weatherStationFoi ex:hasDistanceToBuilding [
ex:distance ?distance ;
ex:building ?buildingFoi ] .
{SELECT ?day ?building ?buildingValue ?buildingFoi where {
?building a sosa:Observation ;
sosa:hasFeatureOfInterest ?buildingFoi ;
sosa:hasSimpleResult ?buildingValue ;
sosa:resultTime ?day ;
ex:observationType ?obsType .
FILTER(?obsType = "building"^^xsd:string) .
}}
} GROUP BY ?building ?buildingValue ?buildingFoi ?day ?weatherProperty
}
}
Small data sample:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sosa: <http://www.w3.org/ns/sosa/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://www.example3.com/>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
ex:buildingObservation-1 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:building-1 ;
sosa:resultTime "2014-5-1T00:00:00"^^xsd:dateTime ;
sosa:hasSimpleResult "9"^^xsd:double ;
ex:observationType "building"^^xsd:string .
ex:buildingObservation-2 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:building-0 ;
sosa:resultTime "2013-5-1T00:00:00"^^xsd:dateTime ;
sosa:hasSimpleResult "9"^^xsd:double ;
ex:observationType "building"^^xsd:string .
ex:weatherObservation-1 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:weatherStation-0 ;
sosa:resultTime "2013-5-1T00:00:00"^^xsd:dateTime ;
sosa:hasProperty "9"^^xsd:string ;
sosa:hasSimpleResult "111"^^xsd:double ;
ex:observationType "weather"^^xsd:string .
ex:weatherObservation-2 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:weatherStation-1 ;
sosa:resultTime "2013-5-1T00:00:00"^^xsd:dateTime ;
sosa:hasProperty "9"^^xsd:string ;
sosa:hasSimpleResult "222"^^xsd:double ;
ex:observationType "weather"^^xsd:string .
ex:weatherObservation-3 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:weatherStation-0 ;
sosa:resultTime "2014-5-1T00:00:00"^^xsd:dateTime ;
sosa:hasProperty "9"^^xsd:string ;
sosa:hasSimpleResult "333"^^xsd:double ;
ex:observationType "weather"^^xsd:string .
ex:weatherObservation-4 a sosa:Observation ;
sosa:hasFeatureOfInterest ex:weatherStation-1 ;
sosa:resultTime "2014-5-1T00:00:00"^^xsd:dateTime ;
sosa:hasProperty "9"^^xsd:string ;
sosa:hasSimpleResult "444"^^xsd:double ;
ex:observationType "weather"^^xsd:string .
ex:weatherStation-0 a sosa:Sensor ;
ex:geoType "weather"^^xsd:string ;
geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(81.6799480003782 29.058143278701607)"^^geo:wktLiteral .
ex:weatherStation-1 a sosa:Sensor ;
ex:geoType "weather"^^xsd:string ;
geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(-45.66468600537044 -89.41492295285423)"^^geo:wktLiteral .
ex:building-0 a sosa:Sensor ;
ex:geoType "building"^^xsd:string ;
geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(28.801376420865992 20.748063782888806)"^^geo:wktLiteral .
ex:building-1 a sosa:Sensor ;
ex:geoType "building"^^xsd:string ;
geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(21.114236666465075 -98.00059121036644)"^^geo:wktLiteral .
ex:weatherStation-0 ex:hasDistanceToBuilding [
ex:distance "17567"^^xsd:decimal ;
ex:building ex:building-0 ] .
ex:weatherStation-0 ex:hasDistanceToBuilding [
ex:distance "44236"^^xsd:decimal ;
ex:building ex:building-1 ] .
ex:weatherStation-1 ex:hasDistanceToBuilding [
ex:distance "500"^^xsd:decimal ;
ex:building ex:building-0 ] .
ex:weatherStation-1 ex:hasDistanceToBuilding [
ex:distance "400"^^xsd:decimal ;
ex:building ex:building-1 ] .
Query result:
building | buildingValue | buildingFoi | day | minDist | weatherProperty | weatherValue | weatherStationFoi |
---|---|---|---|---|---|---|---|
http://www.example3.com/buildingObservation-2 | 9.0 | http://www.example3.com/building-0 | 2013-5-1T00:00:00 | 500 | 9 | 222.0 | http://www.example3.com/weatherStation-1 |
http://www.example3.com/buildingObservation-1 | 9.0 | http://www.example3.com/building-1 | 2014-5-1T00:00:00 | 400 | 9 | 444.0 | http://www.example3.com/weatherStation-1 |
I'm also a little perplexed about a query (using similar data as above) which performs much better by querying for smaller parts of the result (1/6th to be exact) rather than attempting to retrieve the full result all at once. Neither query runs into any memory issues (i.e. swapping) and both appear to run on a single CPU core. I'd assume execution time would be similar but querying for smaller parts finishes all 6 queries in about 1 minute whereas executing the full query (querying for all 6 parts at once) executes for 15-30 minutes without any result (I gave up before letting the query finish).
The following python code will execute 6 SPARQL queries (1-6) with each query taking approximately 10 seconds to return. In this case, ?o1
is associated with weather stations (6 million weather station observations * 7 triples per observation ~~ 42 million triples) and ?bObs
is a set of 30 or so features of interest (sosa:hasFeatureOfInterest ?bObsFoi
) associated with a single building project (i.e. project ID: FILTER(?bPID = "CA8"^^xsd:string)
) . The 6 separate queries relate to unique weather observations associated with each ?bObs
observation, with one query mdae for each of the past 6 days. The query results are merged together into a single pandas dataframe (i.e. mimicking what would be produced by the single full query).
res = None
prefixes = '...'
for i in range(1, 7):
n = str(i)
print(n)
s = prefixes + '''
SELECT ?bObs ?bObsFoi ?AvgF ?CountObs ?y ?m ?d
GROUP_CONCAT(?nProperty1, ",") AS ?np_minus'''+n+'''
GROUP_CONCAT(?nResult1, ",") AS ?nr_minus'''+n+'''
GROUP_CONCAT(?nTime1, ",") AS ?nt_minus'''+n+'''
WHERE {
?bObs a sosa:Observation ;
sosa:hasFeatureOfInterest ?bObsFoi ;
sosa:resultTime ?bObsDate ;
sosa:hasSimpleResult ?AvgF ;
ex:countObservations ?CountObs ;
time:year ?y ;
time:month ?m ;
time:day ?d ;
ex:n-'''+n+'''DayObs ?o1 .
?bObsFoi ex:BProjectID ?bPID .
FILTER(?bPID = "CA8"^^xsd:string) .
?o1 a sosa:Observation ;
sosa:resultTime ?nTime1 ;
sosa:observedProperty ?nProperty1 ;
sosa:hasSimpleResult ?nResult1 .
}
'''
r = run_sparql(s) # executes SPARQL query and parses result into pandas DF
if i == 1:
res = r
else:
res = res.merge(r.drop(columns=['bObsFoi', 'AvgF', 'CountObs', 'y', 'm', 'd']),
how='left', left_on='bObs', right_on='bObs')
And the result of the merged queries:
bObs | bObsFoi | AvgF | CountObs | y | m | d | np_minus1 | nr_minus1 | nmd_minus1 | nt_minus1 | np_minus2 | nr_minus2 | nmd_minus2 | nt_minus2 | np_minus3 | nr_minus3 | nmd_minus3 | nt_minus3 | np_minus4 | nr_minus4 | nmd_minus4 | nt_minus4 | np_minus5 | nr_minus5 | nmd_minus5 | nt_minus5 | np_minus6 | nr_minus6 | nmd_minus6 | nt_minus6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
http://www.example3.org/BSamplingFOI-CAb_W21-2016-8-2-avg-daily-F | http://www.example3.org/BSamplingFOI-CAb_W21 | 1.658657763991483 | 3 | 2016 | 8 | 2 | DEWP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT,MAX,MIN,MXSPD,PRCP | 64.0,1013.1,11.2, 73.4, 9.8, 4.7,000000, 81.0, 69.1, 11.1, 0.00 | 2016-08-01 00:00:00,2016-08-01 00:00:00,2016-08-01 00:00:00,2016-08-01 00:00:00,2016-08-01 00:00:00,2016-08-01 00:00... | DEWP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT,MAX,MIN,MXSPD,PRCP | 65.0,1011.6,9.8, 73.3, 9.8, 5.9,000000, 82.0, 69.1, 14.0, 0.00 | 2016-07-31 00:00:00,2016-07-31 00:00:00,2016-07-31 00:00:00,2016-07-31 00:00:00,2016-07-31 00:00:00,2016-07-31 00:00... | DEWP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT,MAX,MIN,MXSPD,PRCP | 65.9,1009.9,8.1, 73.9, 9.9, 6.4,000000, 82.0, 69.1, 13.0, 0.00 | 2016-07-30 00:00:00,2016-07-30 00:00:00,2016-07-30 00:00:00,2016-07-30 00:00:00,2016-07-30 00:00:00,2016-07-30 00:00... | DEWP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT,MAX,MIN,MXSPD,PRCP | 65.7,1010.1,8, 76.1, 8.6, 5.8,000000, 80.1, 70.0, 8.9, 0.00 | 2016-07-29 00:00:00,2016-07-29 00:00:00,2016-07-29 00:00:00,2016-07-29 00:00:00,2016-07-29 00:00:00,2016-07-29 00:00... | DEWP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT,MAX,MIN,MXSPD,PRCP | 64.5,1011.7,9.9, 74.7, 9.8, 5.2,000000, 87.1, 69.1, 8.9, 0.00 | 2016-07-28 00:00:00,2016-07-28 00:00:00,2016-07-28 00:00:00,2016-07-28 00:00:00,2016-07-28 00:00:00,2016-07-28 00:00... | DEWP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT,MAX,MIN,MXSPD,PRCP | 63.6,1011.7,9.9, 76.1, 9.9, 4.3,000000, 87.1, 70.0, 12.0, 0.00 | 2016-07-27 00:00:00,2016-07-27 00:00:00,2016-07-27 00:00:00,2016-07-27 00:00:00,2016-07-27 00:00:00,2016-07-27 00:00... | ||||||
http://www.example3.org/BSamplingFOI-CAb_W24E-2021-1-19-avg-daily-F | http://www.example3.org/BSamplingFOI-CAb_W24E | 0.087639213830691 | 3 | 2021 | 1 | 19 | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 38.2, 84.0, 50.0, 8.9, 0.00,1011.2,9.2, 61.9, 10.0, 2.3,000000 | 2021-01-18 00:00:00,2021-01-18 00:00:00,2021-01-18 00:00:00,2021-01-18 00:00:00,2021-01-18 00:00:00,2021-01-18 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 38.1, 84.0, 52.0, 8.0, 0.00,1014.7,12.8, 65.3, 10.0, 2.2,000000 | 2021-01-17 00:00:00,2021-01-17 00:00:00,2021-01-17 00:00:00,2021-01-17 00:00:00,2021-01-17 00:00:00,2021-01-17 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 39.2, 90.0, 52.0, 8.9, 0.00,1014.9,13, 65.6, 10.0, 2.4,000000 | 2021-01-16 00:00:00,2021-01-16 00:00:00,2021-01-16 00:00:00,2021-01-16 00:00:00,2021-01-16 00:00:00,2021-01-16 00:00... | DEWP,GUST,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 27.3, 20.0, 90.0, 66.9, 15.0, 0.00,1019.4,17.6, 79.2, 10.0, 7.1,000000 | 2021-01-15 00:00:00,2021-01-15 00:00:00,2021-01-15 00:00:00,2021-01-15 00:00:00,2021-01-15 00:00:00,2021-01-15 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 39.4, 78.1, 46.0, 8.9, 0.00,1020.4,18.4, 60.5, 9.9, 1.7,000000 | 2021-01-14 00:00:00,2021-01-14 00:00:00,2021-01-14 00:00:00,2021-01-14 00:00:00,2021-01-14 00:00:00,2021-01-14 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 40.5, 75.9, 46.0, 7.0, 0.00,1023.4,21.4, 58.1, 10.0, 1.7,000000 | 2021-01-13 00:00:00,2021-01-13 00:00:00,2021-01-13 00:00:00,2021-01-13 00:00:00,2021-01-13 00:00:00,2021-01-13 00:00... | ||||||
http://www.example3.org/BSamplingFOI-CAb_W15-2015-4-14-avg-daily-F | http://www.example3.org/BSamplingFOI-CAb_W15 | 0.082863664642535 | 3 | 2015 | 4 | 14 | DEWP,FRSHTT,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 53.6,000000, 72.0, 55.0, 9.9, 0.00,1017,15.1, 64.1, 9.8, 3.2 | 2015-04-13 00:00:00,2015-04-13 00:00:00,2015-04-13 00:00:00,2015-04-13 00:00:00,2015-04-13 00:00:00,2015-04-13 00:00... | DEWP,FRSHTT,GUST,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 50.9,000000, 15.0, 71.1, 53.1, 11.1, 0.00,1013.2,11.3, 63.0, 10.0, 4.4 | 2015-04-12 00:00:00,2015-04-12 00:00:00,2015-04-12 00:00:00,2015-04-12 00:00:00,2015-04-12 00:00:00,2015-04-12 00:00... | DEWP,FRSHTT,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 49.0,000000, 70.0, 53.1, 11.1, 0.00,1014.4,12.4, 61.7, 10.0, 4.0 | 2015-04-11 00:00:00,2015-04-11 00:00:00,2015-04-11 00:00:00,2015-04-11 00:00:00,2015-04-11 00:00:00,2015-04-11 00:00... | DEWP,FRSHTT,GUST,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 47.3,000000, 15.0, 70.0, 52.0, 8.9, 0.00,1015.2,13.2, 60.8, 10.0, 3.9 | 2015-04-10 00:00:00,2015-04-10 00:00:00,2015-04-10 00:00:00,2015-04-10 00:00:00,2015-04-10 00:00:00,2015-04-10 00:00... | DEWP,FRSHTT,GUST,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 44.5,000000, 15.0, 70.0, 51.1, 8.9, 0.00,1014.7,12.7, 60.9, 10.0, 3.2 | 2015-04-09 00:00:00,2015-04-09 00:00:00,2015-04-09 00:00:00,2015-04-09 00:00:00,2015-04-09 00:00:00,2015-04-09 00:00... | DEWP,FRSHTT,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 44.4,010000, 66.9, 51.1, 11.1, 0.01,1017.1,15.1, 59.1, 9.8, 4.2 | 2015-04-08 00:00:00,2015-04-08 00:00:00,2015-04-08 00:00:00,2015-04-08 00:00:00,2015-04-08 00:00:00,2015-04-08 00:00... | ||||||
http://www.example3.org/BSamplingFOI-CAb_W15-2021-12-6-avg-daily-F | http://www.example3.org/BSamplingFOI-CAb_W15 | 0.665480665597645 | 3 | 2021 | 12 | 6 | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 49.6, 66.9, 52.0, 5.1, 0.00,1019.7,17.7, 58.1, 4.0, 1.6,000000 | 2021-12-05 00:00:00,2021-12-05 00:00:00,2021-12-05 00:00:00,2021-12-05 00:00:00,2021-12-05 00:00:00,2021-12-05 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 49.3, 66.0, 55.9, 6.0, 0.00,1019.9,18.1, 59.3, 5.1, 2.4,000000 | 2021-12-04 00:00:00,2021-12-04 00:00:00,2021-12-04 00:00:00,2021-12-04 00:00:00,2021-12-04 00:00:00,2021-12-04 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 49.8, 64.9, 55.0, 6.0, 0.00,1019.2,17.3, 59.6, 5.1, 2.2,000000 | 2021-12-03 00:00:00,2021-12-03 00:00:00,2021-12-03 00:00:00,2021-12-03 00:00:00,2021-12-03 00:00:00,2021-12-03 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 53.1, 75.0, 53.1, 8.0, 0.00,1019.2,17.1, 58.6, 2.1, 3.2,100000 | 2021-12-02 00:00:00,2021-12-02 00:00:00,2021-12-02 00:00:00,2021-12-02 00:00:00,2021-12-02 00:00:00,2021-12-02 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 52.3, 75.0, 53.1, 5.1, 0.00,1016.9,15.3, 60.6, 3.8, 1.0,100000 | 2021-12-01 00:00:00,2021-12-01 00:00:00,2021-12-01 00:00:00,2021-12-01 00:00:00,2021-12-01 00:00:00,2021-12-01 00:00... | DEWP,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP,FRSHTT | 53.2, 75.0, 52.0, 7.0, 0.00,1016.6,14.8, 60.9, 3.7, 1.8,100000 | 2021-11-30 00:00:00,2021-11-30 00:00:00,2021-11-30 00:00:00,2021-11-30 00:00:00,2021-11-30 00:00:00,2021-11-30 00:00... | ||||||
http://www.example3.org/BSamplingFOI-CAb_W09-2015-8-25-avg-daily-F | http://www.example3.org/BSamplingFOI-CAb_W09 | 0.068059462159249 | 3 | 2015 | 8 | 25 | DEWP,FRSHTT,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 64.8,000000, 81.0, 66.9, 9.9, 0.00,1013.6,11.8, 72.8, 9.5, 4.1 | 2015-08-24 00:00:00,2015-08-24 00:00:00,2015-08-24 00:00:00,2015-08-24 00:00:00,2015-08-24 00:00:00,2015-08-24 00:00... | DEWP,FRSHTT,GUST,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 64.4,000000, 14.0, 78.1, 66.2, 8.9, 0.00,1013.1,11.3, 71.9, 8.0, 3.2 | 2015-08-23 00:00:00,2015-08-23 00:00:00,2015-08-23 00:00:00,2015-08-23 00:00:00,2015-08-23 00:00:00,2015-08-23 00:00... | DEWP,FRSHTT,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 62.2,000000, 79.0, 69.1, 9.9, 0.00,1012.1,10.2, 72.2, 9.1, 5.2 | 2015-08-22 00:00:00,2015-08-22 00:00:00,2015-08-22 00:00:00,2015-08-22 00:00:00,2015-08-22 00:00:00,2015-08-22 00:00... | DEWP,FRSHTT,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 62.7,000000, 79.0, 69.1, 11.1, 0.00,1011.1,9.2, 71.9, 9.1, 4.4 | 2015-08-21 00:00:00,2015-08-21 00:00:00,2015-08-21 00:00:00,2015-08-21 00:00:00,2015-08-21 00:00:00,2015-08-21 00:00... | DEWP,FRSHTT,GUST,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 61.7,000000, 15.0, 79.0, 68.0, 8.9, 0.00,1011.4,9.6, 71.6, 10.0, 4.9 | 2015-08-20 00:00:00,2015-08-20 00:00:00,2015-08-20 00:00:00,2015-08-20 00:00:00,2015-08-20 00:00:00,2015-08-20 00:00... | DEWP,FRSHTT,MAX,MIN,MXSPD,PRCP,SLP,STP,TEMP,VISIB,WDSP | 62.5,000000, 78.1, 68.0, 8.9, 0.00,1012.4,10.5, 71.3, 9.6, 4.5 | 2015-08-19 00:00:00,2015-08-19 00:00:00,2015-08-19 00:00:00,2015-08-19 00:00:00,2015-08-19 00:00:00,2015-08-19 00:00... | ||||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
In contrast, the full query below is identical to above but includes data for all 6 days at once. This query processes for at least 15-30 minutes (haven't tried letting it run longer). Here each ex:n-[day]DayObs
triple is associated with a list of up to 15 or so weather observations, one for each weather property (e.g. TEMP, DEWP, etc.) that was taken on the associated past day.
SELECT ?bObs ?bObsFoi ?AvgF ?CountObs ?y ?m ?d
GROUP_CONCAT(?nProperty1, ",") AS ?np_minus1
GROUP_CONCAT(?nResult1, ",") AS ?nr_minus1
GROUP_CONCAT(?nTime1, ",") AS ?nt_minus1
# ...
GROUP_CONCAT(?nProperty6, ",") AS ?np_minus6
GROUP_CONCAT(?nResult6, ",") AS ?nr_minus6
GROUP_CONCAT(?nTime6, ",") AS ?nt_minus6
WHERE {
?bObs a sosa:Observation ;
sosa:hasFeatureOfInterest ?bObsFoi ;
sosa:resultTime ?bObsDate ;
sosa:hasSimpleResult ?AvgF ;
ex:countObservations ?CountObs ;
time:year ?y ;
time:month ?m ;
time:day ?d ;
ex:n-6DayObs ?o6 ;
ex:n-5DayObs ?o5 ;
ex:n-4DayObs ?o4 ;
ex:n-3DayObs ?o3 ;
ex:n-2DayObs ?o2 ;
ex:n-1DayObs ?o1 .
?bObsFoi ex:BProjectID ?bPID .
FILTER(?bPID = "CA8"^^xsd:string) .
?o1 a sosa:Observation ;
sosa:resultTime ?nTime1 ;
sosa:observedProperty ?nProperty1 ;
sosa:hasSimpleResult ?nResult1 .
?o2 a sosa:Observation ;
sosa:resultTime ?nTime2 ;
sosa:observedProperty ?nProperty2 ;
sosa:hasSimpleResult ?nResult2 .
?o3 a sosa:Observation ;
sosa:resultTime ?nTime3 ;
sosa:observedProperty ?nProperty3 ;
sosa:hasSimpleResult ?nResult3 .
?o4 a sosa:Observation ;
sosa:resultTime ?nTime4 ;
sosa:observedProperty ?nProperty4 ;
sosa:hasSimpleResult ?nResult4 .
?o5 a sosa:Observation ;
sosa:resultTime ?nTime5 ;
sosa:observedProperty ?nProperty5 ;
sosa:hasSimpleResult ?nResult5 .
?o6 a sosa:Observation ;
sosa:resultTime ?nTime6 ;
sosa:observedProperty ?nProperty6 ;
sosa:hasSimpleResult ?nResult6 .
}
EDIT: Oops! Just realized that this is actually using only a subset of the datasets! The datasets here are not the 50 million triples version but a subset including ~30 buildings, 2 weather stations, ~10k building observations, and ~120k weather observations. So perhaps 1 million triples in total. So this works on this smaller dataset but I haven't yet tried it with the full 50 million triples dataset.
EDIT 2: The long query still didn't finish after letting it run for an hour.
EDIT 3: ex:n-[day]DayObs
triples were added to ?bObs
entities to explicitly connect a weather observation to a building observation.
EDIT 4: Profile and explain for the full query:
76 msec 0% cpu, 87057 rnd -3 seq 49.8989% same seg 0.00229732% same pg
2 disk reads, 0 read ahead, 0.000108171% wait
Compilation: 36362 msec 0 reads 0% read 0 messages 0% clw
REPORT
VARCHAR
_______________________________________________________________________________
{
Precode:
0: DB.DBA.RDF_MAKE_OBJ_OF_TYPEDSQLVAL$27 := Call DB.DBA.RDF_MAKE_OBJ_OF_TYPEDSQLVAL (<c CA8>, IRI_ID"...string" , <DB_NULL>)
7: BReturn 0
Subquery 29
{
fork {
RDF_QUAD 4.4e+04 rows(s_1_88_t39.S$58, s_1_88_t39.O$57)
inlined P = IRI_ID"...hasSimpleResult"
RDF_QUAD 0.8 rows(s_1_88_t36.S$61)
inlined P = IRI_ID"...type" , S = s_1_88_t39.S$58 , O = IRI_ID"...Observation"
RDF_QUAD_POGS 7.1e-05 rows(s_1_88_t9.S$65, s_1_88_t9.O$64)
P = IRI_ID"...n-6DayObs" , O = cast$460
RDF_QUAD 7.1e-05 rows(s_1_88_t5.S$68)
inlined P = IRI_ID"...obsType" , S = s_1_88_t9.S$65
RDF_QUAD_POGS 0.8 rows(s_1_88_t0.S$71)
inlined P = IRI_ID"...type" , O = IRI_ID"...Observation" , S = s_1_88_t5.S$68
RDF_QUAD 1 rows(s_1_88_t1.O$75, s_1_88_t1.S$74)
inlined P = IRI_ID"...hasFeatureOfInterest" , S = s_1_88_t0.S$71
RDF_QUAD 1 rows(s_1_88_t2.S$78)
inlined P = IRI_ID"...resultTime" , S = k_s_1_88_t0.S$490
RDF_QUAD 7.1e-05 rows(s_1_88_t4.S$82, s_1_88_t4.O$81)
inlined P = IRI_ID"...countObservations" , S = k_s_1_88_t0.S$500
RDF_QUAD 1 rows(s_1_88_t3.S$86, s_1_88_t3.O$85)
inlined P = IRI_ID"...hasSimpleResult" , S = k_s_1_88_t0.S$511
RDF_QUAD 7.1e-05 rows(s_1_88_t10.S$90, s_1_88_t10.O$89)
inlined P = IRI_ID"...n-5DayObs" , S = k_s_1_88_t0.S$522
RDF_QUAD 1 rows(s_1_88_t8.S$94, s_1_88_t8.O$93)
inlined P = IRI_ID"...day" , S = k_s_1_88_t0.S$533
RDF_QUAD 7.1e-05 rows(s_1_88_t11.S$98, s_1_88_t11.O$97)
inlined P = IRI_ID"...n-4DayObs" , S = k_s_1_88_t0.S$544
RDF_QUAD 1 rows(s_1_88_t7.S$102, s_1_88_t7.O$101)
inlined P = IRI_ID"...month" , S = k_s_1_88_t0.S$555
RDF_QUAD 7.1e-05 rows(s_1_88_t12.O$106, s_1_88_t12.S$105)
inlined P = IRI_ID"...n-3DayObs" , S = k_s_1_88_t0.S$566
RDF_QUAD 7.1e-05 rows(s_1_88_t14.O$110, s_1_88_t14.S$109)
inlined P = IRI_ID"...n-1DayObs" , S = k_s_1_88_t0.S$577
RDF_QUAD 7.1e-05 rows(s_1_88_t13.S$114, s_1_88_t13.O$113)
inlined P = IRI_ID"...n-2DayObs" , S = k_s_1_88_t0.S$588
RDF_QUAD 1 rows(s_1_88_t37.S$118, s_1_88_t37.O$117)
inlined P = IRI_ID"...resultTime" , S = k_s_1_88_t9.O$599
RDF_QUAD 1 rows(s_1_88_t33.S$122, s_1_88_t33.O$121)
inlined P = IRI_ID"...resultTime" , S = k_s_1_88_t10.O$610
RDF_QUAD 1 rows(s_1_88_t29.S$126, s_1_88_t29.O$125)
inlined P = IRI_ID"...resultTime" , S = k_s_1_88_t11.O$621
RDF_QUAD 1 rows(s_1_88_t25.S$130, s_1_88_t25.O$129)
inlined P = IRI_ID"...resultTime" , S = k_s_1_88_t12.O$632
RDF_QUAD 1 rows(s_1_88_t21.S$134, s_1_88_t21.O$133)
inlined P = IRI_ID"...resultTime" , S = k_s_1_88_t13.O$643
RDF_QUAD 1 rows(s_1_88_t17.S$138, s_1_88_t17.O$137)
inlined P = IRI_ID"...resultTime" , S = k_s_1_88_t14.O$654
RDF_QUAD 1 rows(s_1_88_t35.S$142, s_1_88_t35.O$141)
inlined P = IRI_ID"...hasSimpleResult" , S = k_s_1_88_t10.O$665
RDF_QUAD 1 rows(s_1_88_t31.S$146, s_1_88_t31.O$145)
inlined P = IRI_ID"...hasSimpleResult" , S = k_s_1_88_t11.O$676
RDF_QUAD 1 rows(s_1_88_t27.S$150, s_1_88_t27.O$149)
inlined P = IRI_ID"...hasSimpleResult" , S = k_s_1_88_t12.O$687
RDF_QUAD 1 rows(s_1_88_t23.S$154, s_1_88_t23.O$153)
inlined P = IRI_ID"...hasSimpleResult" , S = k_s_1_88_t13.O$698
RDF_QUAD 1 rows(s_1_88_t19.S$158, s_1_88_t19.O$157)
inlined P = IRI_ID"...hasSimpleResult" , S = k_s_1_88_t14.O$709
RDF_QUAD 1 rows(s_1_88_t34.S$162, s_1_88_t34.O$161)
inlined P = IRI_ID"...observedProperty" , S = k_s_1_88_t10.O$720
RDF_QUAD 1 rows(s_1_88_t30.S$166, s_1_88_t30.O$165)
inlined P = IRI_ID"...observedProperty" , S = k_s_1_88_t11.O$731
RDF_QUAD 1 rows(s_1_88_t26.S$170, s_1_88_t26.O$169)
inlined P = IRI_ID"...observedProperty" , S = k_s_1_88_t12.O$742
RDF_QUAD 1 rows(s_1_88_t22.S$174, s_1_88_t22.O$173)
inlined P = IRI_ID"...observedProperty" , S = k_s_1_88_t13.O$753
RDF_QUAD 1 rows(s_1_88_t18.S$178, s_1_88_t18.O$177)
inlined P = IRI_ID"...observedProperty" , S = k_s_1_88_t14.O$764
RDF_QUAD_POGS 7.1e-05 rows()
inlined P = IRI_ID"...BProjectID" , O = k_DB.DBA.RDF_MAKE_OBJ_OF_TYPEDSQLVAL$775 , S = k_s_1_88_t1.O$779
RDF_QUAD 0.8 rows()
inlined P = IRI_ID"...type" , S = k_s_1_88_t10.O$786 , O = IRI_ID"...Observation"
RDF_QUAD 0.8 rows()
inlined P = IRI_ID"...type" , S = k_s_1_88_t11.O$795 , O = IRI_ID"...Observation"
RDF_QUAD 0.8 rows()
inlined P = IRI_ID"...type" , S = k_s_1_88_t12.O$804 , O = IRI_ID"...Observation"
RDF_QUAD 0.8 rows()
inlined P = IRI_ID"...type" , S = k_s_1_88_t13.O$813 , O = IRI_ID"...Observation"
RDF_QUAD 0.8 rows()
inlined P = IRI_ID"...type" , S = k_s_1_88_t14.O$822 , O = IRI_ID"...Observation"
RDF_QUAD 1 rows(s_1_88_t38.O$193)
inlined P = IRI_ID"...observedProperty" , S = k_s_1_88_t9.O$831
RDF_QUAD 1 rows(s_1_88_t6.O$196)
inlined P = IRI_ID"...year" , S = k_s_1_88_t0.S$841
Precode:
0: __ro2sq$198 := Call __ro2sq (s_1_88_t8.O$93)
5: __ro2sq$200 := Call __ro2sq (s_1_88_t7.O$101)
10: __ro2sq$202 := Call __ro2sq (s_1_88_t4.O$81)
15: __ro2sq$204 := Call __ro2sq (s_1_88_t3.O$85)
20: __ro2sq$206 := Call __ro2sq (s_1_88_t1.O$75)
25: __id2in$208 := Call __id2in (s_1_88_t0.S$71)
30: __ro2sq$210 := Call __ro2sq (s_1_88_t18.O$177)
35: __ro2sq$212 := Call __ro2sq (s_1_88_t19.O$157)
40: __ro2sq$214 := Call __ro2sq (s_1_88_t17.O$137)
45: __ro2sq$216 := Call __ro2sq (s_1_88_t22.O$173)
50: __ro2sq$218 := Call __ro2sq (s_1_88_t23.O$153)
55: __ro2sq$220 := Call __ro2sq (s_1_88_t21.O$133)
60: __ro2sq$222 := Call __ro2sq (s_1_88_t26.O$169)
65: __ro2sq$224 := Call __ro2sq (s_1_88_t27.O$149)
70: __ro2sq$226 := Call __ro2sq (s_1_88_t25.O$129)
75: __ro2sq$228 := Call __ro2sq (s_1_88_t30.O$165)
80: __ro2sq$230 := Call __ro2sq (s_1_88_t31.O$145)
85: __ro2sq$232 := Call __ro2sq (s_1_88_t29.O$125)
90: __ro2sq$234 := Call __ro2sq (s_1_88_t34.O$161)
95: __ro2sq$236 := Call __ro2sq (s_1_88_t35.O$141)
100: __ro2sq$238 := Call __ro2sq (s_1_88_t33.O$121)
105: __ro2sq$240 := Call __ro2sq (s_1_88_t38.O$193)
110: __ro2sq$242 := Call __ro2sq (s_1_88_t39.O$57)
115: __ro2sq$244 := Call __ro2sq (s_1_88_t37.O$117)
120: __ro2sq$246 := Call __ro2sq (s_1_88_t6.O$196)
125: BReturn 0
RDF_QUAD 4.4e+04 rows(s_1_88_t43.S$249)
inlined P = IRI_ID"...hasSimpleResult"
RDF_QUAD 1 rows(s_1_88_t41.S$252)
inlined P = IRI_ID"...resultTime" , S = s_1_88_t43.S$249
RDF_QUAD 1 rows(s_1_88_t42.S$255)
inlined P = IRI_ID"...observedProperty" , S = s_1_88_t41.S$252
RDF_QUAD 0.8 rows()
inlined P = IRI_ID"...type" , S = k_s_1_88_t41.S$869 , O = IRI_ID"...Observation"
Sort (s_1_88_t8.O$93, s_1_88_t7.O$101, s_1_88_t6.O$196, s_1_88_t4.O$81, s_1_88_t3.O$85, s_1_88_t1.O$75, s_1_88_t0.S$71) -> (gb_tmp$262 , gb_tmp$265 , gb_tmp$268 , gb_tmp$271 , gb_tmp$274 , gb_tmp$277 , gb_tmp$280 , gb_tmp$283 , gb_tmp$286 , gb_tmp$289 , gb_tmp$292 , gb_tmp$295 , gb_tmp$298 , gb_tmp$301 , gb_tmp$304 , gb_tmp$307 , gb_tmp$310 , gb_tmp$313 , __ro2sq$198, __ro2sq$200, __ro2sq$246, __ro2sq$202, __ro2sq$204, __ro2sq$206, __id2in$208)
user aggregate init
0: ua_ret$263 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$262 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$263 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$262 , __ro2sq$244, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$266 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$265 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$266 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$265 , __ro2sq$242, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$269 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$268 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$269 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$268 , __ro2sq$240, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$272 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$271 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$272 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$271 , __ro2sq$238, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$275 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$274 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$275 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$274 , __ro2sq$236, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$278 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$277 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$278 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$277 , __ro2sq$234, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$281 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$280 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$281 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$280 , __ro2sq$232, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$284 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$283 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$284 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$283 , __ro2sq$230, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$287 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$286 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$287 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$286 , __ro2sq$228, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$290 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$289 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$290 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$289 , __ro2sq$226, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$293 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$292 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$293 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$292 , __ro2sq$224, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$296 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$295 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$296 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$295 , __ro2sq$222, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$299 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$298 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$299 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$298 , __ro2sq$220, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$302 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$301 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$302 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$301 , __ro2sq$218, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$305 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$304 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$305 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$304 , __ro2sq$216, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$308 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$307 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$308 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$307 , __ro2sq$214, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$311 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$310 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$311 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$310 , __ro2sq$212, <c ,>)
7: BReturn 0
user aggregate init
0: ua_ret$314 := Call DB.DBA.GROUP_CONCAT_INIT (gb_tmp$313 )
7: BReturn 0
user aggregate accumulator
0: ua_ret$314 := Call DB.DBA.GROUP_CONCAT_ACC (gb_tmp$313 , __ro2sq$210, <c ,>)
7: BReturn 0
}
group by read node
(s_1_88_t8.O$93, s_1_88_t7.O$101, s_1_88_t6.O$196, s_1_88_t4.O$81, s_1_88_t3.O$85, s_1_88_t1.O$75, s_1_88_t0.S$71, aggregate$261, aggregate$264, aggregate$267, aggregate$270, aggregate$273, aggregate$276, aggregate$279, aggregate$282, aggregate$285, aggregate$288, aggregate$291, aggregate$294, aggregate$297, aggregate$300, aggregate$303, aggregate$306, aggregate$309, aggregate$312, __ro2sq$198, __ro2sq$200, __ro2sq$246, __ro2sq$202, __ro2sq$204, __ro2sq$206, __id2in$208)
After code:
0: DB.DBA.GROUP_CONCAT_FIN$331 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$312)
7: DB.DBA.GROUP_CONCAT_FIN$333 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$309)
14: DB.DBA.GROUP_CONCAT_FIN$335 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$306)
21: DB.DBA.GROUP_CONCAT_FIN$337 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$303)
28: DB.DBA.GROUP_CONCAT_FIN$339 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$300)
35: DB.DBA.GROUP_CONCAT_FIN$341 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$297)
42: DB.DBA.GROUP_CONCAT_FIN$343 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$294)
49: DB.DBA.GROUP_CONCAT_FIN$345 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$291)
56: DB.DBA.GROUP_CONCAT_FIN$347 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$288)
63: DB.DBA.GROUP_CONCAT_FIN$349 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$285)
70: DB.DBA.GROUP_CONCAT_FIN$351 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$282)
77: DB.DBA.GROUP_CONCAT_FIN$353 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$279)
84: DB.DBA.GROUP_CONCAT_FIN$355 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$276)
91: DB.DBA.GROUP_CONCAT_FIN$357 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$273)
98: DB.DBA.GROUP_CONCAT_FIN$359 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$270)
105: DB.DBA.GROUP_CONCAT_FIN$361 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$267)
112: DB.DBA.GROUP_CONCAT_FIN$363 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$264)
119: DB.DBA.GROUP_CONCAT_FIN$365 := Call DB.DBA.GROUP_CONCAT_FIN (aggregate$261)
126: bObs$30 := := artm __id2in$208
130: bObsFoi$31 := := artm __ro2sq$206
134: AvgF$32 := := artm __ro2sq$204
138: CountObs$33 := := artm __ro2sq$202
142: y$34 := := artm __ro2sq$246
146: m$35 := := artm __ro2sq$200
150: d$36 := := artm __ro2sq$198
154: np_minus1$37 := := artm DB.DBA.GROUP_CONCAT_FIN$331
158: nr_minus1$38 := := artm DB.DBA.GROUP_CONCAT_FIN$333
162: nt_minus1$39 := := artm DB.DBA.GROUP_CONCAT_FIN$335
166: np_minus2$40 := := artm DB.DBA.GROUP_CONCAT_FIN$337
170: nr_minus2$41 := := artm DB.DBA.GROUP_CONCAT_FIN$339
174: nt_minus2$42 := := artm DB.DBA.GROUP_CONCAT_FIN$341
178: np_minus3$43 := := artm DB.DBA.GROUP_CONCAT_FIN$343
182: nr_minus3$44 := := artm DB.DBA.GROUP_CONCAT_FIN$345
186: nt_minus3$45 := := artm DB.DBA.GROUP_CONCAT_FIN$347
190: np_minus4$46 := := artm DB.DBA.GROUP_CONCAT_FIN$349
194: nr_minus4$47 := := artm DB.DBA.GROUP_CONCAT_FIN$351
198: nt_minus4$48 := := artm DB.DBA.GROUP_CONCAT_FIN$353
202: np_minus5$49 := := artm DB.DBA.GROUP_CONCAT_FIN$355
206: nr_minus5$50 := := artm DB.DBA.GROUP_CONCAT_FIN$357
210: nt_minus5$51 := := artm DB.DBA.GROUP_CONCAT_FIN$359
214: np_minus6$52 := := artm DB.DBA.GROUP_CONCAT_FIN$361
218: nr_minus6$53 := := artm DB.DBA.GROUP_CONCAT_FIN$363
222: nt_minus6$54 := := artm DB.DBA.GROUP_CONCAT_FIN$365
226: BReturn 0
Subquery Select(bObs$30, bObsFoi$31, AvgF$32, CountObs$33, y$34, m$35, d$36, np_minus1$37, nr_minus1$38, nt_minus1$39, np_minus2$40, nr_minus2$41, nt_minus2$42, np_minus3$43, nr_minus3$44, nt_minus3$45, np_minus4$46, nr_minus4$47, nt_minus4$48, np_minus5$49, nr_minus5$50, nt_minus5$51, np_minus6$52, nr_minus6$53, nt_minus6$54)
}
After code:
0: bObs$374 := Call __ro2sq (bObs$30)
5: bObsFoi$376 := Call __ro2sq (bObsFoi$31)
10: AvgF$378 := Call __ro2sq (AvgF$32)
15: CountObs$380 := Call __ro2sq (CountObs$33)
20: y$382 := Call __ro2sq (y$34)
25: m$384 := Call __ro2sq (m$35)
30: d$386 := Call __ro2sq (d$36)
35: np_minus1$388 := Call __ro2sq (np_minus1$37)
40: nr_minus1$390 := Call __ro2sq (nr_minus1$38)
45: nt_minus1$392 := Call __ro2sq (nt_minus1$39)
50: np_minus2$394 := Call __ro2sq (np_minus2$40)
55: nr_minus2$396 := Call __ro2sq (nr_minus2$41)
60: nt_minus2$398 := Call __ro2sq (nt_minus2$42)
65: np_minus3$400 := Call __ro2sq (np_minus3$43)
70: nr_minus3$402 := Call __ro2sq (nr_minus3$44)
75: nt_minus3$404 := Call __ro2sq (nt_minus3$45)
80: np_minus4$406 := Call __ro2sq (np_minus4$46)
85: nr_minus4$408 := Call __ro2sq (nr_minus4$47)
90: nt_minus4$410 := Call __ro2sq (nt_minus4$48)
95: np_minus5$412 := Call __ro2sq (np_minus5$49)
100: nr_minus5$414 := Call __ro2sq (nr_minus5$50)
105: nt_minus5$416 := Call __ro2sq (nt_minus5$51)
110: np_minus6$418 := Call __ro2sq (np_minus6$52)
115: nr_minus6$420 := Call __ro2sq (nr_minus6$53)
120: nt_minus6$422 := Call __ro2sq (nt_minus6$54)
125: BReturn 0
Select (bObs$374, bObsFoi$376, AvgF$378, CountObs$380, y$382, m$384, d$386, np_minus1$388, nr_minus1$390, nt_minus1$392, np_minus2$394, nr_minus2$396, nt_minus2$398, np_minus3$400, nr_minus3$402, nt_minus3$404, np_minus4$406, nr_minus4$408, nt_minus4$410, np_minus5$412, nr_minus5$414, nt_minus5$416, np_minus6$418, nr_minus6$420, nt_minus6$422)
}
331 Rows. -- 582 msec.
We have downloaded and loaded your data into graph <urn:git1045:data>
in our public https://geosparql.demo.openlinksw.com/sparql instance, and can run the test query you provided directly against the SPARQL endpoint.
So we don't need the python code, as we can see the problems directly via the SPARQL endpoint, and are looking into this ...
Time period: 10 years. Location: State of CA.
Dataset 1: 300 NOAA weather stations (located in CA). Daily summary of up to 15 variables (e.g. avg air temperature, avg wind speed, max air temperature). Not every variable present per day per station. Roughly 6 million observations in total. Dataset 2: 600 buildings (also located in CA). Single daily observation per building. Most buildings make around 100 observations per year. Roughly 200,000 observations in total.
On average, each building has 50 weather stations within a 200KM radius.
Spatiotemporal merge: Merge nearest weather variable to a building on a given day up to a maximum distance of 200KM per variable.
Example: Building 1 has it's first daily observation on Jan. 4, 2000. The nearest weather station (1KM) has a daily summary of 10 variables. The next nearest weather station (2KM) has 3 other variables. One more weather variable is only at a weather station that is 100KM away. The final weather variable is not found at any station within 200KM.
Final result: The final result should ideally be one row per building observation with an attached column for each weather variable (or something like NaN if no weather variable is available within 200KM). (The ideal result would be outputting this final query e.g. in CSV format which could then be fed into a dask pandas dataframe for use in machine learning tasks.)
Challenges: It would be easy enough to match the nearest weather station to any given building as this is only 300 x 600 distinct distance comparisons in total (180,000). However, in the worst case, if all 15 variables were matched against the average 50 nearest weather stations for every building observation present (200k), that would be 15x50x200000 = 150,000,000 distance calculations.
Any thoughts?