koopjs / koop-opendata

ArcGIS Open Data provider for Koop (experimental).
Apache License 2.0

API should follow the GeoServices Query pattern #4

Open ajturner opened 9 years ago

ajturner commented 9 years ago

Currently the URL looks like http://koop.dc.esri.com/openData/umbrella/q/{:query}/FeatureServer/0

It is odd to register an entire Service for one specific query. Instead, it would be better to make it a common Service across the entire catalog and use query parameters.

http://koop.dc.esri.com/openData/umbrella/FeatureServer/0/query?where=q+LIKE+'crime'

Then this could also do

http://koop.dc.esri.com/openData/umbrella/FeatureServer/0/query?where=q+LIKE+'crime'&returnCountOnly=true

or

http://koop.dc.esri.com/openData/umbrella/FeatureServer/0/query?where=q+LIKE+'crime'&outStatistics=[{outStatisticField: "tag", outStatisticCalc: "count", outStatisticName: "tag_count"}]
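A minimal sketch of how a client might assemble URLs in the proposed pattern, using only Python's standard library. The helper name `build_query_url` is hypothetical and not part of Koop; note that `urlencode` percent-encodes the single quotes, which the hand-written URLs above leave raw.

```python
from urllib.parse import urlencode


def build_query_url(service_url, where, return_count_only=False):
    """Build a GeoServices-style query URL for one layer of a service.

    service_url is the layer endpoint, e.g. ".../FeatureServer/0".
    """
    params = {"where": where}
    if return_count_only:
        params["returnCountOnly"] = "true"
    # urlencode turns spaces into "+" and quotes into "%27"
    return f"{service_url}/query?{urlencode(params)}"


layer = "http://koop.dc.esri.com/openData/umbrella/FeatureServer/0"
print(build_query_url(layer, "q LIKE 'crime'"))
print(build_query_url(layer, "q LIKE 'crime'", return_count_only=True))
```

The point of the pattern is that one registered Service answers every search, and only the query string changes.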

dmfenton commented 9 years ago

It's an interesting concept. It makes a lot of sense to make this provider more of a pass-through. I could probably steal a lot of code from koop-pgcache for parsing the where query.

Just to note though, you can (or should be able to) do something like:

http://koop.dc.esri.com/openData/umbrella/FeatureServer/0/query?where=name+LIKE+'crime'&returnCountOnly=true

and

http://koop.dc.esri.com/openData/umbrella/FeatureServer/0/query?where=name+LIKE+'crime'&outStatistics=[{outStatisticField: "tag", outStatisticCalc: "count", outStatisticName: "tag_count"}]

Note: the tags aren't going to come out right. They're just brought in as a string, since feature services don't handle arrays.

These queries just won't be efficient because of how Koop handles feature services right now.

However, the q like is a little weird. Is there anything closer to a concept for full-text-search in geoservices?

chelm commented 9 years ago
  1. We need to address the "efficiency" of feature services in Koop. That can happen now that pgcache is ready to rock.
  2. Do we ever need to cache anything from the open data API? Isn't it simply a pass-through? What do we get by using Koop's cache?
chelm commented 9 years ago

I guess using Koop's cache might be nice for putting all of prod "umbrella" into it up front and then just using the DB to filter it and generate feature service responses.

SO why do that? When you register "umbrella", or any open site, just suck all the data into koop and then use where filters on it.

chelm commented 9 years ago

+1 on @ajturner suggestion for just registering the open data API instead of the specific query.

ajturner commented 9 years ago
  1. All Koop outputs should use this query-parameter pattern
  2. The interface definition for a Service query should be abstracted and reusable as a model. Put simply, it could parse, validate, and store a query structure that a provider would reuse. Other output APIs could parse their query parameters into this common internal model.
  3. All of this should be transparent to a cache. That should be an operations detail, not a code-logic one
  4. Umbrella should be live. It's fast and small enough we don't need Yet Another Cache (YAC)
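One way to picture point 2 is a small, immutable query object that parses and validates request parameters once, so that any provider or output format can reuse it. This is only a sketch under assumptions: the class name `ServiceQuery`, its fields, and the `from_params` method are hypothetical and do not come from Koop.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceQuery:
    """Hypothetical common model for a parsed Service query."""

    where: Optional[str] = None
    return_count_only: bool = False

    @classmethod
    def from_params(cls, params: Mapping[str, str]) -> "ServiceQuery":
        # Validate the boolean flag instead of trusting the raw string.
        flag = params.get("returnCountOnly", "false").lower()
        if flag not in ("true", "false"):
            raise ValueError(f"returnCountOnly must be true or false, got {flag!r}")
        # Treat an empty where clause the same as a missing one.
        where = params.get("where") or None
        return cls(where=where, return_count_only=(flag == "true"))


query = ServiceQuery.from_params({"where": "q LIKE 'crime'", "returnCountOnly": "true"})
print(query)
```

Because the model carries no reference to a cache or a backend, whether a provider answers it live or from a database stays an operational choice, which is the idea behind point 3.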
ajturner commented 9 years ago

However, the q like is a little weird. Is there anything closer to a concept for full-text-search in geoservices?

MapServices (but not FeatureServices - yay!) have an optional text= parameter. That is supposed to only be on the displayField but we could repurpose to be a search index query.

Description: A literal search text. If the layer has a display field associated with it, the server searches for this text in this field. This parameter is a short hand for a where clause of: where <displayField> like '%<text>%'. The text is case sensitive. This parameter is ignored if the where parameter is specified.

Example: text=Los

API Doc
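The shorthand described in the quoted documentation can be sketched as a small translation step. The function name `text_to_where` and the quote-doubling escape are assumptions made for illustration; the behavior it mirrors (expanding `text` into a LIKE clause on the display field, and ignoring `text` when `where` is given) is what the description above states.

```python
from typing import Optional


def text_to_where(display_field: str, text: str, where: Optional[str] = None) -> Optional[str]:
    """Expand a text= parameter into the where clause it stands for."""
    if where:
        # Per the description, text is ignored when where is specified.
        return where
    if not text:
        return None
    # Double any single quotes so the literal stays well-formed (an assumption).
    escaped = text.replace("'", "''")
    return f"{display_field} LIKE '%{escaped}%'"


print(text_to_where("name", "Los"))
```

Repurposing `text=` as a search-index query would replace this expansion with a call to the search backend, while keeping the same parameter on the outside.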

dmfenton commented 9 years ago

MapServices (but not FeatureServices - yay!) have an optional text= parameter. That is supposed to only be on the displayField but we could repurpose to be a search index query.

@ajturner Perfect, I can make a pass-through work well with that.

SO why do that? When you register "umbrella", or any open site, just suck all the data into koop and then use where filters on it.

@chelm that's what happens when you go to http://koop.dc.esri.com/openData/umbrella. It sends a * search over and pages through until the whole index is in the DB
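The paging behavior described above can be sketched as a simple loop. This is not the provider's actual code: `fetch_page` is a hypothetical callable standing in for one request to the open data search API, and the stop condition (a page shorter than `per_page`) is an assumption.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_datasets(
    fetch_page: Callable[[int, int], List[Dict]], per_page: int = 100
) -> Iterator[Dict]:
    """Yield every dataset by requesting pages until a short page arrives."""
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        yield from batch
        if len(batch) < per_page:
            # A short (or empty) page means the index is exhausted.
            break
        page += 1


# Stand-in for the remote API: 250 fake records served in slices.
fake_index = [{"id": i} for i in range(250)]


def fake_fetch(page: int, per_page: int) -> List[Dict]:
    start = (page - 1) * per_page
    return fake_index[start:start + per_page]


print(len(list(iter_all_datasets(fake_fetch))))
```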

Umbrella should be live. It's fast and small enough we don't need Yet Another Cache (YAC)

  • Without any caching, we would lose existing facilities for generating arbitrary statistics and doing spatial queries that are not simple bboxes. We may still need something short-lived.
  • Umbrella is fast enough when there are 100 results or fewer. But keep in mind that each page takes about 2 seconds:
ab -n 10 http://opendata.arcgis.com/datasets.json\?\=\*\&per_page\=100
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking opendata.arcgis.com (be patient).....done

Server Software:        nginx/1.4.6
Server Hostname:        opendata.arcgis.com
Server Port:            80

Document Path:          /datasets.json?=*&per_page=100
Document Length:        527926 bytes

Concurrency Level:      1
Time taken for tests:   21.530 seconds
Complete requests:      10
Failed requests:        0
Total transferred:      5286070 bytes
HTML transferred:       5279260 bytes
Requests per second:    0.46 [#/sec] (mean)
Time per request:       2152.970 [ms] (mean)
Time per request:       2152.970 [ms] (mean, across all concurrent requests)
Transfer rate:          239.77 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       15   25  16.6     17      62
Processing:  1721 2128 257.0   2119    2536
Waiting:     1286 1597 229.7   1614    2117
Total:       1783 2153 247.1   2136    2553

Percentage of the requests served within a certain time (ms)
  50%   2136
  66%   2312
  75%   2379
  80%   2408
  90%   2553
  95%   2553
  98%   2553
  99%   2553
 100%   2553 (longest request)
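To put the benchmark in context, a rough estimate of what a full sequential harvest would cost at this rate. The per-page figure is the mean "Time per request" from the run above; the catalog size in the example is a made-up number used only for illustration, and the function name is hypothetical.

```python
import math

# Mean "Time per request" from the ab run above, in seconds.
MEAN_SECONDS_PER_PAGE = 2.15297


def estimate_harvest_seconds(total_datasets, per_page=100,
                             seconds_per_page=MEAN_SECONDS_PER_PAGE):
    """Estimate the time to page through a catalog one request at a time."""
    pages = math.ceil(total_datasets / per_page)
    return pages * seconds_per_page


# Hypothetical catalog size, chosen only for illustration.
print(round(estimate_harvest_seconds(10_000), 1), "seconds")
```

The estimate assumes sequential requests and a constant per-page cost, so it is an upper-level guide rather than a measurement.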
  • The interface definition for a Service query should be abstracted and reusable as a model. Put simply, it could parse, validate, and store a query structure that a provider would reuse. Other output APIs could parse their query parameters into this common internal model.
  • All of this should be transparent to a cache. That should be an operations detail, not a code-logic one

@ajturner Not sure I understand your points here, can you share an example?

chelm commented 9 years ago

2 seconds for 100 json docs? not stoked on that...

dmfenton commented 9 years ago

See https://github.com/ArcGIS/composer/pull/6918