koopjs / koop-socrata

Socrata provider for Koop.

Koop-Socrata handles 1mm row datasets #31

Closed dmfenton closed 8 years ago

dmfenton commented 9 years ago

Concept

As a user I can access a performant feature service that is proxying: http://data.seattle.gov/resource/3k2p-39jp.json

Details

cc @astauffer

chelm commented 9 years ago

In dealing with pre-cached data in any provider, it's really less of an issue for the provider than it is an issue with koop-pgcache and the optimizations there (i.e. query performance and optimization). Just saying there are two sides to explore: the provider, which you've done via the PR, and the caching and retrieval of large data.

dmfenton commented 9 years ago

Want me to make an issue in Koop-pgcache?

chelm commented 9 years ago

@dmfenton if it makes sense, but first I'd like to understand what the shortcomings of the current solution are, in terms of benchmarks or just timed request numbers, and whether it's a cache issue, a provider issue, or even a koop issue.

Can you provide references to a URL in koop that exhibits behavior that is "slow"?
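For "timed request numbers", a minimal Python sketch along these lines could collect per-request durations instead of only a summary. The helper names `time_calls` and `fetch_url` are made up for illustration, and the URL in the usage comment is a placeholder, not an endpoint from this thread.

```python
import time
import urllib.request


def time_calls(fetch, runs=5):
    """Call fetch() `runs` times and return each wall-clock duration in ms."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def fetch_url(url, timeout_s=30):
    """Download the full response body; raises on timeout or HTTP error."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


# Usage (hypothetical host; substitute the Koop endpoint under test):
# url = "http://localhost:8080/socrata/seattle/3k2p-39jp/FeatureServer/0"
# print(time_calls(lambda: fetch_url(url)))
```

Keeping every individual duration makes it easier to tell a uniformly slow endpoint from one slow first request followed by fast ones.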

dmfenton commented 9 years ago

ab -n 5 'http://koop.dc.esri.com/socrata/seattle/3k2p-39jp/FeatureServer/0/query?where=1=1&outFields=*&returnGeometry=true'
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking koop.dc.esri.com (be patient)...apr_pollset_poll: The timeout specified has expired (70007)

ab -n 5 'http://koop.dc.esri.com/socrata/seattle/3k2p-39jp/FeatureServer/0/query?where=1=1&returnCountOnly=true'
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking koop.dc.esri.com (be patient)...apr_pollset_poll: The timeout specified has expired (70007)

ab -n 5 'http://koop.dc.esri.com/socrata/seattle/3k2p-39jp/FeatureServer/0'
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking koop.dc.esri.com (be patient).....done

Server Software:
Server Hostname:        koop.dc.esri.com
Server Port:            80

Document Path:          /socrata/seattle/3k2p-39jp/FeatureServer/0
Document Length:        0 bytes

Concurrency Level:      1
Time taken for tests:   8.516 seconds
Complete requests:      5
Failed requests:        0
Non-2xx responses:      5
Total transferred:      494 bytes
HTML transferred:       0 bytes
Requests per second:    0.59 [#/sec] (mean)
Time per request:       1703.296 [ms] (mean)
Time per request:       1703.296 [ms] (mean, across all concurrent requests)
Transfer rate:          0.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        6    6   0.1      6       6
Processing:     6 1697 3783.2      6    8465
Waiting:        6 1697 3783.2      6    8465
Total:         11 1703 3783.2     11    8471

chelm commented 9 years ago

@dmfenton so explain what happened here?

In the last test it returned in 1.7 seconds, so it was less than your 2 second limit? :)

dmfenton commented 9 years ago

The basic FeatureServer request, which doesn't need to return any data, is taking 1.7 seconds on average. That seems pretty long.

The other requests time out even when I set the limit to 30 seconds.
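A note on reading the 1.7 second figure: the ab summary above reports min 11 ms, median 11 ms, max 8471 ms, and mean 1703 ms over 5 requests. A small Python sketch, assuming per-request totals that match that summary (the individual values are inferred from it, not taken from a log), shows how a single slow request can account for the mean:

```python
from statistics import mean, median

# Assumed per-request totals in ms, chosen to be consistent with the ab
# summary (min 11, median 11, max 8471, mean 1703); not measured values.
samples_ms = [11, 11, 11, 11, 8471]

mean_ms = mean(samples_ms)      # dominated by the single slow request
median_ms = median(samples_ms)  # reflects the typical request

print(f"mean={mean_ms} ms, median={median_ms} ms")
```

Under that assumption, one request took about 8.5 seconds and the rest returned in roughly 11 ms, so the median may describe the typical response better than the mean.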

dmfenton commented 8 years ago

This works but performance could be a lot better. Still, closing.