koopjs / koop

Transform, query, and download geospatial data on the web.
http://koopjs.github.io

field statistic question #329

Closed. keithfraley closed this issue 6 years ago.

keithfraley commented 6 years ago

While using the Elasticsearch provider, I noticed that the statistics returned are calculated only over the first set of results. If the limit is set to 1000 returned records, you only get the statistics for those 1000 records.

My question is this: is that handled at the provider level or at the FeatureServer (fs) level? How can I modify the query on the back end to get statistics on all records?

rgwozdz commented 6 years ago

Hi @keithfraley, thanks for reaching out. I'm glad you brought this up. You're correct: the data is being reduced by either the limit value (if set in the request) or the maxRecordCount. Statistics should not be calculated like this, so I think you've found a bug. In the short term, you can work around this by setting the maxRecordCount metadata property in your provider to a value equal to or larger than the size of your full record set.
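The short-term workaround can be sketched in TypeScript (Koop itself is JavaScript). This assumes, as in Koop's provider documentation, that the provider attaches a `metadata` object to the GeoJSON it passes to the `getData` callback; the `ensureMaxRecordCount` helper and the local types are hypothetical. It only helps when the provider already delivers the full record set, which is exactly the limitation for very large datasets.

```typescript
// Local types for this sketch only; they are not Koop's own typings.
interface Feature {
  type: "Feature";
  geometry: unknown;
  properties: Record<string, unknown>;
}

interface ProviderGeoJSON {
  type: "FeatureCollection";
  features: Feature[];
  metadata?: { maxRecordCount?: number; [key: string]: unknown };
}

// Hypothetical helper: raise metadata.maxRecordCount so it is at least the number
// of features delivered, keeping any larger value and all other metadata fields.
function ensureMaxRecordCount(geojson: ProviderGeoJSON): ProviderGeoJSON {
  const current = geojson.metadata?.maxRecordCount ?? 0;
  const needed = geojson.features.length;
  return {
    ...geojson,
    metadata: { ...geojson.metadata, maxRecordCount: Math.max(current, needed) },
  };
}
```

Inside `getData`, the provider would call `callback(null, ensureMaxRecordCount(geojson))` instead of passing `geojson` directly.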

But I think we need to change Winnow so that aggregate queries do calculations over the full record set delivered from the provider. I will add an issue to the Winnow repo.

rgwozdz commented 6 years ago

Here's the Winnow issue: https://github.com/koopjs/winnow/issues/96

keithfraley commented 6 years ago

Thanks for the quick response, Rich. The challenge is that when working with datasets much larger than a reasonable max record count, I don't see much choice other than going through the provider.

Perhaps one enhancement would be an option to calculate field statistics at the provider level?

rgwozdz commented 6 years ago

I understand what you are saying. I think you can do this in your provider without any additional change in the koop ecosystem, but it will take a little development. You will need to intercept the outStatistics query parameter in your getData function and use it to calculate your own statistics there and use the results as the geojson passed to the callback. You will need to delete req.query.outStatistics prior to said callback as well so that FeatureServer/Winnow don't try to calculate statistics after you have already done so.
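A minimal TypeScript sketch of the approach described above, following the Koop provider convention of a `Model` with `getData(req, callback)`. Everything else is an assumption for illustration: the local types, the hypothetical `fetchAllFeatures` loader standing in for an unpaged Elasticsearch query, the Esri-style `statisticType`/`onStatisticField`/`outStatisticFieldName` entries, the default output names, and the single-row output shape. Check what FeatureServer actually expects from a provider that returns precomputed statistics before relying on it.

```typescript
// Local types for this sketch only; they are not Koop's own typings.
interface Feature {
  type: "Feature";
  geometry: unknown;
  properties: Record<string, unknown>;
}
interface FeatureCollection {
  type: "FeatureCollection";
  features: Feature[];
}
// One outStatistics entry, using the Esri REST field names.
interface OutStatistic {
  statisticType: "count" | "sum" | "min" | "max" | "avg";
  onStatisticField: string;
  outStatisticFieldName?: string;
}
interface ProviderRequest {
  query: Record<string, unknown>;
}
type Callback = (err: Error | null, geojson?: FeatureCollection) => void;

// Compute every requested statistic over all features passed in (no paging).
function computeStatistics(specs: OutStatistic[], features: Feature[]): Record<string, number | null> {
  const out: Record<string, number | null> = {};
  for (const spec of specs) {
    // Hypothetical default name used when outStatisticFieldName is omitted.
    const name = spec.outStatisticFieldName ?? `${spec.statisticType}_${spec.onStatisticField}`;
    const present = features
      .map((f) => f.properties[spec.onStatisticField])
      .filter((v) => v !== null && v !== undefined);
    const nums = present.filter((v): v is number => typeof v === "number");
    const total = nums.reduce((a, b) => a + b, 0);
    switch (spec.statisticType) {
      case "count":
        out[name] = present.length; // counts non-null values
        break;
      case "sum":
        out[name] = nums.length ? total : null;
        break;
      case "avg":
        out[name] = nums.length ? total / nums.length : null;
        break;
      // reduce (not Math.min(...nums)) avoids argument-count limits on large arrays
      case "min":
        out[name] = nums.length ? nums.reduce((a, b) => Math.min(a, b)) : null;
        break;
      case "max":
        out[name] = nums.length ? nums.reduce((a, b) => Math.max(a, b)) : null;
        break;
    }
  }
  return out;
}

// If outStatistics was requested, return one row of statistics and delete the
// parameter so FeatureServer/Winnow do not recompute statistics afterwards.
function applyProviderStatistics(req: ProviderRequest, features: Feature[]): FeatureCollection {
  const raw = req.query.outStatistics;
  if (raw === undefined) return { type: "FeatureCollection", features };
  // Assumption: outStatistics may arrive as a JSON string or as a parsed array.
  const specs = (typeof raw === "string" ? JSON.parse(raw) : raw) as OutStatistic[];
  const stats = computeStatistics(specs, features);
  delete req.query.outStatistics;
  return { type: "FeatureCollection", features: [{ type: "Feature", geometry: null, properties: stats }] };
}

// Hypothetical provider model; fetchAllFeatures stands in for an unpaged
// backend query (for example, scrolling through an Elasticsearch index).
class Model {
  private readonly fetchAllFeatures: () => Promise<Feature[]>;

  constructor(fetchAllFeatures: () => Promise<Feature[]>) {
    this.fetchAllFeatures = fetchAllFeatures;
  }

  getData(req: ProviderRequest, callback: Callback): void {
    this.fetchAllFeatures().then(
      (features) => {
        let geojson: FeatureCollection;
        try {
          geojson = applyProviderStatistics(req, features);
        } catch (err) {
          callback(err as Error);
          return;
        }
        callback(null, geojson);
      },
      (err: Error) => callback(err),
    );
  }
}
```

Keeping the statistics logic in a synchronous helper separates it from the asynchronous backend fetch, so the same code can run whether the records come from one query or many scrolled pages.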

keithfraley commented 6 years ago

Thanks! I will have a look

On Wed, Aug 15, 2018 at 8:24 AM Rich Gwozdz wrote: I understand what you are saying. [...]

rgwozdz commented 6 years ago

After a deeper dive, I have determined that there isn't a bug here, at least with the providers I am currently using. You can add a limit or a resultRecordCount to the request and vary its value, but I find that outStatistics are calculated over the entire record set delivered to Koop by the provider. Adding a limit or a resultRecordCount does add a LIMIT to the SQL query in Winnow, but, as in standard SQL, the LIMIT clause doesn't appear to affect the calculations.
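The distinction here is about evaluation order. In standard SQL, an aggregate query such as `SELECT SUM(x) FROM t LIMIT 2` first collapses all matching rows into a single result row and only then applies LIMIT to the result rows, so LIMIT does not shrink the set being summed. The TypeScript sketch below illustrates the two orderings; it is not Winnow's code, and the function names are hypothetical.

```typescript
// Sum helper shared by both orderings.
function sum(values: number[]): number {
  return values.reduce((a, b) => a + b, 0);
}

// Ordering the original report suspected: truncate the input rows, then aggregate.
function limitThenAggregate(values: number[], limit: number): number {
  return sum(values.slice(0, limit));
}

// Standard SQL ordering for SELECT SUM(x) FROM t LIMIT n: aggregate all rows into
// one result row, then apply LIMIT to the result rows.
function aggregateThenLimit(values: number[], limit: number): number[] {
  return [sum(values)].slice(0, limit);
}
```

With rows `[1, 2, 3, 4, 5]` and a limit of 2, `limitThenAggregate` gives 3 while `aggregateThenLimit` gives `[15]`; the second matches the behavior observed above.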