Closed keithfraley closed 6 years ago
Hi @keithfraley - thanks for reaching out. I'm glad you brought this up. You're correct - the data is being reduced by either the `limit` value (if set in request) or the `maxRecordCount`. Stats should not be calculated like this, so I think you've found a bug. In the short term, you can work around this by setting the `maxRecordCount` metadata property in your provider to a value that will be equal to or larger than your full record set.
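A minimal sketch of that workaround, assuming the provider returns GeoJSON with a `metadata` object from its `getData` function. The `fetchAllFeatures` helper and the sample features are hypothetical stand-ins for a real data source:

```javascript
// Hypothetical stand-in for whatever the provider actually reads from.
function fetchAllFeatures() {
  return [
    { type: 'Feature', properties: { pop: 10 }, geometry: null },
    { type: 'Feature', properties: { pop: 20 }, geometry: null },
    { type: 'Feature', properties: { pop: 60 }, geometry: null }
  ];
}

function Model() {}

Model.prototype.getData = function (req, callback) {
  const features = fetchAllFeatures();
  const geojson = {
    type: 'FeatureCollection',
    features,
    metadata: {
      // Set to at least the size of the full record set so that
      // nothing is trimmed before statistics are calculated.
      maxRecordCount: features.length
    }
  };
  callback(null, geojson);
};
```

This only helps when the full record set is small enough to hand to Koop in one response.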
But I think we need to change Winnow so that aggregate queries do calculations over the full record set delivered from the provider. I will add an issue to the Winnow repo.
Here's the Winnow issue: https://github.com/koopjs/winnow/issues/96
Thanks for the quick response, Rich. The challenge is that when working with datasets much larger than a reasonable max record count, I don't see that you will have much choice other than to go through the provider.
Perhaps one enhancement would be the option to calc field stats at a provider level?
I understand what you are saying. I think you can do this in your provider without any additional change in the koop ecosystem, but it will take a little development. You will need to intercept the `outStatistics` query parameter in your `getData` function and use it to calculate your own statistics there and use the results as the geojson passed to the callback. You will need to delete `req.query.outStatistics` prior to said callback as well so that FeatureServer/Winnow don't try to calculate statistics after you have already done so.
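A rough sketch of that approach, under several assumptions: `outStatistics` arrives as a JSON string or an array of objects with `statisticType`, `onStatisticField`, and `outStatisticFieldName` keys (as in the ArcGIS REST query API); the in-memory `rows` array, the `computeStatistics` helper, and the single-feature shape of the returned GeoJSON are all hypothetical and not confirmed by this thread. A real provider would presumably push the calculation down to its data store rather than reduce in memory.

```javascript
// Hypothetical stand-in for the provider's backing store.
const rows = [{ pop: 10 }, { pop: 20 }, { pop: 60 }];

// Reducers for a few common statisticType values.
const reducers = {
  count: (values) => values.length,
  sum: (values) => values.reduce((a, b) => a + b, 0),
  min: (values) => Math.min(...values),
  max: (values) => Math.max(...values),
  avg: (values) => values.reduce((a, b) => a + b, 0) / values.length
};

// Hypothetical helper: computes every requested statistic over ALL records.
function computeStatistics(records, outStatistics) {
  const properties = {};
  for (const stat of outStatistics) {
    const reducer = reducers[stat.statisticType];
    if (!reducer) {
      throw new Error(`Unsupported statisticType: ${stat.statisticType}`);
    }
    const values = records
      .map((record) => record[stat.onStatisticField])
      .filter((value) => typeof value === 'number');
    const name = stat.outStatisticFieldName ||
      `${stat.statisticType}_${stat.onStatisticField}`;
    properties[name] = reducer(values);
  }
  return properties;
}

function Model() {}

Model.prototype.getData = function (req, callback) {
  const raw = req.query.outStatistics;
  if (raw) {
    const outStatistics = typeof raw === 'string' ? JSON.parse(raw) : raw;
    const properties = computeStatistics(rows, outStatistics);
    // Remove the parameter so FeatureServer/Winnow do not recalculate.
    delete req.query.outStatistics;
    // Assumed shape: one feature carrying the statistics as properties.
    return callback(null, {
      type: 'FeatureCollection',
      features: [{ type: 'Feature', properties, geometry: null }]
    });
  }
  // No statistics requested: return the records as ordinary features.
  callback(null, {
    type: 'FeatureCollection',
    features: rows.map((row) => ({ type: 'Feature', properties: row, geometry: null }))
  });
};
```

Because the statistics are computed before any `limit` or `maxRecordCount` trimming, they cover the full record set regardless of how many records the request asks for.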
Thanks! I will have a look
After a deeper dive, I have determined that there isn't any bug here, at least using the providers I am currently using. You can add a `limit` or a `resultRecordCount` to the request and change its value, but I find the `outStatistics` are calculated over the entire record set delivered to koop from the provider. Adding `limit` or a `resultRecordCount` does add a `LIMIT` to the SQL query in winnow, but like standard SQL, the `LIMIT` fragment doesn't appear to affect calculations.
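To illustrate the ordering described above: an aggregate first collapses the input rows into a result set, and `LIMIT` then trims that result set, not the input. The following is a plain JavaScript imitation of that ordering with made-up data, not Winnow's actual code:

```javascript
// Made-up input rows standing in for the records delivered by a provider.
const rows = [{ pop: 10 }, { pop: 20 }, { pop: 60 }];

// Step 1, roughly "SELECT SUM(pop) AS total_pop FROM rows":
// the aggregate reads every input row and yields a single result row.
function aggregate(records) {
  return [{ total_pop: records.reduce((sum, record) => sum + record.pop, 0) }];
}

// Step 2, roughly "LIMIT n": trims the result rows, not the input rows.
function limit(resultRows, n) {
  return resultRows.slice(0, n);
}

// total_pop is 90, the sum over all three rows, even with a limit of 1.
const result = limit(aggregate(rows), 1);
console.log(result);
```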
While using the elasticsearch provider, I noticed that the values returned are only the stats for the first set of results. If the limit is set to 1000 records, you only get the stats for those 1000 records.
My question is this: is that an issue that is handled at the provider level or at the fs level? How can I modify the query on the backend to get stats on all records?