openlibraryenvironment / reshare-analytics

Community query repository for ReShare analytics based on Metadb/LDP
Apache License 2.0
3 stars 2 forks source link

"Division by zero" error in core requesting and core supplying (+ possibly other) scripts when date limited #38

Closed julienye closed 1 year ago

julienye commented 2 years ago

I generated monthly reports for PALCI and CNY for individual months in 2021. Most queries failed to execute with a "division by zero" error, which I was able to work around by skipping the ratio calculations in the script, and doing those calculations later, offline.

I believe the error occurs if a library has 0 requests or 0 supply requests during the specified time period. All report queries must be able to handle this case. Suggestion: either return "N/A" or 0 for the calculated ratio, when the denominator is 0.

scwiek commented 2 years ago

@julienye Are you able to directly run queries against the databases?

julienye commented 2 years ago

Yes, using DBeaver. The "division by zero" error didn't specify where the error occurred, but when I removed the ratio-calculation code from the script, it generated the counts without error.

On Mon, Jan 24, 2022 at 3:40 PM scwiek @.***> wrote:

@julienye https://github.com/julienye Are you able to directly run queries against the databases?

— Reply to this email directly, view it on GitHub https://github.com/openlibraryenvironment/reshare-analytics/issues/38#issuecomment-1020527659, or unsubscribe https://github.com/notifications/unsubscribe-auth/AVUFOPD6UFKKL72U56RHAKLUXW2MRANCNFSM5MWMAW6A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

nassibnassar commented 2 years ago

Can you post an SQL query that shows the error?

scwiek commented 2 years ago

Assuming September is one of the bad months, give this a try.

WITH parameters AS ( SELECT '2021-09-01'::date AS start_date, '2021-10-01'::date AS end_date ) SELECT * FROM ( SELECT rs_requester AS requester, sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS reqs, sum( CASE WHEN rs_to_status = 'REQ_CANCELLED' THEN 1 ELSE 0 END) AS cancelled, sum( CASE WHEN rs_to_status = 'REQ_END_OF_ROTA' THEN 1 ELSE 0 END) AS unfilled, sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') OR rs_to_status = 'REQ_FILLED_LOCALLY' THEN 1 ELSE 0 END) AS received, round(coalesce((sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') THEN 1 ELSE 0 END) / nullif (cast(sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS decimal), 0)), 0), 2) AS filled_ratio FROM reshare_derived.req_stats WHERE rs_date_created >= ( SELECT start_date FROM parameters) AND rs_date_created < ( SELECT end_date FROM parameters) GROUP BY requester UNION SELECT 'consortium' AS requester, sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS reqs, sum( CASE WHEN rs_to_status = 'REQ_CANCELLED' THEN 1 ELSE 0 END) AS cancelled, sum( CASE WHEN rs_to_status = 'REQ_END_OF_ROTA' THEN 1 ELSE 0 END) AS unfilled, sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') OR rs_to_status = 'REQ_FILLED_LOCALLY' THEN 1 ELSE 0 END) AS received, round(coalesce((sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') THEN 1 ELSE 0 END) / nullif (cast(sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS decimal), 0)), 0), 2) AS filled_ratio FROM reshare_derived.req_stats WHERE rs_date_created >= ( SELECT start_date FROM parameters) AND rs_date_created < ( SELECT end_date FROM parameters) GROUP BY requester) AS core_req ORDER BY ( CASE WHEN core_req.requester = 'consortium' THEN 0 ELSE 1 END), core_req.requester;

julienye commented 2 years ago

As it happens, September is not one of the bad months, but October is, so substitute this for the beginning of the query scwiek just posted:

WITH parameters AS ( SELECT '2021-10-01'::date AS start_date, '2021-10-31'::date AS end_date ) SELECT * ...

Member Gettysburg has 0 requests in October...

On Mon, Jan 24, 2022 at 3:59 PM scwiek @.***> wrote:

Assuming September is one of the bad months, give this a try.

WITH parameters AS ( SELECT '2021-09-01'::date AS start_date, '2021-10-01'::date AS end_date ) SELECT * FROM ( SELECT rs_requester AS requester, sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS reqs, sum( CASE WHEN rs_to_status = 'REQ_CANCELLED' THEN 1 ELSE 0 END) AS cancelled, sum( CASE WHEN rs_to_status = 'REQ_END_OF_ROTA' THEN 1 ELSE 0 END) AS unfilled, sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') OR rs_to_status = 'REQ_FILLED_LOCALLY' THEN 1 ELSE 0 END) AS received, round(coalesce((sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') THEN 1 ELSE 0 END) / nullif (cast(sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS decimal), 0)), 0), 2) AS filled_ratio FROM reshare_derived.req_stats WHERE rs_date_created >= ( SELECT start_date FROM parameters) AND rs_date_created < ( SELECT end_date FROM parameters) GROUP BY requester UNION SELECT 'consortium' AS requester, sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS reqs, sum( CASE WHEN rs_to_status = 'REQ_CANCELLED' THEN 1 ELSE 0 END) AS cancelled, sum( CASE WHEN rs_to_status = 'REQ_END_OF_ROTA' THEN 1 ELSE 0 END) AS unfilled, sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') OR rs_to_status = 'REQ_FILLED_LOCALLY' THEN 1 ELSE 0 END) AS received, round(coalesce((sum( CASE WHEN (rs_from_status = 'REQ_SHIPPED' AND rs_to_status = 'REQ_CHECKED_IN') THEN 1 ELSE 0 END) / nullif (cast(sum( CASE WHEN rs_to_status = 'REQ_VALIDATED' THEN 1 ELSE 0 END) AS decimal), 0)), 0), 2) AS filled_ratio FROM reshare_derived.req_stats WHERE rs_date_created >= ( SELECT start_date FROM parameters) AND rs_date_created < ( SELECT end_date FROM parameters) GROUP BY requester) AS core_req ORDER BY ( CASE WHEN core_req.requester = 'consortium' THEN 0 ELSE 1 END), core_req.requester;

— Reply to this email directly, view it on GitHub https://github.com/openlibraryenvironment/reshare-analytics/issues/38#issuecomment-1020540582, or unsubscribe https://github.com/notifications/unsubscribe-auth/AVUFOPHMEHUL5W4VUIYBQ5TUXW4SPANCNFSM5MWMAW6A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

scwiek commented 2 years ago

@julienye Did the query I posted (with your date adjustments) still produce the "division by zero" error?

julienye commented 2 years ago

oh, sorry! I didn't realize you wanted me to try it! Your revised query does not produce the "division by zero" error. However, it did omit Gettysburg (which had all zero counts) from the results. I think we'd prefer to see them included, with zeroes, for ease of comparison with other months - if that's feasible.

I also ran it against November data, when Widener had 0 requests but 1 received.
No error, and the results were what I'd expect to see:

requester | reqs | cancelled | unfilled | received | filled_ratio widener | 0 | 0 | 0 | 1 | 0

Thanks!

scwiek commented 2 years ago

Thanks @julienye, that's helpful information.

julienye commented 2 years ago

@scwiek I had the opportunity to review reports with @debradenault yesterday, and she confirmed what I alluded to in an earlier comment. It appears that when an institution has 0 requests in whatever date range the report is run, that institution is omitted from the results. Debra stated that all institutions should appear in the results list, e.g., to facilitate comparison with other months, even if they have no data for the selected time period.

Can you update the core requesting and core supplying scripts so that 0-request rows are not dropped, instead supplying 0 for all columns and calculations?

As a reminder, October 2021 is a good month for testing core_requesting @ PALCI: Gettysburg had 0 requests. And November 2021 is good for testing core_supplying @ PALCI: Misericordia had 0 supply requests.

Thanks!

scwiek commented 2 years ago

@julienye I might have a fix to add the empty institutions to the core reports. Once #42 is merged in by Nassib and pushed into the live systems, can you run this query against the CNY and/or PALCI systems to see if it's displaying as desired?

julienye commented 2 years ago

Sure - happy to test whenever!

On Mon, Feb 21, 2022 at 4:06 PM scwiek @.***> wrote:

@julienye https://github.com/julienye I might have a fix to add the empty institutions to the core reports. Once #42 https://github.com/openlibraryenvironment/reshare-analytics/pull/42 is merged in by Nassib and pushed into the live systems, can you run this query https://github.com/scwiek/reshare-analytics/blob/empty-test/sql/report_queries/core_requesting/core_req.sql against the CNY and/or PALCI systems to see if it's displaying as desired?

— Reply to this email directly, view it on GitHub https://github.com/openlibraryenvironment/reshare-analytics/issues/38#issuecomment-1047228186, or unsubscribe https://github.com/notifications/unsubscribe-auth/AVUFOPEXMTTQYQZ7LQZO7RDU4KSODANCNFSM5MWMAW6A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>