iati-data-access / data-backend

GNU Affero General Public License v3.0
1 stars 0 forks source link

Missing data for EC INTPA (and potentially others) #24

Closed simon-20 closed 8 months ago

simon-20 commented 9 months ago

There is data missing from the dashboard for European Commission - International Partnerships. see here: https://countrydata.iatistandard.org/data/reporting-organisation/XI-IATI-EC_INTPA/?filters=transaction_type%3A3,4,budget%3Byear%3A2022

There should be data for this dashboard. For example, this file is a portion of the CDFD Excel file which can be downloaded from the CDFD front page for Ethiopia. It contains all the EC INTPA data for Ethiopia, and there is data in that file for 2022, which should be appearing on the dashboard screen. ec-international-partnerships-ethiopia.xlsx

The logs for the overnight runs of CDFD show no errors in processing the EC INTPA data, and the downloadable Excel files contain the data, so it is something to do with the final stage where the data is inserted in the database.

simon-20 commented 8 months ago

Results of interim investigation: there is something breaking on dev and production which is resulting in nothing being saved into the database for EC INTPA:

iatidatacube=> select sum(value_usd) from iati_line where reporting_organisation = 'XI-IATI-EC_INTPA' and calendar_year = '2022' and (transaction_type = '3' or transaction_type = '4') and recipient_country_or_region = 'ET'; 
 sum 
-----

(1 row - empty result)

If I run the backend tool locally, and process just the ec-intpa dataset, then it works fine with no errors:

select sum(value_usd) from iati_line where reporting_organisation = 'XI-IATI-EC_INTPA' and calendar_year = '2022' and (transaction_type = '3' or transaction_type = '4') and recipient_country_or_region = 'ET'; 

Result:
213973353.50020015

That figure is correct, and matches a figure hand calculated using the relevant spreadsheet outputs.

simon-20 commented 8 months ago

This bug occurs in the flask update stage of the overnight backend processing. When it encounters XML data which refers to a non-existent publisher in reporting_org, it (at least sometimes) throws an exception. This exception is not caught, and so the program crashes, which means all as-yet unprocessed data is left unprocessed.

I plan to issue a hotfix for this which allow the program to continue to run when it encounters this error. I will then look into the root cause of the problem: it's not simply that this exception was uncaused, because there is a prior check for the existence of reporting_org.

simon-20 commented 8 months ago

The root cause of the problem is that there is a foreign key constraint on the reporting_organisation_type column in the iatiline table; this is keyed into the organisation_type table, which is pulled down from the official codelist.

However, if there is bad data in the file--i.e., an invalid codelist value--then the check which is meant to catch this fails, and thus throws the exception.

There is a second exception being thrown that may be a similar, but independent problem. That will be investigated after fixing this first problem.

simon-20 commented 8 months ago

This successfully ran on production overnight last night, so think this issue is now fixed. This issue is the same as Issue #23.

simon-20 commented 8 months ago

Fixed by commit https://github.com/iati-data-access/data-backend/commit/bf9cdb7a9b0739039ad98d1ec3011c830dd51f5a