owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License
62 stars 18 forks source link

wizard: wizard charts fails with large datasets #1403

Closed spoonerf closed 7 months ago

spoonerf commented 11 months ago

One-liner

When running walkthrough charts on 5593 -> 6161 it takes a long time refreshing when selecting new variables to match. I have also found it's crashed twice when using it, at different points, once when I was half way through 😢 . It gives the error below:

Screenshot 2023-08-01 at 14 47 37

I've used it successfully on other datasets, this time it kind of felt like it was struggling with the size?

Environment

Marigold commented 11 months ago

@lucasrodes do you have the capacity or should I take a look?

spoonerf commented 11 months ago

I was thinking maybe I don't need to use the UI - is it possible to run chart revisions without it?

lucasrodes commented 11 months ago

Hi! I remember that happened to me as well, with large datasets, the tool could be more stable.

@Marigold I am a bit low on time and unsure why get_connection raises an error. Most of the code for walkthrough charts is in walkthrough/charts_v2.

@spoonerf I think you could use the terminal version in the meantime.

  1. First need to create the variable mapping dictionary. For this, you can use the command line tool etl-match-variables.
  2. Need to create the charts and submit to admin. For this, use the command line etl-chart-suggester.

If you encounter any problems with these issues, let me know by raising another issue!

spoonerf commented 11 months ago

@lucasrodes With the etl-match-variables could it be possible that the function checks that -output-file is valid before the mappings are done? Just got burnt by using a directory instead of a file 😭

lucasrodes commented 11 months ago

Gotcha, @spoonerf. Opened https://github.com/owid/etl/pull/1423

pabloarosado commented 9 months ago

I'm having the same problem with the Global Carbon Budget (both local and on my staging server). First, I had the issue that one of the charts configs didn't have "id" (on chart 7115). I manually added it (I thought this issue was already fixed). But then, I restart the etl-wizard, and when submitting chart revisions, I get the error:

Something went wrong! (MySQLdb.OperationalError) (1048, "Column 'suggestedVersion' cannot be null") [SQL: INSERT INTO suggested_chart_revisions (chartId, createdBy, originalConfig, suggestedConfig, status, createdAt, updatedAt) VALUES (%s, %s, %s, %s, %s, %s, %s)] [parameters: ((483, 58, '{"id": 483, "map": {"time": 1980, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": "", "customNumer ... (1239 characters truncated) ... rldindata.org/co2-dataset-sources", "text": "CO\u2082 data: sources, methods and FAQs"}], "hideRelativeToggle": false, "matchingEntitiesOnly": true}', '{"id": 483, "map": {"time": 1980, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": "", "customNumer ... (1239 characters truncated) ... rldindata.org/co2-dataset-sources", "text": "CO\u2082 data: sources, methods and FAQs"}], "hideRelativeToggle": false, "matchingEntitiesOnly": true}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 894155), datetime.datetime(2023, 10, 6, 14, 47, 42, 894159)), (485, 58, '{"id": 485, "map": {"colorScale": {"baseColorScheme": "OrRd", "binningStrategy": "manual", "legendDescription": "Cumulative CO\u2082 emissions", "cu ... (4458 characters truncated) ... Kingdom", "European Union (27)", "India", "Brazil", "Canada"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', '{"id": 485, "map": {"colorScale": {"baseColorScheme": "OrRd", "binningStrategy": "manual", "legendDescription": "Cumulative CO\u2082 emissions", "cu ... (1075 characters truncated) ... Kingdom", "European Union (27)", "India", "Brazil", "Canada"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 898040), datetime.datetime(2023, 10, 6, 14, 47, 42, 898041)), (486, 58, '{"id": 486, "map": {"colorScale": {"baseColorScheme": "YlOrBr", "binningStrategy": "manual", "legendDescription": "CO\u2082 emissions per capita", " ... (4404 characters truncated) ... n (27)", "India", "China", "South Africa", "Canada", "Kenya"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', '{"id": 486, "map": {"colorScale": {"baseColorScheme": "YlOrBr", "binningStrategy": "manual", "legendDescription": "CO\u2082 emissions per capita", " ... (1058 characters truncated) ... n (27)", "India", "China", "South Africa", "Canada", "Kenya"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 901792), datetime.datetime(2023, 10, 6, 14, 47, 42, 901793)), (488, 58, '{"id": 488, "map": {"colorScale": {"baseColorScheme": "Reds", "binningStrategy": "manual", "legendDescription": "Annual CO\u2082 emissions", "custom ... (1072 characters truncated) ... ed Kingdom", "India", "China", "France", "Germany", "Brazil"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', '{"id": 488, "map": {"colorScale": {"baseColorScheme": "Reds", "binningStrategy": "manual", "legendDescription": "Annual CO\u2082 emissions", "custom ... (1072 characters truncated) ... ed Kingdom", "India", "China", "France", "Germany", "Brazil"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 903758), datetime.datetime(2023, 10, 6, 14, 47, 42, 903759)), (530, 58, '{"id": 530, "map": {"colorScale": {"baseColorScheme": "OrRd", "binningStrategy": "manual", "legendDescription": "CO2 intensity (kgCO2 per unit GDP)", ... (1203 characters truncated) ... ["United States", "China", "India", "Indonesia", "Pakistan"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', '{"id": 530, "map": {"colorScale": {"baseColorScheme": "OrRd", "binningStrategy": "manual", "legendDescription": "CO2 intensity (kgCO2 per unit GDP)", ... (1203 characters truncated) ... ["United States", "China", "India", "Indonesia", "Pakistan"], "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 905610), datetime.datetime(2023, 10, 6, 14, 47, 42, 905611)), (535, 58, '{"id": 535, "map": {"time": 1980, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": ""}, "columnSlug ... (4858 characters truncated) ... mes": ["World"], "selectedEntityColors": {"World": "#818282"}, "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', '{"id": 535, "map": {"time": 1980, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": ""}, "columnSlug ... (1132 characters truncated) ... mes": ["World"], "selectedEntityColors": {"World": "#818282"}, "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 909101), datetime.datetime(2023, 10, 6, 14, 47, 42, 909101)), (547, 58, '{"id": 547, "map": {"time": 1980, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": ""}, "columnSlug ... (987 characters truncated) ... (excl. EU-27)"], "selectedEntityColors": {"EU-27": "#4c6a9c"}, "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', '{"id": 547, "map": {"time": 1980, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": ""}, "columnSlug ... (987 characters truncated) ... (excl. EU-27)"], "selectedEntityColors": {"EU-27": "#4c6a9c"}, "hideAnnotationFieldsInTitle": {"time": true, "entity": true, "changeInPrefix": true}}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 910954), datetime.datetime(2023, 10, 6, 14, 47, 42, 910955)), (560, 58, '{"id": 560, "map": {"time": 2021, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": ""}, "columnSlug ... (1215 characters truncated) ... r", "variableId": 123}], "hideLegend": true, "isPublished": true, "timelineMaxTime": 2020, "hideRelativeToggle": false, "matchingEntitiesOnly": true}', '{"id": 560, "map": {"time": 2021, "colorScale": {"baseColorScheme": "BuGn", "binningStrategy": "equalInterval", "legendDescription": ""}, "columnSlug ... (1215 characters truncated) ... r", "variableId": 123}], "hideLegend": true, "isPublished": true, "timelineMaxTime": 2020, "hideRelativeToggle": false, "matchingEntitiesOnly": true}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 42, 912704), datetime.datetime(2023, 10, 6, 14, 47, 42, 912706)) ... displaying 10 of 99 total bound parameter sets ... (7114, 58, '{"id": 7114, "map": {"colorScale": {"baseColorScheme": "Oranges", "binningStrategy": "manual", "customNumericValues": [0.1, 0.25, 0.5, 1, 2.5, 5, 10, ... (2428 characters truncated) ... sions": [{"property": "y", "variableId": 738071}], "isPublished": true, "selectedEntityNames": ["United States", "United Kingdom", "China", "India"]}', '{"id": 7114, "map": {"colorScale": {"baseColorScheme": "Oranges", "binningStrategy": "manual", "customNumericValues": [0.1, 0.25, 0.5, 1, 2.5, 5, 10, ... (544 characters truncated) ... sions": [{"property": "y", "variableId": 814707}], "isPublished": true, "selectedEntityNames": ["United States", "United Kingdom", "China", "India"]}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 43, 129984), datetime.datetime(2023, 10, 6, 14, 47, 43, 129985)), (7115, 58, '{"id": 7115, "data": {"availableEntities": ["Afghanistan", "Africa", "Africa (UN)", "Akrotiri and Dhekelia", "Albania", "Algeria", "American Samoa", ... (4907 characters truncated) ... e": true, "selectedEntityNames": ["High-income countries", "Upper-middle-income countries", "Lower-middle-income countries", "Low-income countries"]}', '{"id": 7115, "note": "Data is for emissions from [fossil fuels and industry](http://localhost:8053/Charts#dod:fossilemissions). Land use change is not included.", "slug": "share ... (840 characters truncated) ... e": true, "selectedEntityNames": ["High-income countries", "Upper-middle-income countries", "Lower-middle-income countries", "Low-income countries"]}', 'pending', datetime.datetime(2023, 10, 6, 14, 47, 43, 132760), datetime.datetime(2023, 10, 6, 14, 47, 43, 132761)))] (Background on this error at: https://sqlalche.me/e/14/e3q8)

I just manually executed etl-match-variables and etl-chart-suggester and I get a similar error:

2023-10-06 14:56:35 [error    ] (MySQLdb.OperationalError) (1048, "Column 'suggestedVersion' cannot be null")

I suppose I have to trim the mappin into half and execute them one by one.

pabloarosado commented 9 months ago

For the record, I had the same issue related to "suggestedVersion" on local grapher and on live grapher. It turned out that one specific mapping was causing it, namely

"738071": "815075"

After removing that one, all chart revisions were properly submitted. So maybe there's something strange about one of those variables.

Marigold commented 7 months ago

Might be fixed by https://github.com/owid/etl/pull/2069

pabloarosado commented 7 months ago

I was having the same issues mentioned above and now, after Mojmir's changes, they seem to be fixed. So I think we can close this, and maybe reopen it if the same issue comes back in another dataset.