Open lucasrodes opened 1 year ago
I'm using wizard now for the first time, so I'll write here all issues I find along the way:
2023-09-01 08:37:55.275 WARNING streamlit.runtime.caching.cache_data_api: No runtime found, using MemoryCacheStorageManager
['streamlit', 'run', '/Users/prosado/Documents/owid/repos/etl/apps/wizard/app.py', '--server.port', '8053', '--', '--phase', 'all', '--run-checks']
FileNotFoundError: [Errno 2] No such file or directory: '/Users/prosado/Documents/owid/repos/etl/etl/steps/data/meadow/animal_welfare/2023-09-01/playground.ipynb'
Traceback:
File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "/Users/prosado/Documents/owid/repos/etl/apps/wizard/templating/meadow.py", line 212, in <module>
os.remove(notebook_path)
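The traceback above ends in `os.remove(notebook_path)` being called on a file that was never created. A minimal sketch of a tolerant cleanup, assuming the template only needs the notebook to be absent afterwards; `remove_if_present` is a hypothetical helper, not code from the wizard:

```python
from pathlib import Path


def remove_if_present(path: str) -> bool:
    """Delete `path` if it exists; return True if a file was actually removed."""
    target = Path(path)
    existed = target.exists()
    # missing_ok=True (Python 3.8+) suppresses FileNotFoundError.
    target.unlink(missing_ok=True)
    return existed
```

Called on a missing `playground.ipynb`, this returns `False` instead of raising.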
- The `dag` file should also be mentioned, if it has been modified.
- `snapshot.metadata.license` was None, even if `license` was defined in the `.dvc` file. It took me a while to realise that the produced `.dvc` file had `license` indented too many spaces, hence becoming an item of `origin`. I think they are meant to be separate fields in the snapshot metadata. Once I removed the indent of `license` (and its items), the issue was fixed. So, I suppose it's a bug, and wizard should not store `license` inside `origin` (we can discuss if that should be indeed the correct structure).
- `citation_producer` should be expandable, given that it can be very long (as dataset description fields are).

These issues should be solved in https://github.com/owid/etl/pull/1567
@Marigold @pabloarosado From our discussions and the documentation on Notion, I had assumed that we agreed that the field `license` should be under `$.meta.origin`.
However, from what Pablo is saying, it sounds like this is raising an error. As a solution for now, I've set wizard to export the field `license` both at `$.meta.origin.license` and `$.meta.license` levels.
Do you think this is fine?
> It took me a while to realise that the produced .dvc file had license indented too many spaces, hence becoming an item of origin. I think they are meant to be separate fields in the snapshot metadata.
The original idea was to keep it under origin, not separate from it. `license` is there only for backward compatibility. Ideally, snapshot meta would have either `source` & `license` or `origin` & `origin.license` (we could refactor old files if this causes too much confusion).
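The two layouts described above (`origin` & `origin.license` versus the legacy `source` & `license`) could be read with a single fallback. This is only a sketch on plain dictionaries; the dict shape and the helper name `resolve_license` are assumptions, not the actual snapshot metadata classes:

```python
from typing import Any, Optional


def resolve_license(meta: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Prefer origin.license; fall back to the legacy top-level license."""
    origin = meta.get("origin") or {}
    return origin.get("license") or meta.get("license")
```

With this reading, a file that nests `license` under `origin` and a legacy file with a top-level `license` both yield the same value.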
Thanks for clarifying, Mojmir. Yes, that's what I had in mind.
@pabloarosado could you share code to reproduce the error?
walkthrough bug that might be occurring here too: https://github.com/owid/etl/issues/1571
- `update_period_days` is mandatory in the guideline, but it is not available by default in any of the YAML files

Two related comments to the ones above:
- `description_processing` is not shown by default when walkthrough garden generates the yaml, and it's a recommended field
- `description` is automatically generated in the yaml file, when I see it is not used anymore

In general it would be good to match that file with the final version of the guidelines.
Hey! Should I just reopen for a small issue then?
- [x] The create playground notebook option in wizard doesn't generate a Jupyter Notebook
(copied from https://github.com/owid/etl/issues/1566#issuecomment-1723433161)
It's not super clear to me whether I should be hitting return after filling in each section or not? It seems like a good thing to somewhat validate each entry, but also the form completed unexpectedly early when pressing it, meaning I skipped the last couple of sections.
# NOTE: To learn more about the fields, hover over their names.
# Learn more about the available fields:
# http://localhost:8000/architecture/metadata/reference/dataset/
dataset:
  update_period_days: 365
# Learn more about the available fields:
# http://localhost:8000/architecture/metadata/reference/tables/
Harmonize country names with the following command (assuming country field is called country). Check out a short demo of the tool
@spoonerf, good point, but unfortunately, clicking on ENTER is equivalent to submitting the entire form. It will be submitted if the form is valid (i.e., all required fields are present).
@pabloarosado
I think the options to add regions and population data available on walkthrough should also be available on wizard as well.
Which options do you refer to here? I just executed `walkthrough` and there aren't - at least on my end - any options to add regions or population datasets.
@lucasrodes Maybe it was removed recently, but they were available as check boxes at the end of walkthrough garden
etl/steps/data/channel/
- `variable.description_processing` should be printed in the dummy yaml in the `etl-wizard garden` step (to not forget to use it when necessary)
- `etl-wizard snapshot`, in the License section, there is the `©producer year` option, but it prints `©producer year` and not `©{producer} {year}` like it's done in `origin.attribution`

Found here https://github.com/owid/etl/pull/1825#discussion_r1367034367
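To illustrate the difference between printing the literal option text and filling in the `©{producer} {year}` placeholders, here is a small sketch using `str.format`. The function name and the example values are hypothetical and not taken from the wizard:

```python
def render_copyright(template: str, producer: str, year: int) -> str:
    """Fill the {producer} and {year} placeholders of a license-name template."""
    return template.format(producer=producer, year=year)


# Hypothetical values, for illustration only.
example = render_copyright("©{producer} {year}", producer="Some Producer", year=2023)
```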
- [ ] It would be nice to have the `Select local file to import` option with the option to navigate the OS and select the file instead of writing a path that can be long
- [x] When adding a year instead of a date to `date_published`, it prints `year` without quotes. I think it's with quotes to comply with the schema
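An unquoted year such as `2023` is read by YAML parsers as an integer, while a quoted `"2023"` stays a string. A sketch of always emitting the value in quotes; the accepted formats (`YYYY` or `YYYY-MM-DD`) and the helper name are assumptions, not the wizard's actual validation:

```python
import re


def format_date_published(value: str) -> str:
    """Return a YAML line with the value double-quoted so it stays a string."""
    value = value.strip()
    # Assumed formats: a bare year or a full ISO date.
    if not re.fullmatch(r"\d{4}(-\d{2}-\d{2})?", value):
        raise ValueError(f"Unexpected date_published value: {value!r}")
    return f'date_published: "{value}"'
```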
UPDATE: the current recommended workflow is based on chart-diff, so we won't be addressing these issues.
Something went wrong! (MySQLdb.IntegrityError) (1062, "Duplicate entry '1904-3-3-1' for key 'suggested_chart_revisions.chartId'") [SQL: INSERT INTO suggested_chart_revisions (chartId, createdBy, originalConfig, suggestedConfig, status, createdAt, updatedAt) VALUES (%s, %s, %s, %s, %s, %s, %s)] [parameters: (1904, 59, '{"id": 1904, "map": {"colorScale": {"baseColorScheme": "YlGn", "binningStrategy": "manual", "customNumericColors": [null, null, null, null, null, nul ... (2786 characters truncated) ... 71}], "sourceDesc": "The World Bank", "isPublished": true, "selectedEntityNames": ["Azerbaijan", "Antigua and Barbuda", "Rwanda", "Sudan", "Brazil"]}', '{"id": 1904, "map": {"colorScale": {"baseColorScheme": "YlGn", "binningStrategy": "manual", "customNumericColors": [null, null, null, null, null, nul ... (651 characters truncated) ... 47}], "sourceDesc": "The World Bank", "isPublished": true, "selectedEntityNames": ["Azerbaijan", "Antigua and Barbuda", "Rwanda", "Sudan", "Brazil"]}', 'pending', datetime.datetime(2023, 11, 1, 16, 19, 38, 280631), datetime.datetime(2023, 11, 1, 16, 19, 38, 280631))] (Background on this error at: https://sqlalche.me/e/14/gkpj)
I saved this draft chart to replicate (indicator here).
`etl-wizard charts` don't actually work. They have the format `http://localhost:8053/None/datasets/{dataset_id}/`
`etl-wizard charts`: when trying to migrate from `[818136] $30 a day - Number not in poverty (Estimated)` to `[818156] $30 a day - Number not in poverty (Smoothed)` I actually can't find the latter. I am using dataset `[6341]` uniquely (migrating in the same dataset).
EDIT: Also with the migrations
- `[818125] $1.90 a day - Number in poverty (Estimated)` > `[818144] $1.90 a day - Number in poverty (Smoothed)`
- `[818142] $5-$10 - Number in poverty (Estimated)` > `[818162] $5-$10 - Number in poverty (Smoothed)`
Probably there's not much to do, but I see that the csv here includes `..` for some data points.
@paarriagadap, thanks for reporting. This issue is, as you mention, because there are non-numeric values in the dataset. Currently, the explore mode only supports comparing numerical values. We could compare categorical ones in the future if we needed to.
However, in this case, I think the indicators should be numbers and `..` should be `NaN` instead. That way, the indicator would be recognised as numeric. Also, why do we have `..` there? I bet that's a legacy dataset, and we won't be fixing this (removing `..`). Unfortunately, implementing some logic to replace `..` so that we can compare the data is too specific, and I don't think we should do it.
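For reference, storing `..` as `NaN` at the dataset level (not in the explore mode itself) could look like the sketch below. The set of missing-value markers and the helper name `to_number` are assumptions:

```python
import math


def to_number(cell: str) -> float:
    """Parse a CSV cell as float, mapping the '..' placeholder to NaN."""
    text = cell.strip()
    # Assumed missing-value markers: '..' (from the thread) and empty cells.
    if text in {"..", ""}:
        return math.nan
    return float(text)


values = [to_number(cell) for cell in ["1.5", "..", "3"]]
```

With every value a float, the column would be recognised as numeric.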
Chart revision:
I see that some classic issues with the chart revision tool still happen occasionally:
It's not urgent, and I have already manually fixed the affected cases. I just mentioned them here so we are aware that these minor things still happen.
I'll be looking at chart revisions this week, so I will take a look if there's an easy fix for that. (Examples like that are super useful by the way)
Issue raised by pablo might be related to this one: https://github.com/owid/etl/issues/867 (meant to post this here)
cc. @Marigold
Is the sorting in bar charts in the approval tool fixed here? Because I still find the issue:
When using wizard snapshots, if I save the title with `:` it generates an error in the yaml file. It can be solved if the content is rendered in quotes.
Thanks for reporting @paarriagadap. I think this is by design, otherwise the linter gets confused with the keyword. As you say, you can solve this by putting the text in quotes (`"`) or using a multiline string (`|-`, `>`, etc.)
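A sketch of the quoting workaround: wrap the title in double quotes when it contains a colon that YAML would read as a key separator. It relies on `json.dumps`, since JSON strings are valid YAML double-quoted scalars. The check is deliberately not exhaustive (other characters can also require quoting), and `yaml_title_line` is a hypothetical helper:

```python
import json


def yaml_title_line(title: str) -> str:
    """Render a `title:` line, quoting the value if it contains a YAML-significant colon."""
    needs_quotes = ": " in title or title.endswith(":")
    value = json.dumps(title, ensure_ascii=False) if needs_quotes else title
    return f"title: {value}"
```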
ValueError: Property variable.display.numDecimalPlaces has no type!
Traceback:
File "/home/owid/etl/.venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
exec(code, module.__dict__)
File "/home/owid/etl/apps/wizard/pages/expert/app.py", line 13, in <module>
from apps.wizard.pages.expert.prompts import (
File "/home/owid/etl/apps/wizard/pages/expert/prompts.py", line 52, in <module>
{render_indicator()}
File "/home/owid/etl/etl/docs.py", line 161, in render_indicator
documentation = render_props_recursive(
File "/home/owid/etl/etl/docs.py", line 115, in render_props_recursive
text += render_props_recursive(
File "/home/owid/etl/etl/docs.py", line 115, in render_props_recursive
text += render_props_recursive(
File "/home/owid/etl/etl/docs.py", line 123, in render_props_recursive
text += render_prop_doc(prop, prop_name=prop_name, level=level)
File "/home/owid/etl/etl/docs.py", line 65, in render_prop_doc
raise ValueError(f"Property {prop_name} has no type!")
@paarriagadap fixed the expert, thanks for reporting!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Indicator Upgrader didn't recognize the version of a dataset updated this year: https://github.com/owid/etl/pull/3569
@pabloarosado, in case you want/have time to look into the detection algorithm
Hi @paarriagadap thanks for reporting this, I've fixed it in your branch. There were two issues:
- `dag/archive/poverty_inequality.yml` was created, but it wasn't added to `dag/archive/main.yml`, and therefore it was ignored.
- `data://explorers/lis/latest/luxembourg_income_study`
  `data://explorers/wid/latest/world_inequality_database`

If you think this was caused by wizard (which would be a bug), let me know. Thanks!
Hi @pabloarosado. Ah, thanks for the fix. I must have forgotten about adding the dag file in the archive `main.yml`.
Regarding those `latest` steps, yes, I think that when I use the tool to update steps, these latest steps are moved to the archive and disappear from the "live" yaml. At least I have seen that I need to re-add those steps to the latter.
I guess this could have been caused by the other issue, of not having the `dag/archive/poverty_inequality.yml` file properly accounted for. But please let me know if something like this happens again in the future.
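A quick way to spot an archive dag file that was never added to `main.yml` is to compare the folder contents against the text of `main.yml`. This sketch assumes `main.yml` mentions each included file by name, and uses a plain substring test, so a file whose name is contained in another's would be missed; `unlisted_dag_files` is a hypothetical helper:

```python
from pathlib import Path


def unlisted_dag_files(archive_dir: str, main_name: str = "main.yml") -> list[str]:
    """Return .yml files in `archive_dir` whose names never appear in main.yml."""
    folder = Path(archive_dir)
    main_text = (folder / main_name).read_text()
    return sorted(
        path.name
        for path in folder.glob("*.yml")
        if path.name != main_name and path.name not in main_text
    )
```

Running `unlisted_dag_files("dag/archive")` from the repository root would list files such as a forgotten `poverty_inequality.yml`.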
After #1539 you can now use the new tool `wizard` to generate templates for your ETL steps.
Use this issue to report bugs that you may encounter. If the issue is very complex, feel free to create a separate, more extensive issue.