owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

wizard: Wizard snapshot cannot handle citation full with line breaks #1752

Closed pabloarosado closed 1 year ago

pabloarosado commented 1 year ago

One-liner

Wizard snapshot cannot handle citation full with line breaks.

Context & details

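When the citation_full entered through the browser spans multiple lines or paragraphs, the generated .dvc file is not properly indented: only the first line of the value is indented, and the following lines start at column 1. This issue probably affects other free-text fields too, not just citation_full.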

Example error:

Traceback (most recent call last):
  File "/Users/prosado/Documents/owid/repos/etl/snapshots/emissions/2023-10-10/net_zero_tracker.py", line 24, in <module>
    main()
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/snapshots/emissions/2023-10-10/net_zero_tracker.py", line 17, in main
    snap = Snapshot(f"emissions/{SNAPSHOT_VERSION}/net_zero_tracker.xlsx")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/etl/snapshot.py", line 64, in __init__
    self.metadata = SnapshotMeta.load_from_yaml(self.metadata_path)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/etl/snapshot.py", line 259, in load_from_yaml
    yml = yaml.safe_load(istream)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
                         ^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/scanner.py", line 115, in check_token
    while self.need_more_tokens():
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/scanner.py", line 152, in need_more_tokens
    self.stale_possible_simple_keys()
  File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/yaml/scanner.py", line 291, in stale_possible_simple_keys
    raise ScannerError("while scanning a simple key", key.mark,
yaml.scanner.ScannerError: while scanning a simple key
  in "/Users/prosado/Documents/owid/repos/etl/snapshots/emissions/2023-10-10/net_zero_tracker.xlsx.dvc", line 15, column 1
could not find expected ':'
  in "/Users/prosado/Documents/owid/repos/etl/snapshots/emissions/2023-10-10/net_zero_tracker.xlsx.dvc", line 17, column 16

The .dvc file was populated as follows:

    citation_full: |-
      Net Zero Tracker. Energy and Climate Intelligence Unit, Data-Driven EnviroLab, NewClimate Institute, Oxford Net Zero. 2023.
John Lang, Camilla Hyslop, Zhi Yi Yeo, Richard Black, Peter Chalkley, Thomas Hale, Frederic Hans, Nick Hay, Niklas Höhne, Angel Hsu, Takeshi Kuramochi, Silke Mooldijk, Steve Smith.

The correct formatting would be:

    citation_full: |-
      Net Zero Tracker. Energy and Climate Intelligence Unit, Data-Driven EnviroLab, NewClimate Institute, Oxford Net Zero. 2023.
      John Lang, Camilla Hyslop, Zhi Yi Yeo, Richard Black, Peter Chalkley, Thomas Hale, Frederic Hans, Nick Hay, Niklas Höhne, Angel Hsu, Takeshi Kuramochi, Silke Mooldijk, Steve Smith.
    attribution: Energy and Climate Intelligence Unit, Data-Driven EnviroLab, NewClimate
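
The only difference between the two files is the indentation of the continuation lines: inside a `|-` block scalar, every non-empty line of the value has to be indented deeper than its key. Below is a minimal sketch of how a text-based writer could ensure that, assuming the value is written into the file as plain text. `render_block_scalar` and its parameters are hypothetical names for illustration, not the wizard's actual code.

```python
import textwrap


def render_block_scalar(key: str, value: str, key_indent: int = 4, step: int = 2) -> str:
    """Render `key: |-` followed by `value`, with every value line indented.

    Hypothetical helper for illustration; not part of the etl codebase.
    """
    # Normalise line endings, since a browser form may submit "\r\n".
    normalised = value.replace("\r\n", "\n").replace("\r", "\n")
    # textwrap.indent prefixes every non-blank line; blank lines are left
    # empty, which is allowed inside a YAML block scalar.
    body = textwrap.indent(normalised, " " * (key_indent + step))
    return f"{' ' * key_indent}{key}: |-\n{body}"


citation = "Dummy description for a dummy snapshot.\nAnother sentence."
print(render_block_scalar("citation_full", citation))
```

An alternative that avoids manual indentation altogether would be to build the metadata as a dictionary and let a YAML library serialise it, at the cost of less control over the layout of the file.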

lucasrodes commented 1 year ago

hi @pabloarosado, Can you provide your example? snapshot / how-to-reproduce?

pabloarosado commented 1 year ago

> hi @pabloarosado, Can you provide your example? snapshot / how-to-reproduce?

I couldn't give an example since the code wasn't yet pushed. Here's the dvc file that failed (after correcting citation_full).

To reproduce, run etl-wizard --dummy-data and then, in citation_full, add a line break. For example:

Dummy description for a dummy snapshot.
Another sentence.

This will generate a broken dvc file.
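
The likely mechanism can be illustrated without the wizard. The sketch below assumes the file is produced by filling a text template, which is a guess about the cause rather than something confirmed in the code; `TEMPLATE` is a hypothetical stand-in for whatever the wizard really uses.

```python
# Hypothetical template, for illustration only; the wizard's real template may differ.
TEMPLATE = "    citation_full: |-\n      {citation_full}\n"

citation_full = "Dummy description for a dummy snapshot.\nAnother sentence."
rendered = TEMPLATE.format(citation_full=citation_full)
print(rendered)

# Only the first line of the value inherits the template's indentation;
# the line after the line break starts at column 1.
lines = rendered.splitlines()
print(repr(lines[1]))
print(repr(lines[2]))
```

The second value line comes out unindented, which matches the broken .dvc file shown in the issue description.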