nodestream-proj / docs

The Nodestream Project GitHub Site

Document extracting data from text files that contain unformatted text #20

Open yasonk opened 3 months ago

yasonk commented 3 months ago

For the following pipeline:

- implementation: nodestream.pipeline.extractors:FileExtractor
  arguments:
    globs:
    - data/nodes.txt
- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
    - type: source_node
      node_type: MyNode
      key:
        node_name: !regex
          regex: '^(?P<node_name>.*)'
          data: !jmespath 'line'
          group: node_name

Because the file name ends with .txt, its contents are extracted into an object that has a property named "line", which is why the !jmespath expression above reads 'line'. However, this behavior is not documented.
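
To make the requested documentation more concrete, here is a minimal Python sketch of the record shape the pipeline above relies on. It is not nodestream's implementation: `read_text_records` and `extract_node_name` are hypothetical helpers, and the one-record-per-line behavior and the stripping of the trailing newline are assumptions made for illustration. Only the property name "line" and the regex come from the pipeline itself.

```python
import re
from typing import Iterator, Optional

# Same pattern as the !regex value in the pipeline above.
NODE_NAME_PATTERN = re.compile(r"^(?P<node_name>.*)")


def read_text_records(path: str) -> Iterator[dict]:
    """Hypothetical stand-in for how a .txt file is extracted.

    Assumption: every line of the file becomes one record with a single
    property named "line".
    """
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            # Assumption: the trailing newline is dropped; the real
            # extractor may keep it.
            yield {"line": raw_line.rstrip("\n")}


def extract_node_name(record: dict) -> Optional[str]:
    """Mimic the key lookup: read 'line', apply the regex, take the group."""
    match = NODE_NAME_PATTERN.match(record["line"])
    return match.group("node_name") if match else None


if __name__ == "__main__":
    for record in read_text_records("data/nodes.txt"):
        print(extract_node_name(record))
```

Under these assumptions, each line of data/nodes.txt yields one node whose node_name key is the full text of that line.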