Excessive memory use loading large yaml stream?

Bascy commented 1 year ago

We are developing a very large project based on the ESP32 and are currently switching from a JSON-based config file to YAML for readability. The YAML file is about 9000 characters long.

The YAML file is read from SPIFFS and provided as a Stream. The code that parses it allocates a DynamicJsonDocument with a capacity of 20000 bytes and then calls deserializeYml(jsonDocument, stream). We also have ArduinoJson in our project. Right before the call to deserializeYml(), ESP.getMaxAllocHeap() returns around 77,000, and deserializing still fails with a "Not enough memory" error.

I've tried enabling debug and verbose level logging, but that doesn't give me any more output than the "Not enough memory" error message.

Why does it take more than 77 kB to parse a 10,000-byte stream? I would think that the purpose of a stream is to not have all the data in memory at once.

If I comment out the "currentcontroller" map at the bottom of the YAML file, parsing does work ...

sensors:
  Main_Vref:
    id: 11
    nodes:
      main:
        type: ADCVoltage
        terminalId: 17
        pollingInterval: 10000
        r1: 36000
        r2: 10000

  OutsideTemp:
    id: 13
    nodes:
      iomodule:
        type: ADCThermistor
        terminalId: 44
        pollingInterval: 30000
        vRefNodeId: 11
      main:
        type: RemoteIntegerTimes10
        communicatorId: 4
        readOnly: true

  CabinetTemp:
    id: 25
    nodes:
      iomodule:
        type: ADCThermistor
        terminalId: 18
        pollingInterval: 30000
        vRefNodeId: 11
      main:
        type: RemoteIntegerTimes10
        communicatorId: 4
        readOnly: true

  LightsLargeBedroom:
    id: 33
    nodes:
      iomodule:
        type: PMW
        terminalId: 51
        storage: pwm
        key: large
      main:
        type: LightZoneRemoteSensor
        communicatorId: 4

  LightsBathroom:
    id: 30
    nodes:
      main:
        type: PWMLightZone
        terminalId: 24
        storage: pwm
        key: bathroom

  LightsSmallroom:
    id: 31
    nodes:
      main:
        type: PWMLightZone
        terminalId: 22
        storage: pwm
        key: smallroom

  LightsLivingroomLedstrip:
    id: 34
    nodes:
      iomodule:
        type: PMW
        terminalId: 49
        storage: pwm
        key: livingmood
      main:
        type: LightZoneRemoteSensor
        communicatorId: 4

  LightLivingroomTable:
    id: 35
    nodes:
      iomodule:
        type: PMW
        terminalId: 47
        storage: pwm
        key: living
      main:
        type: LightZoneRemoteSensor
        communicatorId: 4

  LightsOutside:
    id: 36
    nodes:
      iomodule:
        type: PMW
        terminalId: 45
        storage: pwm
        key: outside
      main:
        type: LightZoneRemoteSensor
        communicatorId: 4

  Outlets:
    id: 37
    nodes:
      iomodule:
        type: MxPulseRelay
        terminalId: 43
        stateNodeId: 107
      main:
        type: ControlledRemoteBoolean
        currentControllerId: 180
        communicatorId: 4

  OutletState:
    id: 107
    nodes:
      iomodule:
        type: MxDigital
        terminalId: 10
        interruptEnabled: true
        pollingInterval: 10300

  Boiler:
    id: 41
    nodes:
      iomodule:
        type: MxPulseRelay
        terminalId: 42
        stateNodeId: 111
      main:
        type: ControlledRemoteBoolean
        currentControllerId: 180
        communicatorId: 4

  BoilerState:
    id: 111
    nodes:
      iomodule:
        type: MxDigital
        terminalId: 11
        interruptEnabled: true
        pollingInterval: 10700

  HighpowerOutlet:
    id: 38
    nodes:
      iomodule:
        type: MxPulseRelay
        terminalId: 41
        stateNodeId: 108
      main:
        type: ControlledRemoteBoolean
        currentControllerId: 180
        communicatorId: 4

  HigpowerOutletState:
    id: 108
    nodes:
      iomodule:
        type: MxDigital
        terminalId: 12
        interruptEnabled: true
        pollingInterval: 10500

  HeaterLiving:
    id: 39
    nodes:
      iomodule:
        type: MxPulseRelay
        terminalId: 40
        stateNodeId: 109
      main:
        type: ControlledRemoteBoolean
        currentControllerId: 180
        communicatorId: 4

  HeaterLivingState:
    id: 109
    nodes:
      iomodule:
        type: MxDigital
        terminalId: 13
        interruptEnabled: true
        pollingInterval: 10600

  HeaterBathroom:
    id: 40
    nodes:
      iomodule:
        type: MxPulseRelay
        terminalId: 39
        stateNodeId: 110
      main:
        type: ControlledRemoteBoolean
        currentControllerId: 180
        communicatorId: 4

  HeaterBathroomState:
    id: 110
    nodes:
      iomodule:
        type: MxDigital
        terminalId: 14
        interruptEnabled: true
        pollingInterval: 10400

  Stove:
    id: 42
    nodes:
      iomodule:
        type: MxPulseRelay
        terminalId: 38
        stateNodeId: 112
      main:
        type: ControlledRemoteBoolean
        currentControllerId: 180
        communicatorId: 4

  StoveState:
    id: 112
    nodes:
      iomodule:
        type: MxDigital
        terminalId: 15
        interruptEnabled: true
        pollingInterval: 10800

  # Not present on testboard
  # Current:
  #   id: 43
  #   nodes:
  #     iomodule:
  #       type: ADCCurrent
  #       terminalId: 1
  #       pinMode: INPUT
  #       pollingInterval: 500
  #     main:
  #       type: RemoteIntegerTimes10
  #       readOnly: true
  #       communicatorId: 4

  # Not present on testboard
  # KwhCounter:
  #   id: 45
  #   nodes:
  #     iomodule:
  #       type: MxInterruptCount
  #       terminalId: 26
  #       pollingInterval: 10200
  #       divider: 100
  #       interruptMode: FALLING
  #       storage: intcount
  #       key: kwh
  #     main:
  #       type: StoredSettings<InterruptCounterType>
  #       storage: haaksnano
  #       key: powerusage

  WaterflowCounter:
    id: 50
    nodes:
      iomodule:
        type: InterruptCount
        terminalId: 3
        divider: 1
        debounceMs: 1000
        pollingInterval: 10900
        storage: intcount
        key: water
      main:
        type: StoredSettingsInterruptCounterType
        storage: haaksnano
        key: waterusage

  FlushRelais:
    id: 83
    nodes:
      iomodule:
        type: MxDigital
        terminalId: 37
        pinMode: OUTPUT
      main:
        type: RemoteBoolean
        communicatorId: 4

  SunUpdown:
    id: 21
    nodes:
      main:
        type: SunUpDown
        lightzoneIds:
          - 36
        pollingInterval: 60000

currentcontroller:
  id: 180
  main:
    maxCurrent: 25
    currentNodeId: 43
    currentNodeMode: PositiveLoad
    mainSwitchNode: 129
    mainSwitchReverseLogic: true
    pollingInterval: 10000
    nodes:
      - nodeId: 42
        prio: 100
      - nodeId: 37
        prio: 90
      - nodeId: 38
        prio: 80
      - nodeId: 182
        prio: 70
      - nodeId: 41
        prio: 60
      - nodeId: 181
        prio: 50

Bascy commented 1 year ago

I've added free-heap logging to the whole parsing process, and this is what I found:

To load the test document from the deserialize.ino example, which contains 6 properties and is 91 characters long, the parser at its peak (after calling yaml_parser_load_nodes in loader.c) has reduced the free heap by 7216 bytes. The resulting JSON document is 127 bytes.

To load my document (397 properties, 9277 bytes), the parser at its peak has reduced the free heap by 129420 bytes. The resulting JSON document is 9020 bytes.

I don't think this is ever going to work for large applications running on, e.g., an ESP32 board ...
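To put the two measurements side by side, the figures above can be turned into ratios. The following is only a worked calculation on the numbers reported in this thread; the struct and function names are made up for illustration, and the 77000-byte budget is the approximate ESP.getMaxAllocHeap() value from the first comment. Note that getMaxAllocHeap() reports the largest single allocatable block while the peak figure is a total, so the comparison is a rough consistency check rather than a proof.

```cpp
#include <cstddef>

// Figures reported in this thread (names are illustrative only).
struct ParseSample {
    std::size_t inputBytes;     // size of the YAML text
    std::size_t properties;     // number of properties in the document
    std::size_t peakHeapBytes;  // drop in free heap at the parser's peak
};

// Heap consumed at the peak per byte of YAML input.
constexpr double heapPerInputByte(const ParseSample& s) {
    return static_cast<double>(s.peakHeapBytes) / static_cast<double>(s.inputBytes);
}

// Heap consumed at the peak per YAML property.
constexpr double heapPerProperty(const ParseSample& s) {
    return static_cast<double>(s.peakHeapBytes) / static_cast<double>(s.properties);
}

// True when the measured peak is larger than the given heap budget.
constexpr bool peakExceeds(const ParseSample& s, std::size_t budgetBytes) {
    return s.peakHeapBytes > budgetBytes;
}

constexpr ParseSample kExample{91, 6, 7216};       // deserialize.ino test document
constexpr ParseSample kConfig{9277, 397, 129420};  // the config file in this issue

// Approximate value of ESP.getMaxAllocHeap() before parsing ("around 77,000").
constexpr std::size_t kReportedMaxAllocHeap = 77000;
```

With these numbers, the config file costs about 14 bytes of heap per input byte and about 326 bytes per property, while the small example costs about 79 bytes per input byte. The peak for the config file is well above the reported 77,000 bytes, which is at least consistent with the "Not enough memory" error.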

Or am I doing something wrong? Is it possible to have the parser create the JSON document in parallel with reading the YAML file, so that it doesn't need to build a huge node tree in memory before it calls deserializeYml_JsonObject()?
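The idea of producing the output while reading, instead of building a node tree first, can be sketched in plain C++. This is not YAMLDuino's API and not the Builder mentioned later in this thread; streamYamlMapsToJson and yamlMapsToJson are hypothetical helpers. The sketch assumes a very small YAML subset: block mappings of "key: value" lines indented with spaces, blank lines, and '#' comment lines. Sequences, quoting, and JSON escaping are not handled, and every scalar is emitted as a JSON string. On an ESP32 the input would more likely be an Arduino Stream than a std::istream.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: converts a YAML subset (nested maps only) to JSON text
// one line at a time. The only parser state is a stack of indentation levels.
void streamYamlMapsToJson(std::istream& in, std::ostream& out) {
    std::vector<std::size_t> indents;  // indentation of each map that is still open
    bool awaitingValue = false;        // the last key had no inline value
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t indent = line.find_first_not_of(' ');
        if (indent == std::string::npos || line[indent] == '#') continue;  // blank or comment
        const std::size_t colon = line.find(':', indent);
        if (colon == std::string::npos) continue;  // unsupported line, e.g. a sequence item
        const std::string key = line.substr(indent, colon - indent);
        std::string value;
        const std::size_t valueStart = line.find_first_not_of(' ', colon + 1);
        if (valueStart != std::string::npos) value = line.substr(valueStart);
        while (!value.empty() && (value.back() == ' ' || value.back() == '\r')) value.pop_back();

        if (indents.empty() || indent > indents.back()) {
            out << '{';  // first line, or a deeper level: open a new map
            indents.push_back(indent);
        } else {
            if (awaitingValue) out << "null";  // "key:" with nothing nested under it
            while (indents.size() > 1 && indent < indents.back()) {
                out << '}';  // leaving one or more nested maps
                indents.pop_back();
            }
            out << ',';
        }
        out << '"' << key << "\":";
        awaitingValue = value.empty();
        if (!awaitingValue) out << '"' << value << '"';
    }
    if (indents.empty()) { out << "{}"; return; }
    if (awaitingValue) out << "null";
    for (std::size_t i = 0; i < indents.size(); ++i) out << '}';
}

// Convenience wrapper for trying the sketch on an in-memory string.
std::string yamlMapsToJson(const std::string& yaml) {
    std::istringstream in(yaml);
    std::ostringstream out;
    streamYamlMapsToJson(in, out);
    return out.str();
}
```

In this sketch the working state grows with the nesting depth of the document rather than with its total size, which is the property the question is asking for. A real implementation would also need sequences (such as the `nodes` list under `currentcontroller`), typed scalars, and error handling.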

Bascy commented 1 year ago

I created my own Builder implementation that creates the JSON document without using so much memory, so this issue can be closed.

tobozo / YAMLDuino

Excessive memory use loading large yaml stream? #18