hyperledger-archives / sawtooth-core

Core repository for Sawtooth Distributed Ledger
https://wiki.hyperledger.org/display/sawtooth
Apache License 2.0

PoET not accepting any requests after getting too many requests error 429 #2384

Open wejdeneHaouari opened 3 years ago

wejdeneHaouari commented 3 years ago

1. Issue

I am running a heavy workload on a Sawtooth network for testing purposes. When I run the network with PBFT or Raft consensus, I get "Too Many Requests" (HTTP 429) errors, but the network continues to accept requests. With PoET, however, the network stops accepting any requests after I get a 429 error.
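Under PBFT and Raft the 429 responses are transient, so a client that backs off and retries can ride them out. A minimal client-side sketch, assuming a hypothetical `send()` callable that POSTs one batch list and returns the HTTP status code: it retries with capped exponential backoff and full jitter. It cannot help if the validator never drains its queue again, which is the PoET symptom described here.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Capped exponential schedule: base, 2*base, 4*base, ... never above cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def submit_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                        base: float = 0.5, cap: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Call the hypothetical send() until it returns a status other than 429.

    Returns the first non-429 status, or None if every attempt was rejected.
    """
    delays = backoff_delays(max_attempts - 1, base, cap)
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < len(delays):
            # Full jitter: sleep a random time up to the current delay so many
            # clients do not retry against the validator in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    return None
```

The `sleep` parameter is injectable so the retry logic can be exercised without waiting.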

2. System information

This is the validator configuration for PoET

validator-0:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-0
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-0 || true && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-0/ && \
        while [ ! -f /poet-shared/poet-enclave-measurement ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet-enclave-basename ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet.batch ]; do sleep 1; done && \
        cp /poet-shared/poet.batch / && \
        sawset genesis \
          -k /etc/sawtooth/keys/validator.priv \
          -o config-genesis.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
          sawtooth.consensus.algorithm.name=PoET \
          sawtooth.consensus.algorithm.version=0.1 \
          sawtooth.poet.report_public_key_pem=\
          \\\"$$(cat /poet-shared/simulator_rk_pub.pem)\\\" \
          sawtooth.poet.valid_enclave_measurements=$$(cat /poet-shared/poet-enclave-measurement) \
          sawtooth.poet.valid_enclave_basenames=$$(cat /poet-shared/poet-enclave-basename) \
          -o config.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
             sawtooth.poet.target_wait_time=5 \
             sawtooth.poet.initial_wait_time=25 \
             sawtooth.publisher.max_batches_per_block=100 \
          -o poet-settings.batch && \
        sawadm genesis \
          config-genesis.batch config.batch poet.batch poet-settings.batch && \
        sawtooth-validator -v \
          --bind network:tcp://eth0:8800 \
          --bind component:tcp://eth0:4004 \
          --bind consensus:tcp://eth0:5050 \
          --peering static \
          --endpoint tcp://validator-0:8800 \
          --scheduler parallel \
          --maximum-peer-connectivity 10000
    \""
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

I am running the Sawtooth network on an AWS VM with 8 GB of RAM and 2 CPUs, running Ubuntu 18.04.

3. Question

Any idea how to solve this issue? Is there a way to disable the back-pressure check?

rowaisi commented 3 years ago

Hi-

There is a problem with PoET: after we reach 10 concurrent users, it stops working and returns "too many requests" errors. This usually happens after about 5 minutes. It might be due to the back-pressure module in Sawtooth, but the same back pressure applies to both Raft and PBFT, and we only see the issue with PoET consensus. In practice PoET is unusable for us because we keep getting 429 errors and the system crashes, whereas with PBFT and Raft it keeps working.
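The observation that Raft and PBFT share the same back pressure can be made concrete with a toy model. It is a simplified sketch, not the validator's code: the queue limit is a constant here (an assumption; the real limit is computed by the validator), submissions over the limit count as rejected (the 429 case), and each published block drains up to `max_batches_per_block` batches. Once blocks stop being published, the queue never drains and every later submission is rejected, whatever the consensus algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingQueue:
    """Toy pending-batch queue with a fixed back-pressure limit (an assumption)."""
    limit: int
    depth: int = 0

    def submit(self, n: int) -> int:
        """Enqueue up to n batches; return how many were rejected (the HTTP 429 case)."""
        accepted = max(0, min(n, self.limit - self.depth))
        self.depth += accepted
        return n - accepted

    def publish_block(self, max_batches_per_block: int) -> int:
        """Drain up to max_batches_per_block batches, as if a block were published."""
        taken = min(self.depth, max_batches_per_block)
        self.depth -= taken
        return taken


def simulate(seconds: int, arrivals_per_s: int, block_interval_s: int,
             max_batches_per_block: int, limit: int, stall_after_s: int) -> List[int]:
    """Rejected batches per second. Blocks are published every block_interval_s
    seconds up to stall_after_s; after that no block is ever published again."""
    queue = PendingQueue(limit)
    rejected = []
    for t in range(1, seconds + 1):
        rejected.append(queue.submit(arrivals_per_s))
        if t % block_interval_s == 0 and t <= stall_after_s:
            queue.publish_block(max_batches_per_block)
    return rejected


stalled = simulate(200, 10, 5, 100, 420, stall_after_s=60)
print(sum(stalled[:102]), stalled[102:] == [10] * 98)  # 0 True
print(sum(simulate(200, 10, 5, 100, 420, stall_after_s=10**9)))  # 0
```

In the stalled run, lowering `arrivals_per_s` after the queue fills changes nothing, because depth only decreases when a block is published. Within this model, that suggests disabling back pressure alone would not restore block production.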

rowaisi commented 3 years ago

Hi @agunde406, @vaporos and @rbuysse,

I need your help because I am going to use Hyperledger Sawtooth for a big project at our company. We see this issue with PoET but not with PBFT or Raft. Within 5 to 10 minutes at most, once users start sending more requests and transactions, the system crashes under PoET with 429 errors, and the whole system stops. We get the same error with PBFT and Raft, but it goes away once the user load drops; with PoET, the network never resumes working. Can you please check this bug in PoET? Otherwise we will have to stop using PoET and stick with Raft or PBFT, but we prefer PoET because this is a large-scale project in Canada.

I appreciate your kind consideration of this matter and look forward to hearing from you.

peterschwarz commented 3 years ago

What is the transaction rate that you are submitting against your validators? Are you spreading them across the network or firing them at a single node? How many blocks deep are you when this occurs?
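These questions are easier to answer when the load generator is explicit. A minimal sketch, assuming the five REST API containers from the compose file posted below (reachable only from inside the same Docker network) and an already serialized `BatchList`: it spaces submissions evenly at a fixed rate and assigns each one to a randomly chosen endpoint, matching how the reporter describes spreading load. `POST /batches` with an octet-stream body is the Sawtooth REST API's batch submission route; building and signing the batches is out of scope here.

```python
import random
import urllib.error
import urllib.request
from typing import List, Tuple

# Assumed endpoints: the rest-api-0..4 services from the compose file.
ENDPOINTS = [f"http://rest-api-{i}:8008" for i in range(5)]


def submission_schedule(rate_per_s: float, duration_s: float, endpoints: List[str],
                        seed: int = 0) -> List[Tuple[float, str]]:
    """Evenly spaced send offsets (seconds) at rate_per_s, each paired with a random endpoint."""
    rng = random.Random(seed)  # seeded so a test run can be replayed exactly
    count = int(rate_per_s * duration_s)
    return [(i / rate_per_s, rng.choice(endpoints)) for i in range(count)]


def post_batch_list(endpoint: str, batch_list: bytes, timeout: float = 10.0) -> int:
    """POST a serialized BatchList to <endpoint>/batches and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint + "/batches",
        data=batch_list,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # 202 Accepted on success
    except urllib.error.HTTPError as err:
        return err.code  # 429 when the validator is applying back pressure
```

For example, `submission_schedule(10, 450, ENDPOINTS)` yields 4,500 submissions over a 450 s window, roughly 900 per REST API.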

wejdeneHaouari commented 3 years ago
  1. The transaction rate reaches a maximum of 100 transactions per 10 seconds. [Graph: transaction rate and error rate over time.] After 450 seconds the network stops working.

  2. We are using the default PoET network with 5 validators, and we spread the transactions randomly among them. The complete Docker Compose file is below.

  3. Only 33 blocks are created before this error occurs.
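A back-of-envelope check of these numbers against the PoET settings in the compose file, assuming one transaction per batch and treating the average block interval as a free parameter: throughput capacity is `max_batches_per_block / block_interval`, and the pending queue only grows when arrivals exceed it. The 420 limit is the value reported in the validator log later in the thread; the 25 s interval is only an illustrative slower case (it matches the configured `initial_wait_time`, but PoET does not necessarily produce blocks at that interval).

```python
from typing import Optional


def capacity_batches_per_s(max_batches_per_block: int, block_interval_s: float) -> float:
    """Upper bound on batches committed per second if every block is full."""
    return max_batches_per_block / block_interval_s


def seconds_until_limit(arrivals_per_s: float, capacity_per_s: float,
                        queue_limit: int) -> Optional[float]:
    """Time for an empty pending queue to reach queue_limit, or None if it never grows."""
    growth = arrivals_per_s - capacity_per_s
    if growth <= 0:
        return None
    return queue_limit / growth


arrivals = 100 / 10  # 100 transactions per 10 seconds, one transaction per batch (assumed)
print(seconds_until_limit(arrivals, capacity_batches_per_s(100, 5), 420))   # None (blocks every 5 s)
print(seconds_until_limit(arrivals, capacity_batches_per_s(100, 25), 420))  # 70.0 (blocks every 25 s)
print(seconds_until_limit(arrivals, 0.0, 420))                              # 42.0 (no blocks at all)
```

At the reported load the configured settings leave headroom if blocks arrive roughly every 5 s; the queue only fills if block production slows well below that or stops.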

Thank you in advance @peterschwarz @agunde406 @vaporos and @rbuysse


version: "2.1"

volumes:
  poet-shared:

services:
  shell:
    image: hyperledger/sawtooth-shell:chime
    container_name: sawtooth-shell-default
    entrypoint: "bash -c \"\
        sawtooth keygen && \
        tail -f /dev/null \
        \""

  validator-0:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-0
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-0 || true && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-0/ && \
        while [ ! -f /poet-shared/poet-enclave-measurement ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet-enclave-basename ]; do sleep 1; done && \
        while [ ! -f /poet-shared/poet.batch ]; do sleep 1; done && \
        cp /poet-shared/poet.batch / && \
        sawset genesis \
          -k /etc/sawtooth/keys/validator.priv \
          -o config-genesis.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
          sawtooth.consensus.algorithm.name=PoET \
          sawtooth.consensus.algorithm.version=0.1 \
          sawtooth.poet.report_public_key_pem=\
          \\\"$$(cat /poet-shared/simulator_rk_pub.pem)\\\" \
          sawtooth.poet.valid_enclave_measurements=$$(cat /poet-shared/poet-enclave-measurement) \
          sawtooth.poet.valid_enclave_basenames=$$(cat /poet-shared/poet-enclave-basename) \
          -o config.batch && \
        sawset proposal create \
          -k /etc/sawtooth/keys/validator.priv \
             sawtooth.poet.target_wait_time=5 \
             sawtooth.poet.initial_wait_time=25 \
             sawtooth.publisher.max_batches_per_block=100 \
          -o poet-settings.batch && \
        sawadm genesis \
          config-genesis.batch config.batch poet.batch poet-settings.batch && \
        sawtooth-validator -v \
          --bind network:tcp://eth0:8800 \
          --bind component:tcp://eth0:4004 \
          --bind consensus:tcp://eth0:5050 \
          --peering static \
          --endpoint tcp://validator-0:8800 \
          --scheduler parallel \
          --network-auth trust
    \""
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-1:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-1
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-1 || true && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-1/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-1:8800 \
            --peers tcp://validator-0:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-2:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-2
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-2 && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-2/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-2:8800 \
            --peers tcp://validator-0:8800,tcp://validator-1:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-3:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-3
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-3 && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-3/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-3:8800 \
            --peers tcp://validator-0:8800,tcp://validator-1:8800,tcp://validator-2:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  validator-4:
    image: hyperledger/sawtooth-validator:chime
    container_name: sawtooth-validator-default-4
    expose:
      - 4004
      - 5050
      - 8800
    volumes:
      - poet-shared:/poet-shared
    command: |
      bash -c "
        sawadm keygen --force && \
        mkdir -p /poet-shared/validator-4 && \
        cp -a /etc/sawtooth/keys /poet-shared/validator-4/ && \
        sawtooth-validator -v \
            --bind network:tcp://eth0:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --peering static \
            --endpoint tcp://validator-4:8800 \
            --peers tcp://validator-0:8800,tcp://validator-1:8800,tcp://validator-2:8800,tcp://validator-3:8800 \
            --scheduler parallel \
            --network-auth trust
      "
    environment:
      PYTHONPATH: "/project/sawtooth-core/consensus/poet/common:\
        /project/sawtooth-core/consensus/poet/simulator:\
        /project/sawtooth-core/consensus/poet/core"
    stop_signal: SIGKILL

  rest-api-0:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-0
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-0:4004 \
          --bind rest-api-0:8008
      "
    stop_signal: SIGKILL

  rest-api-1:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-1
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-1:4004 \
          --bind rest-api-1:8008
      "
    stop_signal: SIGKILL

  rest-api-2:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-2
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-2:4004 \
          --bind rest-api-2:8008
      "
    stop_signal: SIGKILL

  rest-api-3:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-3
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-3:4004 \
          --bind rest-api-3:8008
      "
    stop_signal: SIGKILL

  rest-api-4:
    image: hyperledger/sawtooth-rest-api:chime
    container_name: sawtooth-rest-api-default-4
    expose:
      - 8008
    command: |
      bash -c "
        sawtooth-rest-api \
          --connect tcp://validator-4:4004 \
          --bind rest-api-4:8008
      "
    stop_signal: SIGKILL

  kvstore-processor-0:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-0
    container_name: kvstore-processor-0
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
       chmod +x /project/sawtooth-kvstore/bin/build_kvstore
       ../bin/build_kvstore
       chmod +x /project/sawtooth-kvstore/bin/kvstore
       ../bin/kvstore tcp://validator-0:4004
       "

  kvstore-processor-1:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-1
    container_name: kvstore-processor-1
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
       chmod +x /project/sawtooth-kvstore/bin/build_kvstore
       ../bin/build_kvstore
       chmod +x /project/sawtooth-kvstore/bin/kvstore
       ../bin/kvstore tcp://validator-1:4004
       "

  kvstore-processor-2:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-2
    container_name: kvstore-processor-2
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
       chmod +x /project/sawtooth-kvstore/bin/build_kvstore
       ../bin/build_kvstore
       chmod +x /project/sawtooth-kvstore/bin/kvstore
       ../bin/kvstore tcp://validator-2:4004
       "

  kvstore-processor-3:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-3
    container_name: kvstore-processor-3
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
       chmod +x /project/sawtooth-kvstore/bin/build_kvstore
       ../bin/build_kvstore
       chmod +x /project/sawtooth-kvstore/bin/kvstore
       ../bin/kvstore tcp://validator-3:4004
       "

  kvstore-processor-4:
    build:
      context: .
      dockerfile: kvstoreprocessor/DockerFile
    depends_on:
      - validator-4
    container_name: kvstore-processor-4
    volumes:
      - ./:/project/sawtooth-kvstore
    command: |
      bash -c "
       chmod +x /project/sawtooth-kvstore/bin/build_kvstore
       ../bin/build_kvstore
       chmod +x /project/sawtooth-kvstore/bin/kvstore
       ../bin/kvstore tcp://validator-4:4004
       "

  settings-tp-0:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-0
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-0:4004
    stop_signal: SIGKILL

  settings-tp-1:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-1
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-1:4004
    stop_signal: SIGKILL

  settings-tp-2:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-2
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-2:4004
    stop_signal: SIGKILL

  settings-tp-3:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-3
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-3:4004
    stop_signal: SIGKILL

  settings-tp-4:
    image: hyperledger/sawtooth-settings-tp:chime
    container_name: sawtooth-settings-tp-default-4
    expose:
      - 4004
    command: settings-tp -v -C tcp://validator-4:4004
    stop_signal: SIGKILL

  poet-engine-0:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-0
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        if [ ! -f /poet-shared/poet-enclave-measurement ]; then \
            poet enclave measurement >> /poet-shared/poet-enclave-measurement; \
        fi && \
        if [ ! -f /poet-shared/poet-enclave-basename ]; then \
            poet enclave basename >> /poet-shared/poet-enclave-basename; \
        fi && \
        if [ ! -f /poet-shared/simulator_rk_pub.pem ]; then \
            cp /etc/sawtooth/simulator_rk_pub.pem /poet-shared; \
        fi && \
        while [ ! -f /poet-shared/validator-0/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-0/keys /etc/sawtooth && \
        poet registration create -k /etc/sawtooth/keys/validator.priv -o /poet-shared/poet.batch && \
        poet-engine -C tcp://validator-0:5050 --component tcp://validator-0:4004 \
    \""

  poet-engine-1:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-1
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-1/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-1/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-1:5050 --component tcp://validator-1:4004 \
    \""

  poet-engine-2:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-2
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-2/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-2/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-2:5050 --component tcp://validator-2:4004 \
    \""

  poet-engine-3:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-3
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-3/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-3/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-3:5050 --component tcp://validator-3:4004 \
    \""

  poet-engine-4:
    image: hyperledger/sawtooth-poet-engine:chime
    container_name: sawtooth-poet-engine-4
    volumes:
      - poet-shared:/poet-shared
    command: "bash -c \"\
        while [ ! -f /poet-shared/validator-4/keys/validator.priv ]; do sleep 1; done && \
        cp -a /poet-shared/validator-4/keys /etc/sawtooth && \
        poet-engine -C tcp://validator-4:5050 --component tcp://validator-4:4004 \
    \""

  poet-validator-registry-tp-0:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-0
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-0:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-1:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-1
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-1:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-2:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-2
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-2:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-3:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-3
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-3:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

  poet-validator-registry-tp-4:
    image: hyperledger/sawtooth-poet-validator-registry-tp:chime
    container_name: sawtooth-poet-validator-registry-tp-4
    expose:
      - 4004
    command: poet-validator-registry-tp -C tcp://validator-4:4004
    environment:
      PYTHONPATH: /project/sawtooth-core/consensus/poet/common
    stop_signal: SIGKILL

    # --------------- block server subscriber & transaction server ----------------#
  intkey-rest-api:
    build:
      context: .
      dockerfile: rest_api/Dockerfile
    image: intkey-rest-api
    container_name: intkey-rest-api
    volumes:
      - ./:/project/sawtooth_sdk_python
    ports:
      - '3000:8000'
    command: |
      bash -c "
      chmod +x /project/sawtooth_sdk_python/bin/rest-api
      rest-api \
          -b intkey-rest-api:8000 \
          --keyfile /root/.sawtooth/keys/root.priv \
          --url rest-api-0:8008
      "
  block-server-subscriber:
    build:
      context: .
      dockerfile: block_server_subscriber/Dockerfile
    image: block-server-subscriber
    container_name: block-server-subscriber
    volumes:
      - ./:/project/sawtooth_sdk_python
    ports:
      - '9002:9002'
    depends_on:
      - validator-0
    command: |
      sh -c "
      chmod +x /project/sawtooth_sdk_python/bin/block-server-subscriber
      block-server-subscriber \
          -C tcp://validator-0:4004 \
          --url rest-api-0:8008 \
          --uri  mongodb://root:password@bb:27017/ \
          -vv
      "
  block-server-rest-api:
    build:
      context: .
      dockerfile: block_server_api/Dockerfile
    image: block-server-rest-api
    container_name: block-server-rest-api
    volumes:
      - ./:/project/sawtooth_sdk_python
    ports:
      - '9001:9001'
    command: |
      sh -c "
      chmod +x /project/sawtooth_sdk_python/bin/block-server-api
      block-server-api \
           -b block-server-rest-api:9001 \
           --uri  mongodb://root:password@bb:27017/ \
           -vv
      "
    # -------------database for off chain data --------
  bb:
    image: mongo:3-xenial
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: password
    restart: always
    ports:
      - 27018:27017

  bbmanager:
    image: mongo-express
    links:
      - bb
    ports:
      - 27019:8081
    restart: always
    environment:
      ME_CONFIG_MONGODB_SERVER: bb
      ME_CONFIG_MONGODB_ADMINUSERNAME: root
      ME_CONFIG_MONGODB_ADMINPASSWORD: password
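The compose file wires a static mesh: validator-0 runs genesis and has no `--peers`, and each validator-i lists every lower-numbered validator. A small sketch with hypothetical helpers that regenerate those arguments, mirroring the flags used above, makes it easier to check the wiring or scale the network without copy-paste drift.

```python
from typing import List


def static_peers(index: int, port: int = 8800) -> str:
    """--peers value for validator-<index>: all lower-numbered validators, comma-separated."""
    return ",".join(f"tcp://validator-{i}:{port}" for i in range(index))


def validator_args(index: int) -> List[str]:
    """sawtooth-validator arguments mirroring the compose services above (hypothetical helper)."""
    args = [
        "-v",
        "--bind", "network:tcp://eth0:8800",
        "--bind", "component:tcp://eth0:4004",
        "--bind", "consensus:tcp://eth0:5050",
        "--peering", "static",
        "--endpoint", f"tcp://validator-{index}:8800",
    ]
    if index > 0:  # validator-0 creates the genesis block and lists no peers
        args += ["--peers", static_peers(index)]
    args += ["--scheduler", "parallel", "--network-auth", "trust"]
    return args


for i in range(5):
    print("sawtooth-validator " + " ".join(validator_args(i)))
```

With five validators this yields 4 + 3 + 2 + 1 = 10 static connections, i.e. n(n-1)/2.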
rowaisi commented 3 years ago

Dear @arsulegai,

Could you please look into the bug described above? We have no problem with either Raft or PBFT in Hyperledger Sawtooth, but with PoET and 5 nodes, 5-10 minutes after we start sending transactions the system starts crashing with 429 Too Many Requests errors. Do you have any suggestions? Since you have worked on this repository, you may have some ideas on how to cope with such a problem.

Thanks for your kind consideration.

arsulegai commented 3 years ago

@rowaisi it was long ago, but sure. I need logs from the PoET engine and the Validator to analyze in detail.

The following is one possibility; only the logs can confirm it.

The sawtooth-poet-engine:chime tag on Docker Hub appears to be from two years ago; I am not sure it has all the fixes. I have a debugging tool that checks whether the state machine has been corrupted. Please run https://github.com/arsulegai/state-checker on the PoET engine logs, using the configuration in https://github.com/arsulegai/state-checker#use-case-1-simple-state-transition-with-pattern-matching, and see whether you observe any corruption. The tool was originally written to analyze state corruption in PoET logs.

I observed behavior similar to what you are facing a long time ago, and https://github.com/hyperledger/sawtooth-poet/commit/754766d35fe24837d68afe9e848568b26fb1065b was the fix for that issue.

wejdeneHaouari commented 3 years ago

Hello @arsulegai @peterschwarz @agunde406 @vaporos and @rbuysse,

These are the logs of the sawtooth-validator-default-0 container when I got the error; the other validators show the same thing:


[2021-07-21 16:58:46.699 INFO     (unknown file)] [src/journal/publisher.rs: 172] Now building on top of block, Block(id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0, block_num: 48, state_root_hash: 86159582893d41fa3d7896bd07372bfd8e8dfe77f51f858656166b1c17c9ebbc, previous_block_id: 83419debbcc8ffba1459127ef0552c32fe63f3e4d16bcd95f6393d02b39ea9c10fe417e9938981860529e0c6fd9cce708de22a9ed0343847d6e75e12b1bd0495)
[2021-07-21 16:58:56.666 INFO     (unknown file)] [src/journal/block_validator.rs: 265] Block 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634 passed validation
[2021-07-21 16:58:56.804 INFO     (unknown file)] [src/journal/chain.rs: 206] Building fork resolution for chain head 'Block(id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0, block_num: 48, state_root_hash: 86159582893d41fa3d7896bd07372bfd8e8dfe77f51f858656166b1c17c9ebbc, previous_block_id: 83419debbcc8ffba1459127ef0552c32fe63f3e4d16bcd95f6393d02b39ea9c10fe417e9938981860529e0c6fd9cce708de22a9ed0343847d6e75e12b1bd0495)' against new block 'Block(id: 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634, block_num: 49, state_root_hash: 22183ae451264b6c6d6a0897f9ecba4966ad2a9859941406a190ffd909353a87, previous_block_id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0)'
[2021-07-21 16:58:56.810 INFO     (unknown file)] [src/journal/chain.rs: 791] Chain head updated to Block(id: 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634, block_num: 49, state_root_hash: 22183ae451264b6c6d6a0897f9ecba4966ad2a9859941406a190ffd909353a87, previous_block_id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0)
[2021-07-21 16:58:56.820 INFO     (unknown file)] [src/journal/publisher.rs: 172] Now building on top of block, Block(id: 30de5cad39a23d0701319c0e42deeb3241f4246115c2d688a894e7911aff46052a5dba1695f07ca9ace43805b5117cbf661025c575541e80074c7cdb2e1bb634, block_num: 49, state_root_hash: 22183ae451264b6c6d6a0897f9ecba4966ad2a9859941406a190ffd909353a87, previous_block_id: c41a8111a9887a21f0008dc6ced51ce2cf710d3cc65a8defe7ce7bdbb5941afc65d3b3e40b94323d1ad7cd14497116be500cf2e6165c17a9c492d7440ab7c1f0)
[2021-07-21 16:59:25.719 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 16:59:35.720 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 16:59:45.722 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 16:59:55.723 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 17:00:05.725 WARNING  notifier] Consensus notification CONSENSUS_NOTIFY_BLOCK_NEW timed out
[2021-07-21 17:00:10.431 INFO     back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 421, limit: 420
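The pattern in this log (a last chain-head update, then repeated `CONSENSUS_NOTIFY_BLOCK_NEW` timeouts, then back pressure) can be pulled out of each validator's log with a small script. A minimal sketch that matches only the three line formats shown above and ignores everything else; it is not a substitute for the state-checker tool, which analyzes the PoET engine logs. Repeated notifier timeouts with no further chain-head updates suggest, but do not prove, that the consensus engine stopped responding.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Patterns for the three line shapes in the validator log above; other formats are ignored.
TS = r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
HEAD_RE = re.compile(TS + r" INFO .*Chain head updated to Block\(id: \w+, block_num: (?P<num>\d+)")
TIMEOUT_RE = re.compile(TS + r" WARNING\s+notifier\] Consensus notification (?P<kind>\w+) timed out")
BACK_PRESSURE_RE = re.compile(
    TS + r" INFO\s+back_pressure_handlers\] .*current depth: (?P<depth>\d+), limit: (?P<limit>\d+)")


def _ts(match: "re.Match[str]") -> datetime:
    return datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")


@dataclass
class LogSummary:
    last_block_num: Optional[int] = None   # block of the most recent chain-head update
    last_head_at: Optional[datetime] = None
    timeouts_after_head: int = 0           # notifier timeouts since that update
    back_pressure: List[Tuple[int, int]] = field(default_factory=list)  # (depth, limit)
    first_back_pressure_at: Optional[datetime] = None

    def seconds_head_to_back_pressure(self) -> Optional[float]:
        """Gap between the last chain-head update and the first back-pressure line."""
        if self.last_head_at is None or self.first_back_pressure_at is None:
            return None
        return (self.first_back_pressure_at - self.last_head_at).total_seconds()


def summarize(lines: Iterable[str]) -> LogSummary:
    summary = LogSummary()
    for line in lines:
        m = HEAD_RE.match(line)
        if m:
            summary.last_block_num = int(m.group("num"))
            summary.last_head_at = _ts(m)
            summary.timeouts_after_head = 0  # the chain advanced again, so reset the count
            continue
        m = TIMEOUT_RE.match(line)
        if m:
            summary.timeouts_after_head += 1
            continue
        m = BACK_PRESSURE_RE.match(line)
        if m:
            summary.back_pressure.append((int(m.group("depth")), int(m.group("limit"))))
            if summary.first_back_pressure_at is None:
                summary.first_back_pressure_at = _ts(m)
    return summary
```

For the excerpt above this reports block 49 as the last chain head, five timeouts after it, and about 73.6 s between that update and the first back-pressure line. To run it per validator, capture a log first, e.g. `docker logs sawtooth-validator-default-0 > validator-0.log 2>&1`, then call `summarize(open("validator-0.log"))`.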
rowaisi commented 3 years ago

Dear Members: @arsulegai @vaporos @peterschwarz @buysse and @isabeltomb,

We are very interested in using Hyperledger Sawtooth for a Canadian supply chain project, and the issue with PoET is blocking us. I would be glad if someone could run the PoET image and explore the issue. The back pressure causes PoET to crash, and it stops working within 5-10 minutes once more users start using it. This issue does not happen with either Raft or PBFT in Sawtooth.

We would very much appreciate it if somebody could take a look at the logs and images.

Your consideration is highly appreciated.

agunde406 commented 3 years ago

Hello, I was not able to replicate this issue locally.

The commit that @arsulegai mentioned above has not been released but should be available in the nightly build.

I would suggest the following:

  1. Try with the PoET nightly image. This can be done just by swapping in hyperledger/sawtooth-poet-engine:nightly and hyperledger/sawtooth-poet-validator-registry-tp:nightly for the chime equivalent images. If that seems to fix your issues we can work towards getting a new release out.
  2. If not, you may need to double check that you're not finding a hash-mismatch somewhere. This could be caused by some kind of non-determinism issue in your transaction processor.
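As an illustration of point 2: every validator re-executes each transaction and must arrive at the same state root, so a transaction processor that writes different bytes for the same logical state on different nodes causes hash mismatches. Wall-clock time, randomness, and unordered iteration are the usual suspects. A hypothetical sketch, since the kvstore processor's encoding is not shown in this thread: JSON without `sort_keys` follows dict insertion order, while sorted keys with fixed separators give identical bytes.

```python
import hashlib
import json
from typing import Dict


def encode_unstable(entries: Dict[str, int]) -> bytes:
    # Byte output follows dict insertion order. If two validators build the dict
    # in different orders (for example by iterating a set of strings, whose order
    # can vary between processes under hash randomization), the bytes differ.
    return json.dumps(entries).encode()


def encode_stable(entries: Dict[str, int]) -> bytes:
    # Sorted keys and fixed separators: the same logical state always
    # serializes to the same bytes, on every validator.
    return json.dumps(entries, sort_keys=True, separators=(",", ":")).encode()


def digest(data: bytes) -> str:
    # Stand-in for "what ends up hashed into state"; SHA-512 is used here only for illustration.
    return hashlib.sha512(data).hexdigest()


a = {"alice": 1, "bob": 2}
b = {"bob": 2, "alice": 1}  # same logical state, different insertion order
print(digest(encode_unstable(a)) == digest(encode_unstable(b)))  # False
print(digest(encode_stable(a)) == digest(encode_stable(b)))      # True
```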

Any other errors that show up would also help in figuring out what may be wrong, especially from the poet-engine. For example, when you say "crash", is there an error message?

rowaisi commented 3 years ago

Dear @vaporos,

Could you please help us with this issue? Please look at the images we use to run our Sawtooth network with PoET: after 5-10 minutes of running the network and sending transactions, the system is saturated with "Too Many Requests" errors, and it does not recover even when we reduce the transaction rate. This issue does not occur with either PBFT or Raft.

I hope you can guide us and suggest a way to get past this issue.