saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.1k stars 5.47k forks source link

Fileclient - Data transport is broken #48265

Closed chewborg closed 4 years ago

chewborg commented 6 years ago

Description of Issue/Question

Using the salt file.managed state to copy a particular file that is about 400MB, Tab delimited, for a SOLR DB import, the copy is failing and only about 40MB is copied over. This has only occurred since upgrading from 2018.3.0 to 2018.3.1 on the both master and minion.

In the logs we can see this event on the file:

2018-06-22 10:12:09,956 [salt.state       :1789][INFO    ][2649] Running state [/var/solr/data/input/DATA_for_SOLR_headless.csv] at time 10:12:09.956380
2018-06-22 10:12:09,956 [salt.state       :1822][INFO    ][2649] Executing state file.managed for [/var/solr/data/input/DATA_for_SOLR_headless.csv]
2018-06-22 10:12:23,800 [salt.fileclient  :1242][WARNING ][2649] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 1 of 3
2018-06-22 10:12:23,993 [salt.fileclient  :1242][WARNING ][2649] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 2 of 3
2018-06-22 10:12:24,184 [salt.fileclient  :1242][WARNING ][2649] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 3 of 3
2018-06-22 10:12:24,376 [salt.fileclient  :1242][WARNING ][2649] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 4 of 3
2018-06-22 10:12:24,379 [salt.fileclient  :1249][ERROR   ][2649] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, retry attempts exhausted
2018-06-22 10:12:24,413 [salt.fileclient  :1257][INFO    ][2649] Fetching file from saltenv 'base', ** done ** 'bss/common/dev_solr_data/DATA_for_SOLR_headless.csv'
2018-06-22 10:12:24,703 [salt.loaded.int.module.cmdmod:392 ][INFO    ][2649] Executing command ['lsattr', '/var/solr/data/input/DATA_for_SOLR_headless.csv'] in directory '/root'
2018-06-22 10:12:24,709 [salt.state       :308 ][INFO    ][2649] File /var/solr/data/input/DATA_for_SOLR_headless.csv is in the correct state

Setup

(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).) sls:

copy_solr_data:
  file.managed:
    - name: /var/solr/data/input/DATA_for_SOLR_headless.csv
    - source: salt://bss/common/dev_solr_data/DATA_for_SOLR_headless.csv
    - user: solr
    - group: solr
    - show_changes: false
    - mode: 660
    - replace: True'

master config:

'state_verbose: False
state_output: mixed
timeout: 120
gather_job_timeout: 120
ping_on_rotate: True
worker_threads: 20

fileserver_backend:
  - git
  - roots

file_roots:
  base:
    - /srv/salt/states
    - /srv/salt/files

  dev:
    - /srv/salt/states/dev
    - /srv/salt/files

  preprod:
    - /srv/salt/states/preprod
    - /srv/salt/files

  prod:
    - /srv/salt/states/prod
    - /srv/salt/files

  management:
    - /srv/salt/states/management
    - /srv/salt/files

top_file_merging_strategy: same

pillar_roots:
  base:
    - /srv/salt/pillar

log_level: info
log_level_logfile: info

Steps to Reproduce Issue

(Include debug logs if possible and relevant.) run the salt state on the cli 'sudo salt 'svr-bss1-dev*' state.apply common.dev_solr_data'

With logging to file set to Debug. Master:

2018-06-22 11:34:42,434 [salt.master      :1783][ERROR   ][16232] Error in function _serve_file:
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/salt/master.py", line 1776, in run_func
    ret = getattr(self, func)(load)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/__init__.py", line 613, in serve_file
    return self.servers[fstr](load, fnd)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/roots.py", line 133, in serve_file
    data = data.decode(__salt_system_encoding__)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 1048575: unexpected end of data
2018-06-22 11:34:42,622 [salt.utils.lazy  :100 ][DEBUG   ][16222] LazyLoaded gitfs.envs
2018-06-22 11:34:42,624 [salt.master      :1783][ERROR   ][16222] Error in function _serve_file:
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/salt/master.py", line 1776, in run_func
    ret = getattr(self, func)(load)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/__init__.py", line 613, in serve_file
    return self.servers[fstr](load, fnd)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/roots.py", line 133, in serve_file
    data = data.decode(__salt_system_encoding__)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 1048575: unexpected end of data
2018-06-22 11:34:42,811 [salt.utils.lazy  :100 ][DEBUG   ][16193] LazyLoaded gitfs.envs
2018-06-22 11:34:42,813 [salt.master      :1783][ERROR   ][16193] Error in function _serve_file:
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/salt/master.py", line 1776, in run_func
    ret = getattr(self, func)(load)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/__init__.py", line 613, in serve_file
    return self.servers[fstr](load, fnd)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/roots.py", line 133, in serve_file
    data = data.decode(__salt_system_encoding__)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 1048575: unexpected end of data
2018-06-22 11:34:43,000 [salt.utils.lazy  :100 ][DEBUG   ][16229] LazyLoaded gitfs.envs
2018-06-22 11:34:43,002 [salt.master      :1783][ERROR   ][16229] Error in function _serve_file:
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/salt/master.py", line 1776, in run_func
    ret = getattr(self, func)(load)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/__init__.py", line 613, in serve_file
    return self.servers[fstr](load, fnd)
  File "/usr/lib/python3.4/site-packages/salt/fileserver/roots.py", line 133, in serve_file
    data = data.decode(__salt_system_encoding__)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 1048575: unexpected end of data

From Minion:

2018-06-22 10:34:27,735 [salt.transport.zeromq:128 ][DEBUG   ][3756] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506', 'aes')
2018-06-22 10:34:27,736 [salt.crypt       :463 ][DEBUG   ][3756] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506')
2018-06-22 10:34:27,737 [salt.transport.zeromq:194 ][DEBUG   ][3756] Connecting the Minion to the Master URI (for the return server): tcp://10.124.1.30:4506
2018-06-22 10:34:27,737 [salt.transport.zeromq:1029][DEBUG   ][3756] Trying to connect to: tcp://10.124.1.30:4506
2018-06-22 10:34:28,036 [salt.fileclient  :1124][DEBUG   ][3756] In saltenv 'base', looking at rel_path 'bss/common/dev_solr_data/DATA_for_SOLR_headless.csv' to resolve 'salt://bss/common/dev_solr_data/DATA_for_SOLR_headless.csv'
2018-06-22 10:34:28,036 [salt.fileclient  :1132][DEBUG   ][3756] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/bss/common/dev_solr_data/DATA_for_SOLR_headless.csv' to resolve 'salt://bss/common/dev_solr_data/DATA_for_SOLR_headless.csv'
2018-06-22 10:34:28,174 [salt.fileclient  :1152][DEBUG   ][3756] Fetching file from saltenv 'base', ** attempting ** 'salt://bss/common/dev_solr_data/DATA_for_SOLR_headless.csv'
2018-06-22 10:34:28,174 [salt.fileclient  :1176][DEBUG   ][3756] No dest file found
2018-06-22 10:34:42,460 [salt.fileclient  :1242][WARNING ][3756] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 1 of 3
2018-06-22 10:34:42,461 [salt.transport.zeromq:128 ][DEBUG   ][3756] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506', 'aes')
2018-06-22 10:34:42,461 [salt.crypt       :463 ][DEBUG   ][3756] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506')
2018-06-22 10:34:42,463 [salt.transport.zeromq:194 ][DEBUG   ][3756] Connecting the Minion to the Master URI (for the return server): tcp://10.124.1.30:4506
2018-06-22 10:34:42,463 [salt.transport.zeromq:1029][DEBUG   ][3756] Trying to connect to: tcp://10.124.1.30:4506
2018-06-22 10:34:42,651 [salt.fileclient  :1242][WARNING ][3756] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 2 of 3
2018-06-22 10:34:42,651 [salt.transport.zeromq:128 ][DEBUG   ][3756] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506', 'aes')
2018-06-22 10:34:42,652 [salt.crypt       :463 ][DEBUG   ][3756] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506')
2018-06-22 10:34:42,653 [salt.transport.zeromq:194 ][DEBUG   ][3756] Connecting the Minion to the Master URI (for the return server): tcp://10.124.1.30:4506
2018-06-22 10:34:42,653 [salt.transport.zeromq:1029][DEBUG   ][3756] Trying to connect to: tcp://10.124.1.30:4506
2018-06-22 10:34:42,840 [salt.fileclient  :1242][WARNING ][3756] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 3 of 3
2018-06-22 10:34:42,840 [salt.transport.zeromq:128 ][DEBUG   ][3756] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506', 'aes')
2018-06-22 10:34:42,840 [salt.crypt       :463 ][DEBUG   ][3756] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506')
2018-06-22 10:34:42,842 [salt.transport.zeromq:194 ][DEBUG   ][3756] Connecting the Minion to the Master URI (for the return server): tcp://10.124.1.30:4506
2018-06-22 10:34:42,842 [salt.transport.zeromq:1029][DEBUG   ][3756] Trying to connect to: tcp://10.124.1.30:4506
2018-06-22 10:34:43,029 [salt.fileclient  :1242][WARNING ][3756] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, attempt 4 of 3
2018-06-22 10:34:43,029 [salt.transport.zeromq:128 ][DEBUG   ][3756] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506', 'aes')
2018-06-22 10:34:43,030 [salt.crypt       :463 ][DEBUG   ][3756] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'svr-bss1-dev.domain.ie', 'tcp://10.124.1.30:4506')
2018-06-22 10:34:43,031 [salt.transport.zeromq:194 ][DEBUG   ][3756] Connecting the Minion to the Master URI (for the return server): tcp://10.124.1.30:4506
2018-06-22 10:34:43,031 [salt.transport.zeromq:1029][DEBUG   ][3756] Trying to connect to: tcp://10.124.1.30:4506
2018-06-22 10:34:43,032 [salt.fileclient  :1249][ERROR   ][3756] Data transport is broken, got: , type: bytes, exception: byte indices must be integers, not str, retry attempts exhausted
2018-06-22 10:34:43,067 [salt.fileclient  :1257][INFO    ][3756] Fetching file from saltenv 'base', ** done ** 'bss/common/dev_solr_data/DATA_for_SOLR_headless.csv'
2018-06-22 10:34:43,351 [salt.utils.lazy  :100 ][DEBUG   ][3756] LazyLoaded cmd.run
2018-06-22 10:34:43,351 [salt.loaded.int.module.cmdmod:392 ][INFO    ][3756] Executing command ['lsattr', '/var/solr/data/input/DATA_for_SOLR_headless.csv'] in directory '/root'
2018-06-22 10:34:43,356 [salt.loaded.int.module.cmdmod:755 ][DEBUG   ][3756] stdout: ---------------- /var/solr/data/input/DATA_for_SOLR_headless.csv
2018-06-22 10:34:43,356 [salt.loaded.int.module.cmdmod:1080][DEBUG   ][3756] output: ---------------- /var/solr/data/input/DATA_for_SOLR_headless.csv
2018-06-22 10:34:43,358 [salt.state       :308 ][INFO    ][3756] File /var/solr/data/input/DATA_for_SOLR_headless.csv is in the correct state
2018-06-22 10:34:43,358 [salt.state       :1968][INFO    ][3756] Completed state [/var/solr/data/input/DATA_for_SOLR_headless.csv] at time 10:34:43.358758 (duration_in_ms=15634.964)

Versions Report

(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.) Master

Salt Version:
           Salt: 2018.3.1

Dependency Versions:
           cffi: 1.11.2
       cherrypy: unknown
       dateutil: 2.7.2
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: 2.18
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.4.8 (default, Mar 23 2018, 10:04:27)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.2
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.5.1804 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.2.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.5.1804 Core

Minion:

Salt Version:
           Salt: 2018.3.1

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.4.8 (default, Mar 23 2018, 10:04:27)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.2
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.5.1804 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-514.6.1.el7.x86_64
         system: Linux
        version: CentOS Linux 7.5.1804 Core
Ch3LL commented 6 years ago

im having a hard time replicating this. any chance you can share a sanitized file where you are able to replicate this issue?

chewborg commented 6 years ago

Hi. Didn't mean to take so long replying. The data is licensed so I really did needed to sanitise it. It contains Unicode characters (as Gaeilge or Irish). I tried to replace all Alpha characters with 'X' but then the file would transfer ok, so it's seemed likely it was somehow related to the unicode characters (fada over vowels). I've now an anonymised data file with just ascii alpha characters rendered as 'X' and it failed to transfer.

How can I get this to you? I have the sanitised file on salt fileserver compressed to 48MB, and the truncated transfer on the minion to 2.5MB.

Ch3LL commented 6 years ago

can you email it to me or share it with me through my email through google drive or some other storage mechanism if it is bigger then the email size limit? mwilhite@saltstack.com

chewborg commented 6 years ago

Thanks Megan, I had a Dropbox I forgot about, shared now with you email address. If that is no good I can set up Google Drive?

Ch3LL commented 6 years ago

thanks for sharing this with me. Unfortunately i'm still having a hard time replicating this.

ping @saltstack/team-core any ideas here?

DmitryKuzmenko commented 6 years ago

I've seen something close to this during fixing py3 integration tests. I don't remember the exact procedure but it was happening in test logger that works as a subprocess through tornado. I bet it's the result of the way we mess forking and tornado event loops that is the issue I'm working on at this moment.

Ch3LL commented 6 years ago

Is there another issue we can link to this one @DmitryKuzmenko ?

rallytime commented 6 years ago

https://github.com/saltstack/salt-jenkins/issues/995

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

brandon-sling commented 4 years ago

I had this same thing happen with a file. I figured out what was wrong with the file. # This should be at least (number_of_channels × 2) to be able to keep the live there is the character '×' Upon removal of this character (replaced with an 'x') transport errors went away.

minion:
    Salt Version:
               Salt: 2019.2.3

    Dependency Versions:
               cffi: Not Installed
           cherrypy: Not Installed
           dateutil: 2.4.2
          docker-py: Not Installed
              gitdb: Not Installed
          gitpython: Not Installed
              ioflo: Not Installed
             Jinja2: 2.8
            libgit2: Not Installed
            libnacl: Not Installed
           M2Crypto: Not Installed
               Mako: Not Installed
       msgpack-pure: Not Installed
     msgpack-python: 0.4.6
       mysql-python: Not Installed
          pycparser: Not Installed
           pycrypto: 2.6.1
       pycryptodome: Not Installed
             pygit2: Not Installed
             Python: 3.5.2 (default, Jul  5 2016, 12:43:10)
       python-gnupg: 0.3.8
             PyYAML: 3.11
              PyZMQ: 15.2.0
               RAET: Not Installed
              smmap: Not Installed
            timelib: Not Installed
            Tornado: 4.2.1
                ZMQ: 4.1.4

    System Versions:
               dist: Ubuntu 16.04 xenial
             locale: ANSI_X3.4-1968
            machine: x86_64
            release: 4.4.0-34-generic
             system: Linux
            version: Ubuntu 16.04 xenial
master:
    Salt Version:
               Salt: 2019.2.3

    Dependency Versions:
               cffi: 1.13.2
           cherrypy: 8.7.0
           dateutil: 2.4.2
          docker-py: Not Installed
              gitdb: 0.6.4
          gitpython: 1.0.1
              ioflo: Not Installed
             Jinja2: 2.8
            libgit2: Not Installed
            libnacl: Not Installed
           M2Crypto: Not Installed
               Mako: Not Installed
       msgpack-pure: Not Installed
     msgpack-python: 0.4.6
       mysql-python: 1.3.7
          pycparser: 2.19
           pycrypto: 2.6.1
       pycryptodome: Not Installed
             pygit2: Not Installed
             Python: 3.5.2 (default, Oct  8 2019, 13:06:37)
       python-gnupg: 0.3.8
             PyYAML: 3.11
              PyZMQ: 15.2.0
               RAET: Not Installed
              smmap: 0.9.0
            timelib: Not Installed
            Tornado: 4.2.1
                ZMQ: 4.1.4

    System Versions:
               dist: Ubuntu 16.04 xenial
             locale: ANSI_X3.4-1968
            machine: x86_64
            release: 4.4.0-34-generic
             system: Linux
            version: Ubuntu 16.04 xenial