riptano / ccm

A script to easily create and destroy an Apache Cassandra cluster on localhost
Apache License 2.0

Running multiple ccm node commands at once causes commands to fail. #578

Closed hkroger closed 7 years ago

hkroger commented 7 years ago

If you run several ccm node commands in parallel, like ccm node1 cqlsh -e "desc tables;" & ccm node1 cqlsh -e "desc tables;" & ccm node1 cqlsh -e "desc tables;" & ccm node1 cqlsh -e "desc tables;" & ccm node1 cqlsh -e "desc tables;" & ccm node1 cqlsh -e "desc tables;" & ccm node1 cqlsh -e "desc tables;" &

You end up getting:

  File "/usr/local/Cellar/ccm/2.1.6/libexec/bin/ccm", line 74, in <module>
    cmd.run()
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/cmds/node_cmds.py", line 394, in run
    self.node.run_cqlsh(self.options.cmds, self.options.verbose, self.cqlsh_options)
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/node.py", line 786, in run_cqlsh
    env = self.get_env()
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/node.py", line 176, in get_env
    return common.make_cassandra_env(self.get_install_dir(), self.get_path(), update_conf)
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/common.py", line 195, in make_cassandra_env
    replaces_in_file(dst, replacements)
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/common.py", line 139, in replaces_in_file
Traceback (most recent call last):
  File "/usr/local/Cellar/ccm/2.1.6/libexec/bin/ccm", line 74, in <module>
    cmd.run()
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/cmds/node_cmds.py", line 394, in run
    shutil.move(file_tmp, file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    self.node.run_cqlsh(self.options.cmds, self.options.verbose, self.cqlsh_options)
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/node.py", line 786, in run_cqlsh
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 130, in copy2
    env = self.get_env()
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/node.py", line 176, in get_env
    copyfile(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 82, in copyfile
    return common.make_cassandra_env(self.get_install_dir(), self.get_path(), update_conf)
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/common.py", line 195, in make_cassandra_env
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/Users/hkroger/.ccm/test/node1/bin/cassandra.in.sh.tmp'
    replaces_in_file(dst, replacements)
  File "/usr/local/Cellar/ccm/2.1.6/libexec/lib/python2.7/site-packages/ccmlib/common.py", line 139, in replaces_in_file
    shutil.move(file_tmp, file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/Users/hkroger/.ccm/test/node1/bin/cassandra.in.sh.tmp'

[8]   Stopped                 ccm node1 cqlsh -e "desc tables;"
[9]   Exit 1                  ccm node1 cqlsh -e "desc tables;"

[10]+  Stopped                 ccm node1 cqlsh -e "desc tables;"

[11]   Stopped                 ccm node1 cqlsh -e "desc tables;"

[12]-  Stopped                 ccm node1 cqlsh -e "desc tables;"
[13]   Exit 1                  ccm node1 cqlsh -e "desc tables;"

[14]   Stopped                 ccm node1 cqlsh -e "desc tables;"
hkroger commented 7 years ago

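Apparently this is a race condition on a temporary file: every process writes the same cassandra.in.sh.tmp and then moves it into place, so a process can find the file already moved away by another one.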

ptnapoleon commented 7 years ago

I don't really have good news for you here. This definitely happens, but it is also expected to happen. Concurrent operations against a cluster were not considered in the original design and, short of rewriting the majority of the internals, they are not possible.

While it would be nice if operations that make sense to run concurrently, like cqlsh, could be run that way, that's still too much engineering effort.

hkroger commented 7 years ago

I don't know if it affects other parts as well, but I checked the code: in common.py, the methods replaces_in_file and replaces_or_add_into_file_tail both use a hardwired tmp file name. If they used e.g. file + "." + str(os.getpid()) + ".tmp" instead of file + ".tmp", that might help with this particular case.

But as I said, I don't know if other parts of the system are affected as well.
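To make the idea concrete, here is a minimal sketch of a replace-in-file helper that uses a per-process tmp file name. It is not the actual ccm code: the names tmp_name_for and replace_lines_in_file are hypothetical, and the body is only an assumption about what such a helper roughly does (rewrite matching lines into a tmp file, then move it over the original).

```python
import os
import re
import shutil


def tmp_name_for(path):
    # Hypothetical helper: per-process tmp name, file + "." + pid + ".tmp",
    # so two ccm processes never write to or move the same tmp file.
    return path + "." + str(os.getpid()) + ".tmp"


def replace_lines_in_file(path, replacements):
    """Rewrite `path`, replacing every line that matches a regex.

    `replacements` is a list of (pattern, new_line_text) pairs.
    This is an illustrative sketch, not ccm's implementation.
    """
    compiled = [(re.compile(pattern), text) for pattern, text in replacements]
    tmp_path = tmp_name_for(path)
    with open(path, "r") as src, open(tmp_path, "w") as dst:
        for line in src:
            for regex, text in compiled:
                if regex.search(line):
                    line = text + "\n"
            dst.write(line)
    # Each process moves only its own tmp file, so the move cannot fail
    # because another process already moved a shared ".tmp" file away.
    shutil.move(tmp_path, path)
```

With this naming, parallel processes can still overwrite each other's final file, but each one only ever touches its own tmp file, which is what the IOError above is about.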

hkroger commented 7 years ago

At least this particular problem seems to get fixed with this?

ptnapoleon commented 7 years ago

I'd accept that PR then

hkroger commented 7 years ago

What do you think: https://github.com/pcmanus/ccm/pull/581

hkroger commented 7 years ago

I admit it might be a somewhat naive approach to this problem, but multiple parallel instances would end up creating the same conf anyway, right?

hkroger commented 7 years ago

I will close this now since the immediate problem is solved with the above PR.