Closed noelmcnulty closed 9 years ago
Thanks for the report @noelmcnulty !
Could you email me your full ami.log from one of these failed nodes as well as the tree
output of ~/datastax_ami?
On launch we do a hard reset:
https://github.com/riptano/ComboAMI/blob/2.5/ds0_updater.py#L16-17
to ensure the AMI has the appropriate repo keys:
https://github.com/riptano/ComboAMI/tree/2.5/repo_keys
So let's first try to see if the AMIs are misconfigured or competing with AWS cleaning scripts and then we'll check if something's different with the server shortly after midnight (GMT).
Thanks again!
Hi Joaquin,
Noel has finished for the day, but I can send on the logs; I'll email them to you now. Unfortunately the instance has been torn down, so I cannot get you the tree output, but we'll monitor over the weekend and try to get the info for you should it occur again.
Thanks,
Lyndsey
Okay that works. Just send over the tree output the next time you spot this issue please.
I'll look over the logs today.
Thanks again!
The logs you sent look clean and seem to have imported the key correctly. I've created a ticket in our private repo to investigate this issue. Do let us know how often and at what times this occurs, if possible.
Thanks again!
Hi Joaquin,
More random failures over the weekend I'm afraid but this time it seems related to the devices.
[INFO] address.yaml configured.
[EXEC] 02/16/15-01:13:31 sudo chmod 777 /etc/fstab
[EXEC] 02/16/15-01:13:31 sudo chmod 644 /etc/fstab
[INFO] Unformatted devices: []
[INFO] Clear "invalid flag 0x0000 of partition table 4" by issuing a write, then running fdisk on the device...
[ERROR] Exception seen in ds1_launcher.py:
Traceback (most recent call last):
File "/home/ubuntu/datastax_ami/ds1_launcher.py", line 22, in initial_configurations
ds2_configure.run()
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 1153, in run
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 1010, in prepare_for_raid
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 956, in format_xfs
IndexError: list index out of range
This has occurred several times over the weekend but is intermittent. Any help would be much appreciated!
This issue does seem unrelated. You may want to ensure that the devices were added during the AMI's launch. If you see this issue again, try searching the system for these extra devices. If they end up appearing when you look for them, we may be hitting a race condition with EC2 adding the devices in a delayed fashion.
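To check the delayed-device theory, something like the following could be run before any formatting step. It is only a minimal sketch: the function name `wait_for_devices`, the timeout values, and the example device path `/dev/xvdb` are assumptions for illustration and are not taken from ds2_configure.py.

```python
import os
import time


def wait_for_devices(paths, timeout=60.0, interval=2.0,
                     exists=os.path.exists, sleep=time.sleep):
    """Poll until every path in `paths` exists or `timeout` seconds pass.

    Returns the list of paths that are present when polling stops, so the
    caller can tell a full set from a partial or empty one.
    `exists` and `sleep` are injectable so the loop can be tested
    without real devices or real waiting.
    """
    waited = 0.0
    while True:
        present = [p for p in paths if exists(p)]
        if len(present) == len(paths) or waited >= timeout:
            return present
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Hypothetical device name; substitute whatever the instance type exposes.
    expected = ["/dev/xvdb"]
    found = wait_for_devices(expected, timeout=10.0, interval=1.0)
    print("present: %s, missing: %s" % (found, sorted(set(expected) - set(found))))
```

If the devices show up only after several polls, that would support the race-condition explanation; if they never appear, the instance was probably launched without them.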
Closing this since we're unable to reproduce with the current info. Re-open if more data becomes available.
@mlococo @joaquincasares This very same error has occurred for me 4 times in a row on the DataStax Auto-Clustering AMI 2.6.1-1404-hvm ami-0c26747b image. I'm running 2 m3.large instances, which give me exactly the log shown above. The same log appears on both nodes and results in nothing being set up or installed. This is also in the eu-west-1 (Ireland) region.
Currently failing images on eu-west-1:
ami-7f33cd08 worked fine yesterday.
Error:
The following NEW packages will be installed:
cassandra datastax-agent dsc22 python-cql python-thrift-basic
0 upgraded, 5 newly installed, 0 to remove and 2 not upgraded.
Need to get 47.3 MB of archives.
After this operation, 58.9 MB of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
cassandra datastax-agent dsc22 python-thrift-basic python-cql
[ERROR] 08/14/15-11:38:27 sudo service cassandra stop:
cassandra: unrecognized service
[EXEC] 08/14/15-11:38:27 sudo rm -rf /var/lib/cassandra
[EXEC] 08/14/15-11:38:27 sudo rm -rf /var/log/cassandra
[EXEC] 08/14/15-11:38:27 sudo mkdir -p /var/lib/cassandra
[EXEC] 08/14/15-11:38:27 sudo mkdir -p /var/log/cassandra
[ERROR] 08/14/15-11:38:27 sudo chown -R cassandra:cassandra /var/lib/cassandra:
chown: invalid user: `cassandra:cassandra'
[ERROR] 08/14/15-11:38:28 sudo chown -R cassandra:cassandra /var/log/cassandra:
chown: invalid user: `cassandra:cassandra'
[EXEC] 08/14/15-11:38:28 sudo mv /etc/security/limits.d/cassandra.conf.bak /etc/security/limits.d/cassandra.conf
[INFO] Installing OpsCenter...
[EXEC] 08/14/15-11:38:28 sudo apt-get install -y opscenter libssl0.9.8:
Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
grub-pc-bin
Use 'apt-get autoremove' to remove them.
The following NEW packages will be installed:
libssl0.9.8 opscenter
0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
Need to get 77.5 MB of archives.
After this operation, 103 MB of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
opscenter
[ERROR] 08/14/15-11:38:28 sudo service opscenterd stop:
opscenterd: unrecognized service
[INFO] Reflector loop...
[INFO] 08/14/15-11:38:28 Reflector: Received 1 of 1 responses from: [u'172.31.21.194']
[INFO] Seed list: set([u'172.31.21.194'])
[INFO] OpsCenter: 172.31.21.194
[INFO] Options: Namespace(analyticsnodes=0, base64postscript=None, bootstrap=False, cfsreplication=None, clustername='shiplog-cassandra', customreservation=None, email=None, hadoop=False, heapsize=None, multiregion=False, opscenter=None, opscenterinterface=None, opscenterip=None, opscenteronly=False, opscenterssl=False, password='wo2FoHE8f0gUYqQh', raidonly=False, realtimenodes=2, reflector=None, release=None, rpcbinding=False, searchnodes=0, seed_indexes=[0, 2, 2], seeds=None, totalnodes=2, username='kaare@shiplog.no', version='community', vnodes=False)
[ERROR] Exception seen in ds1_launcher.py:
Traceback (most recent call last):
File "/home/ubuntu/datastax_ami/ds1_launcher.py", line 22, in initial_configurations
ds2_configure.run()
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 1178, in run
File "/home/ubuntu/datastax_ami/ds2_configure.py", line 577, in construct_yaml
IOError: [Errno 2] No such file or directory: '/etc/cassandra/cassandra.yaml'
FWIW, I took the stuck machine and re-ran the command: sudo apt-get install -y python-cql datastax-agent cassandra=2.0.16 dsc20=2.0.16-1
It produced the following output:
ubuntu@ip-10-0-102-235:~$ sudo apt-get install -y python-cql datastax-agent cassandra=2.0.16 dsc20=2.0.16-1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
grub-pc-bin
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
python-thrift-basic
The following NEW packages will be installed:
cassandra datastax-agent dsc20 python-cql python-thrift-basic
0 upgraded, 5 newly installed, 0 to remove and 142 not upgraded.
Need to get 37.5 MB of archives.
After this operation, 43.3 MB of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
cassandra datastax-agent dsc20 python-thrift-basic python-cql
E: There are problems and -y was used without --force-yes
So I tried adding --force-yes:
ubuntu@ip-10-0-102-235:~$ sudo apt-get install --force-yes -y python-cql datastax-agent cassandra=2.0.16 dsc20=2.0.16-1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
grub-pc-bin
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
python-thrift-basic
The following NEW packages will be installed:
cassandra datastax-agent dsc20 python-cql python-thrift-basic
0 upgraded, 5 newly installed, 0 to remove and 142 not upgraded.
Need to get 37.5 MB of archives.
After this operation, 43.3 MB of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
cassandra datastax-agent dsc20 python-thrift-basic python-cql
Get:1 http://debian.datastax.com/community/ stable/main cassandra all 2.0.16 [14.5 MB]
Get:2 http://debian.datastax.com/community/ stable/main datastax-agent all 5.2.0 [22.8 MB]
Get:3 http://debian.datastax.com/community/ stable/main dsc20 all 2.0.16-1 [1,308 B]
Get:4 http://debian.datastax.com/community/ stable/main python-thrift-basic all 0.8.0-1~ds+1 [70.6 kB]
Get:5 http://debian.datastax.com/community/ stable/main python-cql all 1.4.0-1 [59.2 kB]
Fetched 37.5 MB in 16s (2,320 kB/s)
Selecting previously unselected package cassandra.
(Reading database ... 87806 files and directories currently installed.)
Unpacking cassandra (from .../cassandra_2.0.16_all.deb) ...
Selecting previously unselected package datastax-agent.
Unpacking datastax-agent (from .../datastax-agent_5.2.0_all.deb) ...
Selecting previously unselected package dsc20.
Unpacking dsc20 (from .../dsc20_2.0.16-1_all.deb) ...
Selecting previously unselected package python-thrift-basic.
Unpacking python-thrift-basic (from .../python-thrift-basic_0.8.0-1~ds+1_all.deb) ...
Selecting previously unselected package python-cql.
Unpacking python-cql (from .../python-cql_1.4.0-1_all.deb) ...
Processing triggers for ureadahead ...
Setting up cassandra (2.0.16) ...
Configuration file `/etc/security/limits.d/cassandra.conf'
==> File on system created by you or by a script.
==> File also in package provided by package maintainer.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** cassandra.conf (Y/I/N/O/D/Z) [default=N] ?
It works, except that I hit a configuration file conflict, but I think this would do as a workaround.
To get the exact description of the issue, I ran the command without the -y or --force-yes options and got this:
WARNING: The following packages cannot be authenticated!
cassandra datastax-agent dsc20 python-thrift-basic python-cql
Install these packages without verification [y/N]?
There is clearly an issue around authentication, and that would be the proper thing to fix, IMHO. Still, adding --force-yes would have avoided the failure, so maybe it is something you want to consider? What could be a workaround to still be able to use the AMI when this kind of thing occurs?
Hope this helps.
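As a way to spot this condition automatically instead of forcing the install, the apt output could be scanned for the warning. This is a rough sketch only: the helper name `unauthenticated_packages` is hypothetical, and it assumes the package names fit on the single line after the warning, as they do in the output pasted above.

```python
def unauthenticated_packages(apt_output):
    """Return the package names listed under apt's authentication warning.

    Returns an empty list when the warning is not present.
    """
    marker = "WARNING: The following packages cannot be authenticated!"
    lines = apt_output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == marker:
            # Assumption: the names follow on the very next line.
            if i + 1 < len(lines):
                return lines[i + 1].split()
            return []
    return []


if __name__ == "__main__":
    sample = (
        "Need to get 37.5 MB of archives.\n"
        "WARNING: The following packages cannot be authenticated!\n"
        "  cassandra datastax-agent dsc20 python-thrift-basic python-cql\n"
        "E: There are problems and -y was used without --force-yes\n"
    )
    print(unauthenticated_packages(sample))
```

A non-empty result points at a missing or unimported repo key, which is arguably a better trigger for re-importing the key and retrying than for skipping verification.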
This is a similar symptom, but likely a different underlying issue since this was intermittent and the current issue is consistent. Leaving closed and let's keep discussion of the new issue in #88.
We've seen occasional CI EC2 deployments fail in the EU-West (Ireland) AWS region.
The contents of ~/datastax_ami/ami.log suggest apt authentication failures when installing the DataStax/Cassandra packages:
~/datastax_ami/ami.log:
We're using the ami-8932ccfe AMI and supplying the following user data parameters:
This is not easily repeatable and we only seem to see it during deployments which kick off shortly after midnight (GMT), but this timing may well be a coincidence.
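One way to test whether the midnight timing is a coincidence would be to tally the timestamped [ERROR] lines from collected ami.log files by hour. The sketch below assumes the `MM/DD/YY-HH:MM:SS` stamp format seen in the log excerpts in this thread and treats the stamps as the instance's local clock; the function name `error_hours` is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as: [ERROR] 08/14/15-11:38:27 sudo service cassandra stop:
STAMP = re.compile(r"^\[ERROR\] (\d{2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2})")


def error_hours(log_text):
    """Count timestamped [ERROR] lines per hour of day (0-23).

    Lines without a timestamp, such as the traceback header, are skipped.
    """
    hours = Counter()
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%y-%H:%M:%S")
            hours[stamp.hour] += 1
    return hours


if __name__ == "__main__":
    sample = (
        "[EXEC] 08/14/15-11:38:27 sudo rm -rf /var/lib/cassandra\n"
        "[ERROR] 08/14/15-11:38:27 sudo service cassandra stop:\n"
        "[ERROR] Exception seen in ds1_launcher.py:\n"
    )
    print(dict(error_hours(sample)))
```

Running this over logs from several failed nodes would show whether the errors really cluster in the first hour after midnight.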