tjake / Solandra

Solandra = Solr + Cassandra
Apache License 2.0

db corruption: AssertionError DecoratedKey != DecoratedKey #158

Closed: codingismy11to7 closed this issue 12 years ago

codingismy11to7 commented 12 years ago

We're reliably (but not reproducibly) hitting a database corruption issue that we can't recover from, or at least we haven't figured out how to. Everything I can find online describes a Cassandra bug that was fixed in 0.6.1 and affected indexes over 2GB (we're nowhere close to that), so I'm guessing Solandra is causing it somehow.

Once we get into this state, queries against the selected core never return, and Solandra spits out the same stack trace over and over every few seconds:

java.lang.AssertionError: DecoratedKey(90002160063266891977802944676337065984, 63757272656e74) != DecoratedKey(90002160063266891977802944676337065984, 3930303032313630303633323636383931393737383032393434363736333337303635393834efbfbf736861726473) in /path/to/solandra/data/data/L/SI-g-65-Data.db
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1407)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1304)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1261)
        at org.apache.cassandra.db.Table.getRow(Table.java:385)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:61)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:668)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1133)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Hopefully this tells you something? We could probably send along affected db files.
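For anyone reading the trace, the two keys in the assertion are easier to compare once decoded. The following is a minimal Python sketch, assuming the second field of each DecoratedKey is the row key printed as hex and that the bytes are UTF-8 text. The variable and function names are made up for illustration and are not part of Solandra or Cassandra.

```python
# Values copied from the AssertionError message above.
TOKEN = "90002160063266891977802944676337065984"
FIRST_KEY_HEX = "63757272656e74"
SECOND_KEY_HEX = (
    "3930303032313630303633323636383931393737383032393434363736333337"
    "303635393834efbfbf736861726473"
)


def decode_key(hex_key: str) -> str:
    """Decode a hex-printed row key, assuming its bytes are UTF-8 text."""
    return bytes.fromhex(hex_key).decode("utf-8")


first_key = decode_key(FIRST_KEY_HEX)
second_key = decode_key(SECOND_KEY_HEX)

# repr() keeps non-printable characters such as U+FFFF visible.
print(repr(first_key))
print(repr(second_key))

# Check whether the second key embeds the token's decimal digits as text.
print(second_key.startswith(TOKEN))
```

Under those assumptions the first key decodes to "current", while the second decodes to the token's own decimal digits followed by U+FFFF and "shards". So both keys carry the same token but are entirely different row keys, which is the mismatch the assertion reports.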

tjake commented 12 years ago

What revision of Solandra is this? The last commit?

codingismy11to7 commented 12 years ago

I thought we were up to date, but it looks like we last pulled on Sept 29th, so there have been some new commits since then. Could this already be fixed in one of them? Is there a way to upgrade Solandra without losing the data?

tjake commented 12 years ago

Don't update to the latest since it includes a breaking change.

Did this start happening once you updated?

My guess is that it's related to this change: https://github.com/tjake/Solandra/commit/386746a55756ee5a5aebcba69a441350133590b3

Perhaps you can revert that and see if it helps.

tjake commented 12 years ago

The actual fix for this is https://github.com/tjake/Solandra/commit/7b18f06f286820d4f3181c1ddafda6dfb2cc9672, but it requires re-indexing. If you just want to fix the problem without updating, I think reverting https://github.com/tjake/Solandra/commit/386746a55756ee5a5aebcba69a441350133590b3 will do that.