tjake / Solandra

Solandra = Solr + Cassandra
Apache License 2.0

Solandra removes stored documents after adding a new one #81

Closed: cwesdorp closed this issue 13 years ago

cwesdorp commented 13 years ago

Hello,

I have used Solr for a project but wanted to give Solandra a try because of its virtual core feature. However, I ran into a problem. Adding documents to Solandra seems to work fine, but when there is an interval of 30 seconds between updates, Cassandra logs that the ShardInfo has expired and all previously inserted documents are lost.

This is part of the log file:

09:35:09,238  INFO CassandraIndexManager:232 - thoughtbucket has 4 shards
09:35:09,249  INFO CassandraIndexManager:449 - 65533 reserved ids for thoughtbucket have expired
09:35:20,052  INFO CassandraIndexManager:639 - Reserved 16384 ids for 15111178303116600061766658194992816947 shard 0 from slot 0
09:35:20,058  INFO CassandraIndexManager:639 - Reserved 16384 ids for 15111178303116600061766658194992816947 shard 1 from slot 0
09:35:20,089  INFO CassandraIndexManager:639 - Reserved 16384 ids for 15111178303116600061766658194992816947 shard 2 from slot 0
09:35:20,097  INFO CassandraIndexManager:639 - Reserved 16384 ids for 15111178303116600061766658194992816947 shard 3 from slot 0
09:35:20,124  INFO UpdateRequestProcessor:171 - {add=[27715547-b660-44d5-bd88-fa1b2847feba],commit=} 0 10893
09:35:20,125  INFO SolrCore:1370 - [thoughtbucket] webapp=/solandra path=/update/json params={} status=0 QTime=10893

Solandra is running on the standard configuration as pulled from the repository, on the solandra branch. The one thing I changed was adding the JSON handler, because my app already used JSON with Solr. However, the problem also occurs with XML updates. The Reuters demo doesn't seem to be affected, but I can't spot the difference.

I run Solandra on OS X 10.6.6 with Java 1.6.0_22. Please let me know if you need more info.

Regards, Chris

tjake commented 13 years ago

That is very odd... So you are saying when you add documents and search they show up but after 30 seconds they disappear?

Are you able to reproduce with the demo app that comes with Solandra?

cwesdorp commented 13 years ago

The items disappear after a new document is added, and the log shows "ShardInfo for thoughtbucket has expired" and "32767 reserved ids for thoughtbucket have expired".

The Reuters demo does not show this problem.

There is one difference I noticed: when I add a document, the commit is included in the same message posted to Solandra. I changed my program to send the commit separately, and things seem to go better now. Could there be a problem with submitting a 'multi' document?
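To make the difference concrete, here is a minimal Python sketch of the two message shapes. It assumes the Solr 3.x JSON update syntax, where `{"add": {"doc": ...}}` and an optional `"commit": {}` can share one JSON object (consistent with the `{add=[...],commit=}` entry in the log above). The helper names and the `text` field are hypothetical; the real schema is whatever the test documents use.

```python
import json


def add_message(doc, commit=False):
    """Build a JSON update body that adds one document.

    With commit=True the commit command travels in the same message,
    which is the 'combined' shape the original client was sending.
    """
    body = {"add": {"doc": doc}}
    if commit:
        body["commit"] = {}
    return json.dumps(body)


def commit_message():
    """Build a stand-alone commit command, to be sent as its own request."""
    return json.dumps({"commit": {}})


# Hypothetical document; the id is borrowed from the log above.
doc = {"id": "27715547-b660-44d5-bd88-fa1b2847feba", "text": "hello"}

# One request carrying both the add and the commit.
combined = add_message(doc, commit=True)

# Two requests: the add first, then the commit on its own (the workaround).
separate = [add_message(doc), commit_message()]
```

Each string would be POSTed to the `/update/json` handler in turn; only the grouping of the commit changes between the two variants.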

cwesdorp commented 13 years ago

I have now been able to reproduce the issue a few times on the solandra branch.

I opened a repository, https://github.com/cwesdorp/solandra-issue81-docs, with some documents I used for testing. The upload_docs.sh script expects a JSON handler to be added to the Solr properties.

First, execute upload_schema.sh. Then execute "uploaddoc.sh doc1 commit", which commits the first document to the store. Execute query.sh or open "http://localhost:8983/solandra/thoughtbucket/select?wt=json&indent=on&q=:_" in a browser; one result is shown. Wait 30 seconds. Then execute "uploaddoc.sh doc2 commit". Execute query.sh or open the same URL again; in my case the first document has disappeared.
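The same steps could be scripted end to end. Below is a minimal Python sketch using only the standard library. It assumes the JSON handler is registered at `/update/json` (the path visible in the original log), uses the standard Solr match-all query `*:*`, and posts hypothetical inline documents instead of the files in the test repository. All function names are illustrative, not part of Solandra.

```python
import json
import time
import urllib.request

# Base URL and core name taken from the URLs in this thread.
BASE = "http://localhost:8983/solandra/thoughtbucket"


def post_update(body, base=BASE):
    """POST a JSON update body to the (assumed) /update/json handler."""
    req = urllib.request.Request(
        base + "/update/json",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def parse_num_found(raw):
    """Extract response.numFound from a Solr wt=json response body."""
    return json.loads(raw)["response"]["numFound"]


def count_all(base=BASE):
    """Run a match-all query and return how many documents are visible."""
    with urllib.request.urlopen(base + "/select?wt=json&q=*:*") as resp:
        return parse_num_found(resp.read().decode("utf-8"))


def reproduce(doc1, doc2, wait_seconds=30):
    """Add doc1 with commit, wait, add doc2 with commit, report both counts."""
    post_update(json.dumps({"add": {"doc": doc1}, "commit": {}}))
    before = count_all()
    time.sleep(wait_seconds)
    post_update(json.dumps({"add": {"doc": doc2}, "commit": {}}))
    after = count_all()
    # On an empty index with distinct ids, a healthy run should return (1, 2);
    # the behavior described in this issue would instead return (1, 1).
    return before, after
```

Usage against a running instance would look like `reproduce({"id": "doc1"}, {"id": "doc2"})`, with the schema's required fields added to each document.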

tjake commented 13 years ago

Thanks! I'll dig in.

cwesdorp commented 13 years ago

Maybe this is something: I just pulled the latest commits and tried again because I wanted to see what is stored in the Cassandra index. After committing a document, a warning/error is shown when listing the TI column family.

[default@L] list TI; 
Using default limit of 100
-------------------
RowKey: 3660269691873133950988201044714607423?allText?http
Unknown comparator 'lucandra.VIntType'. Available functions: bytes, integer, long, lexicaluuid, timeuuid, utf8, ascii.

tjake commented 13 years ago

Are you connecting to solandra via cassandra-cli? That is strange, but I don't think it's related.

I changed the index ID generation code, which may help with the problem. Could you give the latest commit (from about 5 hours ago) a try?

-Jake

cwesdorp commented 13 years ago

Hi Jake,

Yes, I used the cassandra-cli in the solandra-app/cassandra-tools folder.

Unfortunately, the behavior hasn't changed for the better. Now documents disappear some time after the last commit, and the delay doesn't seem to be fixed. I also had a case where a document was posted but not returned in the query results. Whenever I shut down Solandra, I remove /tmp/cassandra-data and the /tmp/index folder it also seems to create before starting it again. For this test I did a completely fresh clone of the repository.
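The clean-slate routine described above (stop Solandra, delete the data directories, start it again) could be scripted roughly as follows. This is a minimal Python sketch assuming the default /tmp paths mentioned in this comment; the function name is hypothetical, and it should only be run while Solandra is stopped.

```python
import shutil
from pathlib import Path

# Data directories mentioned in this thread; adjust if your configuration differs.
DATA_DIRS = ["/tmp/cassandra-data", "/tmp/index"]


def wipe_data_dirs(dirs=DATA_DIRS):
    """Recursively delete each directory that exists and return the removed paths.

    Missing directories are skipped, so the function is safe to run on a
    machine where Solandra has never been started.
    """
    removed = []
    for d in dirs:
        p = Path(d)
        if p.exists():
            shutil.rmtree(p)
            removed.append(str(p))
    return removed
```

Skipping missing paths keeps the routine idempotent, so it can run before every test without first checking what a previous run left behind.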

Chris

tjake commented 13 years ago

Chris,

We've pinpointed the problem and are working on a fix, should be done soon.

-Jake

cwesdorp commented 13 years ago

Hi Jake,

A lot of commits related to indexes and IDs have come in, so I thought I'd give it a try. I pulled the latest and did a quick test using the test docs provided earlier. At this point I can't reproduce the issue. Are the recent commits related, or do you also consider the issue fixed? If not, I will test again later.

Chris

tjake commented 13 years ago

Yes, this is now verified fixed.