purepennons / gss

Automatically exported from code.google.com/p/gss

solr gives errors with rich text documents #20

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Upload rich text documents

What is the expected output? What do you see instead?

I expect the file to be indexed by Solr and to show up in search results. 
Neither happens because of a Solr error.

What version of the product are you using? On what operating system?

JBoss AS 5.1.0 GA, CentOS 5 x64, HornetQ 2.0.0, Solr 1.3.0 patched for rich 
documents.

Please provide any additional information below.

Attached is the Solr output complaining about the 'body' field. After patching 
Solr, what else has to be done? I copied the example/solr directory and 
solr.war to the deploy/ directory on the JBoss server. Everything related to 
Solr seems to work except rich text documents.

Regards,
Nikola

Original issue reported on code.google.com by ngara...@gmail.com on 9 Jul 2010 at 2:00

Attachments:

GoogleCodeExporter commented 8 years ago
You are likely getting this error due to a misconfigured Solr schema. Compare 
your schema.xml with the attached one. You are probably missing 
the <field name="body" type="text_greek" indexed="true" stored="false"/> line, 
as well as the <defaultSearchField>body</defaultSearchField> one. Of course, 
feel free to use a different field type or a different tokenizer for your 
language.
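
As a rough way to run the comparison suggested above, here is a minimal Python 
sketch that checks a schema.xml for the 'body' field and the 
defaultSearchField entry. It assumes the Solr 1.3 layout, with 
<defaultSearchField> as a direct child of <schema>. The function name 
check_schema is made up for this illustration and is not part of Solr or gss. 
Dynamic fields are ignored.

```python
import xml.etree.ElementTree as ET


def check_schema(schema_xml, field_name="body"):
    """Return a list of problems found in a Solr schema.xml string.

    Hypothetical helper for illustration only; it is not part of Solr or gss.
    """
    root = ET.fromstring(schema_xml)
    problems = []

    # Names of all declared field types, e.g. "text" or "text_greek".
    type_names = {ft.get("name") for ft in root.iter("fieldType")}
    # All statically declared fields, keyed by name (dynamicField is ignored).
    fields = {f.get("name"): f for f in root.iter("field")}

    field = fields.get(field_name)
    if field is None:
        problems.append('missing <field name="%s">' % field_name)
    elif field.get("type") not in type_names:
        problems.append(
            'field "%s" uses undefined type "%s"' % (field_name, field.get("type"))
        )

    # Assumption: <defaultSearchField> sits directly under <schema>.
    default = root.findtext("defaultSearchField")
    if default is None:
        problems.append("missing <defaultSearchField>")
    elif default.strip() not in fields:
        problems.append(
            'defaultSearchField "%s" is not a declared field' % default.strip()
        )

    return problems
```

Calling check_schema(open("schema.xml").read()) returns an empty list when 
both entries are present and consistent.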

Original comment by past...@gmail.com on 12 Jul 2010 at 2:56

Attachments:

GoogleCodeExporter commented 8 years ago
Using the previously attached schema.xml file gives me the error "undefined 
field text". Sorry, but I don't get Solr. Am I supposed to write my own 
schema.xml file to get rich text documents to work? Is there an example 
schema.xml file that should work out of the box? I am using all the Solr files 
from the example directory of Solr 1.3.0.

Regards,
Nikola

Original comment by ngara...@gmail.com on 13 Jul 2010 at 11:55

Attachments:

GoogleCodeExporter commented 8 years ago
This is weird. The schema.xml file I attached previously is the one that I'm 
using here. The fieldType 'text' is defined on line 161 of schema.xml. Can you 
verify that your schema.xml file has the following line there?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">

Can you attach the Solr log from 
apache-solr-1.3.0/example/logs/solr_console.log? Something in there might shed 
some light on the actual issue here.

Original comment by past...@gmail.com on 13 Jul 2010 at 2:52

GoogleCodeExporter commented 8 years ago
The line is indeed present in the schema.xml file you provided earlier.
I don't run Solr standalone; I deploy it to the JBoss server, so it runs under 
JBoss. There are no files in the Solr logs directory you mentioned, so I 
grepped the JBoss log for Solr entries instead.

Original comment by ngara...@gmail.com on 14 Jul 2010 at 12:54

Attachments:

GoogleCodeExporter commented 8 years ago
The error you are getting occurs when making search requests to Solr (on the 
/select path), not when indexing documents (on the /update or /update/rich 
path). Do you get any errors when uploading documents, or only when searching 
for them?

That said, I'm at a loss to explain why solr fails to see the definition of the 
text field type when searching, but has no problem with it when indexing. Can 
you change the Solr log level to DEBUG, so that we can get more information in 
the log?

Original comment by past...@gmail.com on 14 Jul 2010 at 1:42

GoogleCodeExporter commented 8 years ago
Also, could you stop the server, wipe out the existing index (probably in 
solr/data/index somewhere) and try restarting the server to create a new one?
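
A minimal Python sketch of the wipe step, to be run only while the server is 
stopped. It assumes the index lives under data/index inside the Solr home 
directory, which is a guess that depends on the dataDir setting in 
solrconfig.xml. The helper name wipe_index is made up for this illustration.

```python
import shutil
from pathlib import Path


def wipe_index(solr_home):
    """Delete <solr_home>/data/index; return True if a directory was removed.

    Hypothetical helper. The data/index location is an assumption; check the
    dataDir setting in solrconfig.xml before deleting anything.
    """
    index_dir = Path(solr_home) / "data" / "index"
    if not index_dir.is_dir():
        return False  # nothing to delete
    shutil.rmtree(index_dir)  # removes the index files and the directory
    return True
```

Solr should create a fresh, empty index on the next start, after which the 
documents have to be uploaded again.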

Original comment by past...@gmail.com on 14 Jul 2010 at 1:44

GoogleCodeExporter commented 8 years ago
So, the trial and error method seems to have worked. I reverted to the 
schema.xml provided with Solr (solr/example/conf/schema.xml) and deleted the 
data directory inside the solr directory. I also had to change 
multipartUploadLimitInKB in solrconfig.xml because the default value gave 
errors. None of that worked at first, but somehow now it works. I get search 
results for a pdf file and for a rar file, so it seems to work with both normal 
and rich files. I set solr.solr.home when starting JBoss, since I could not 
find out where else to set this property. Setting it inside the solr.war file 
did not work the last time I tried.

Find the log attached; maybe there is a clue to my success inside it. The log 
looks ugly, btw :)

Regards,
Nikola

Original comment by ngara...@gmail.com on 16 Jul 2010 at 10:54

Attachments:

GoogleCodeExporter commented 8 years ago
Excellent! I'm glad you were able to get past this issue. The cause of the 
previous error is clear now that erasing the index made things work for you: 
when adding documents to the index, Solr doesn't force you to maintain a single 
schema for the lifetime of the index (as a relational DBMS would). So you 
probably uploaded some files using the default schema.xml, then switched to the 
schema.xml I attached above and added more documents. When Solr then runs a 
query that refers to fields that were not present in the old document schema, 
it logs the error about "undefined field text". In this case the field is 
defined in the (new) schema, but not in the old documents persisted in the 
index.

Regarding multipartUploadLimitInKB, make sure you match the value in 
solrconfig.xml with the property solrDocumentUploadLimitInKB in gss.properties. 
If the latter is larger than the former, gss will attempt to send large 
documents to solr only to get rejected due to the smaller size limit in solr.
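
One way to check that the two limits agree is the following Python sketch. It 
assumes multipartUploadLimitInKB is an attribute of the requestParsers element 
inside requestDispatcher, as in the example solrconfig.xml, and that 
gss.properties uses plain key=value lines. The function names are made up for 
this illustration.

```python
import xml.etree.ElementTree as ET


def read_solr_limit_kb(solrconfig_xml):
    """Read multipartUploadLimitInKB from a solrconfig.xml string."""
    root = ET.fromstring(solrconfig_xml)
    # Assumed location: <config><requestDispatcher><requestParsers .../>
    parsers = root.find("requestDispatcher/requestParsers")
    if parsers is None or parsers.get("multipartUploadLimitInKB") is None:
        raise ValueError("multipartUploadLimitInKB not found")
    return int(parsers.get("multipartUploadLimitInKB"))


def read_gss_limit_kb(properties_text):
    """Read solrDocumentUploadLimitInKB from gss.properties text."""
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "solrDocumentUploadLimitInKB":
            return int(value.strip())
    raise ValueError("solrDocumentUploadLimitInKB not found")


def limits_are_consistent(solrconfig_xml, properties_text):
    """True when gss never sends a document larger than Solr accepts."""
    return read_gss_limit_kb(properties_text) <= read_solr_limit_kb(solrconfig_xml)
```

If limits_are_consistent returns False, either raise the value in 
solrconfig.xml or lower the one in gss.properties.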

The log level you enabled in the last log is FINE, which logs more things than 
anyone would need to know :-)

You should be able to set the Solr home property in jboss/bin/run.conf by 
adding -Dsolr.solr.home=foo to the JAVA_OPTS line, if you are not doing this 
already.

I'm closing this issue since indexing and searching now works for you, but if 
you encounter any other problems, feel free to open a new one.

Original comment by past...@gmail.com on 16 Jul 2010 at 12:14