Hi,
Thanks for using lucene-gosen.
If you don't mind, could you share the settings of each FieldType (the lucene-gosen one and the alpha-numeric one)?
How many query patterns did you put in queries.txt?
What cache size did you configure for the queryResultCache?
-Jun
Original comment by johtani
on 27 Feb 2012 at 9:03
Hi, thanks for your response.
For the stress test, I used the following FieldType setting:
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.JapaneseTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
Just to clarify, I tried two types of queries.txt (Japanese and alpha-numeric), both against a field of this FieldType only.
After I changed the queries from Japanese to alpha-numeric, Solr no longer got stuck and ran smoothly.
> How many query patterns did you put in queries.txt?
I set 5,000 patterns of Japanese queries and 5,000 alpha-numeric queries respectively
(i.e. 5,000 lines in japanese-queries.txt and another 5,000 lines in
alpha-numeric-queries.txt).
I also tried 1,000 patterns, but got the same results.
> What cache size did you configure for the queryResultCache?
I tried the default (512), a larger size (5,000), and no cache (commenting out the
relevant part of solrconfig.xml), but got the same results.
- Yanbe
Original comment by ya...@hatena.ne.jp
on 27 Feb 2012 at 10:37
Sorry for the late reply.
Are the values ("Result count" in the "Query Statistics" tab) for the Japanese and
alpha-numeric stress tests close to each other?
> Just to clarify, I tried two types of queries.txt (Japanese and alpha-numeric),
both against a field of this FieldType only.
> After I changed the queries from Japanese to alpha-numeric, Solr no longer got
stuck and ran smoothly.
Are phrase queries included in queries.txt?
Alpha-numeric terms are not tokenized by the lucene-gosen tokenizer, which means it
does not build a lattice for alpha-numeric input.
The lucene-gosen tokenizer's tokenize process is:
1. Create the lattice.
2. Find the best path.
3. Return the tokens.
Many paths are found when tokenizing Japanese text, so many objects are generated
(see the sketch below).
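To make that concrete, here is a minimal, self-contained sketch of those three steps. It is not lucene-gosen's actual API; the class name, toy lattice, and costs below are made up for illustration. It just runs plain dynamic programming over a hard-coded lattice to show why a Japanese string produces many candidate nodes and paths, and therefore many short-lived objects.

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class LatticeSketch {

    /** A candidate morpheme spanning [start, end) of the input, with a cost. */
    static final class Node {
        final int start, end, cost;
        final String surface;
        Node(int start, int end, String surface, int cost) {
            this.start = start; this.end = end; this.surface = surface; this.cost = cost;
        }
    }

    public static void main(String[] args) {
        String text = "東京都に住む";   // toy Japanese input, 6 characters

        // Step 1: create the lattice. A real morphological analyzer fills this
        // from dictionary lookups at every character offset, producing many
        // overlapping candidates; here they are hard-coded for illustration.
        List<Node> lattice = Arrays.asList(
            new Node(0, 2, "東京", 5),    // "Tokyo"
            new Node(0, 3, "東京都", 6),  // "Tokyo metropolis"
            new Node(2, 3, "都", 4),      // "metropolis"
            new Node(3, 4, "に", 1),      // particle
            new Node(4, 6, "住む", 3)     // "to live"
        );

        // Step 2: find the best (cheapest) path through the lattice with
        // simple dynamic programming over character positions.
        int n = text.length();
        int[] best = new int[n + 1];
        Node[] back = new Node[n + 1];
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int pos = 0; pos < n; pos++) {
            if (best[pos] == Integer.MAX_VALUE) continue;  // unreachable position
            for (Node node : lattice) {
                if (node.start == pos && best[pos] + node.cost < best[node.end]) {
                    best[node.end] = best[pos] + node.cost;
                    back[node.end] = node;
                }
            }
        }

        // Step 3: return the tokens on the best path by walking the back-pointers.
        Deque<String> tokens = new ArrayDeque<>();
        for (int pos = n; pos > 0; pos = back[pos].start) {
            tokens.addFirst(back[pos].surface);
        }
        System.out.println(tokens);   // prints [東京都, に, 住む]
    }
}

For an alpha-numeric term, no such lattice is built at all, so none of these intermediate candidate objects are created, which matches the difference seen between the two stress tests.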
Reference sites (lucene-gosen performance tests):
http://wiki.livedoor.jp/haruyama_seigo/d/Solr/Tokenizer%c9%be%b2%c1201105
http://www.rondhuit.com/solr%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E5%AF%BE%E5%BF%9C.html
Original comment by johtani
on 2 Mar 2012 at 8:38
Original issue reported on code.google.com by
ya...@hatena.ne.jp
on 24 Feb 2012 at 12:27