terrier-org / terrier-core

Terrier IR Platform
http://terrier.org/
Other

Support for Devanagari (Non-English) text indexing and retrieval #227

Open 222112Akki opened 11 months ago

222112Akki commented 11 months ago

akhilesh@akki:~/Desktop/terrier-core-4.1/bin$ sh trec_setup.sh /home/akhilesh/Desktop/terrier-core-4.1/share/hindi
Setting TERRIER_HOME to /home/akhilesh/Desktop/terrier-core-4.1
/home/akhilesh/Desktop/terrier-core-4.1/target/terrier-core-4.1-jar-with-dependencies.jar:/home/akhilesh/Desktop/terrier-core-4.1/etc/logback.xml
Creating collection.spec file.
Creating logging configuration (logback.xml) file in /home/akhilesh/Desktop/terrier-core-4.1/etc/
Creating terrier.properties file.

add the files to index

/home/akhilesh/Desktop/terrier-core-4.1/share/hindi/1.trec
/home/akhilesh/Desktop/terrier-core-4.1/share/hindi/akhilesh.trec
Updated collection.spec file. Please check that it contains all and only all the files to be indexed, or create it manually.
akhilesh@akki:~/Desktop/terrier-core-4.1/bin$ sh trec_terrier.sh -i
Setting TERRIER_HOME to /home/akhilesh/Desktop/terrier-core-4.1
16:23:53.235 [main] INFO o.t.i.MultiDocumentFileCollection - TRECCollection read collection specification
16:23:53.237 [main] INFO o.t.i.MultiDocumentFileCollection - TRECCollection processing /home/akhilesh/Desktop/terrier-core-4.1/share/hindi/1.trec
16:23:53.253 [main] INFO o.t.structures.indexing.Indexer - creating the data structures data_1
16:23:53.267 [main] WARN o.t.structures.indexing.Indexer - Adding empty document 1
16:23:53.281 [main] INFO o.t.i.MultiDocumentFileCollection - TRECCollection processing /home/akhilesh/Desktop/terrier-core-4.1/share/hindi/akhilesh.trec
16:23:53.282 [main] WARN o.t.structures.indexing.Indexer - Adding empty document 2
16:23:53.282 [main] INFO o.t.structures.indexing.Indexer - Collection #0 took 0 seconds to index (2 documents)
16:23:53.307 [main] WARN o.t.s.indexing.LexiconBuilder - No temporary lexicons to merge, skipping
16:23:53.318 [main] INFO o.t.structures.indexing.Indexer - Started building the inverted index...
16:23:53.318 [main] ERROR o.t.structures.indexing.Indexer - Index has no terms. Inverted index creation aborted.
Time elapsed: 0.1 seconds.
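The log shows both documents being added as empty ("Adding empty document 1" and "Adding empty document 2"), so no terms reach the lexicon and the inverted-index step aborts. One plausible cause, assuming Terrier's default English tokeniser keeps only ASCII letters and digits (our understanding, not confirmed in this thread), is that every Devanagari character is discarded during tokenisation. The Python sketch below illustrates the effect; `ascii_tokenise` and `unicode_tokenise` are hypothetical helpers written for illustration, not Terrier code.

```python
import re
import unicodedata

# Hypothetical approximation of an English-only tokeniser:
# it keeps runs of ASCII letters and digits and lowercases them.
ASCII_TOKEN = re.compile(r"[A-Za-z0-9]+")


def ascii_tokenise(text):
    return [t.lower() for t in ASCII_TOKEN.findall(text)]


def unicode_tokenise(text):
    """Split on any character that is not a letter (L*), number (N*) or mark (M*).

    Marks must be kept: Devanagari vowel signs (category Mc) and the
    virama/anusvara (category Mn) are part of the word, not separators.
    """
    tokens, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N", "M"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return [t.lower() for t in tokens]


doc = "हिंदी भाषा में दस्तावेज"
print(ascii_tokenise(doc))    # []  -> an "empty document"
print(unicode_tokenise(doc))  # ['हिंदी', 'भाषा', 'में', 'दस्तावेज']
```

Keeping Unicode marks matters here: a rule that accepts only letters and digits would cut Devanagari words at vowel signs such as ि (U+093F, category Mc).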

222112Akki commented 11 months ago

I am trying to index a Devanagari (non-English) dataset, but I get the error shown above ("ERROR o.t.structures.indexing.Indexer - Index has no terms. Inverted index creation aborted.") and nothing is indexed. Please suggest how I can index non-English datasets in Terrier version 5.5. Thank you.
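Terrier's documentation on non-English collections describes, as far as we know, three `terrier.properties` settings relevant here: `trec.encoding` (the character encoding of the TREC files), `tokeniser` (e.g. `UTFTokeniser` instead of the English tokeniser) and `termpipelines` (left empty to disable English stopword removal and Porter stemming). Treat these names and values as assumptions to verify against the documentation for your Terrier version. Editing the file by hand works just as well; the Python sketch below only shows the change, and `update_properties` and `NON_ENGLISH_SETTINGS` are hypothetical names.

```python
from pathlib import Path

# Assumed Terrier settings for UTF-8, non-English text; check them against
# the documentation for your Terrier version before relying on them.
NON_ENGLISH_SETTINGS = {
    "trec.encoding": "UTF-8",     # read the TREC files as UTF-8
    "tokeniser": "UTFTokeniser",  # Unicode-aware tokeniser instead of the English one
    "termpipelines": "",          # no English stopword list or Porter stemmer
}


def update_properties(path, settings):
    """Set key=value pairs in a simple .properties file.

    Existing (uncommented) keys are replaced in place, missing keys are
    appended, and all other lines are kept. Only the 'key=value' form is
    handled, not the ':' or whitespace separators Java also accepts.
    """
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    remaining = dict(settings)
    out = []
    for line in lines:
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if not is_comment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    path.write_text("\n".join(out) + "\n", encoding="utf-8")


# Example (path taken from the log above):
# update_properties("/home/akhilesh/Desktop/terrier-core-4.1/etc/terrier.properties",
#                   NON_ENGLISH_SETTINGS)
```

After changing the settings, the existing (empty) index would need to be rebuilt, for example by re-running `sh trec_terrier.sh -i` as in the log above.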