mike820324 / ComicSearchEngine

A simple Node implementation that helps you search for the comic you want.
MIT License

new indexer design #4

Open mike820324 opened 9 years ago

mike820324 commented 9 years ago

Currently the indexer design is not that good: adding a new comic website requires a lot of copy and paste. I really should redesign the indexer.

mike820324 commented 9 years ago

First thought is,

key: $uuid, value: { site: $url, name: comicName, description: comicDescription, link: comicLink }

Then build an inverted index from the name and description. To build the inverted index, I need Chinese segmentation and a LevelDB update method.
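A rough sketch of how that could look, purely as an illustration and not the actual indexer. Assumptions: the function names (`tokenize`, `addComic`, `search`) are made up, two in-memory `Map`s stand in for LevelDB, and overlapping character bigrams stand in for real Chinese segmentation.

```javascript
// Hypothetical sketch; none of these names come from the repository.
const crypto = require('crypto');

// In-memory stand-ins for two LevelDB key spaces (assumption).
const docs = new Map();     // uuid -> { site, name, description, link }
const postings = new Map(); // term -> Set of uuids (the inverted index)

// Placeholder for real Chinese segmentation: CJK runs become overlapping
// character bigrams, Latin/digit runs become lowercase words.
function tokenize(text) {
  const terms = [];
  const runs = text.toLowerCase().match(/[\u4e00-\u9fff]+|[a-z0-9]+/g) || [];
  for (const run of runs) {
    if (/[\u4e00-\u9fff]/.test(run)) {
      if (run.length === 1) terms.push(run);
      for (let i = 0; i + 1 < run.length; i++) terms.push(run.slice(i, i + 2));
    } else {
      terms.push(run);
    }
  }
  return terms;
}

// Store the record under a fresh uuid and update the postings for every
// distinct term found in its name and description.
function addComic({ site, name, description, link }) {
  const id = crypto.randomUUID();
  docs.set(id, { site, name, description, link });
  for (const term of new Set(tokenize(`${name} ${description}`))) {
    if (!postings.has(term)) postings.set(term, new Set());
    postings.get(term).add(id);
  }
  return id;
}

// Return the records that contain every term of the query (AND semantics).
function search(query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let ids = null;
  for (const term of terms) {
    const hit = postings.get(term) || new Set();
    ids = ids === null ? new Set(hit) : new Set([...ids].filter((id) => hit.has(id)));
  }
  return [...ids].map((id) => docs.get(id));
}
```

With this shape, adding a new comic website would only mean producing `{ site, name, description, link }` records and calling `addComic`; the indexing code itself is shared. Swapping the `Map`s for LevelDB would turn each `postings` update into a read-modify-write on the term key, which is the "update method" mentioned above.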

mike820324 commented 9 years ago

Index-search module is a pretty well-tested library. The problem is that it uses natural to generate keywords and for some other classifier algorithms, and unfortunately natural currently doesn't support Chinese.