zhongkaifu / RNNSharp

RNNSharp is a toolkit for deep recurrent neural networks that is widely used for many kinds of tasks, such as sequence labeling and sequence-to-sequence. It is written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and many types of layers, such as LSTM, Softmax, sampled Softmax and others.
BSD 3-Clause "New" or "Revised" License
285 stars 91 forks

Gazetteer list as a feature #24

Open My-Khan opened 8 years ago

My-Khan commented 8 years ago

RNNs support word embedding features, which is a big plus compared to the competing CRF approach. Can the RNN also use external gazetteer lists and dictionaries as features?

zhongkaifu commented 8 years ago

Yes, RNNSharp supports it. It's called TFeature (template feature). The README file describes in detail how it works and how to use it.

My-Khan commented 8 years ago

Thanks for the prompt reply. I read the README, but I am still confused. Besides the training data, I have a separate text file that contains country names. How can this separate file be used as a feature alongside the training data when training with RNNSharp? For example, my training file, named "mytrain", contains data in the following format:

کو PSP NOR
بھارت PNN S_LOCATION
سے PSP NOR
تعلق NN NOR
رکھنے VBI NOR

The gazetteer list, named "MyConList", contains data in the following format:

1: PAKISTAN
2: INDIA
3: CHINA
4: USA

My template file contains the following templates:

U01:%x[-1,0]
U02:%x[0,0]

During training, these templates generate features only from the training file "mytrain". Please advise how to use the separate gazetteer list "MyConList" when training the RNN. Thanks in advance.

zhongkaifu commented 8 years ago

You could read the [Template Features] section of the README file. It has an example of how to use template features. In RNNSharp, template features are binarized by TFeatureBin.exe, and RNNSharp then uses the binarized result.
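To make the template notation concrete, here is a minimal Python sketch of how a `%x[row,col]` macro could be expanded for one token. It assumes CRF++-style semantics, where `row` is an offset relative to the current token and `col` is an absolute column index. The `expand` function and the `<PAD>` placeholder for out-of-range rows are hypothetical and only for illustration; they are not TFeatureBin.exe's actual code or output format.

```python
import re

# Matches the %x[row,col] macro used in the templates quoted in this thread.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, sentence, pos):
    """Expand one template line at token index `pos`.

    `sentence` is a list of rows; each row is a list of column values.
    Rows outside the sentence become "<PAD>" (an assumption of this sketch).
    """
    def repl(match):
        row = pos + int(match.group(1))  # row offset is relative to `pos`
        col = int(match.group(2))        # column index is absolute
        if 0 <= row < len(sentence):
            return sentence[row][col]
        return "<PAD>"

    return MACRO.sub(repl, template)


# Rows taken from the English example in this thread: token, POS tag, label.
sentence = [
    ["Tokyo", "NNP", "S_LOCATION"],
    ["and", "CC", "S"],
    ["New", "NNP", "B_LOCATION"],
]

print(expand("U01:%x[-1,0]", sentence, 1))  # U01:Tokyo
print(expand("U02:%x[0,0]", sentence, 1))   # U02:and
```

Under these assumptions, a template can only read columns that are present in the training file itself, which is why an external list has to be turned into a column before templates can refer to it.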

My-Khan commented 8 years ago

Apologies in advance, but it is still not clear; perhaps I am not explaining my problem well. I have no problem with template features: I can generate them easily using TFeatureBin.exe. These are the steps I follow. To generate template features from the following data, stored in a file named "mytrain.txt", I use TFeatureBin.exe in build mode.

! PUN S
Tokyo NNP S_LOCATION
and CC S
New NNP B_LOCATION
York NNP E_LOCATION
are VBP S
major JJ S

After executing TFeatureBin.exe, it generates two files named tfeature.template and tfeature. I then specify the output file in the config file used by RNNSharp, e.g. TFEATURE_FILENAME:tfeatures. This works well. In the steps above I used only one file, "mytrain.txt", to generate template features. If I have another file, a gazetteer named "myConList.txt" that contains data in the following format, how can template features be generated from both files using TFeatureBin.exe?

1: PAKISTAN
2: INDIA
3: CHINA
4: USA

bratao commented 8 years ago

@My-Khan, you need to write a script yourself to inject this kind of feature into your "mytrain.txt".
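A minimal sketch of such a script in Python follows. It is not part of RNNSharp. It assumes the gazetteer has one `N: NAME` entry per line, the training file has one token per line with whitespace-separated columns and the label last, and sentences are separated by blank lines. The function names, the Y/N flag values, the tab separator and the sample rows are all hypothetical choices made for illustration.

```python
def load_gazetteer(lines):
    """Parse lines such as '1: PAKISTAN' into a set of lower-cased names."""
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop the leading "N:" index if present and keep the name.
        name = line.split(":", 1)[-1].strip()
        if name:
            names.add(name.lower())
    return names


def inject_feature(lines, gazetteer):
    """Insert a Y/N gazetteer column just before the label (last) column.

    Each non-blank row is expected to hold at least a token and a label.
    Only single-token gazetteer entries are matched in this sketch.
    """
    out = []
    for line in lines:
        cols = line.split()
        if not cols:
            out.append("")  # keep blank sentence separators
            continue
        flag = "Y" if cols[0].lower() in gazetteer else "N"
        out.append("\t".join(cols[:-1] + [flag, cols[-1]]))
    return out


# Gazetteer entries from this thread; the training rows are made up for the demo.
gazetteer = load_gazetteer(["1: PAKISTAN", "2: INDIA", "3: CHINA", "4: USA"])
rows = ["India NNP S_LOCATION", "and CC S", "China NNP S_LOCATION"]

for row in inject_feature(rows, gazetteer):
    print(row)
```

With the extra column in place, a template line such as `U03:%x[0,2]` (a hypothetical addition to the template file, with column 2 being the new flag) could refer to it, and the features would be rebuilt with TFeatureBin.exe as before. The same column would also have to be added to validation and test data. Multi-word names such as "New York" would need a phrase-matching step that this sketch leaves out.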

My-Khan commented 8 years ago

@bratao Many thanks for the guidance. Hmm, this has now become a hard nut to crack for me. Could somebody help?