scrapinghub / python-crfsuite

A python binding for crfsuite
MIT License

Does it support sentence labeling? #79

Open moushumimahato opened 6 years ago

moushumimahato commented 6 years ago

Hi,

I am trying crfsuite for sentence labeling (e.g. whether a sentence is a QUESTION, GREETING, COMMAND, etc.). Is this possible with this algorithm? If yes, what features are required?

Thanks

kaushikacharya commented 6 years ago

@moushumimahato If you have a sequence of sentences, have a look at the paper "Automatic classification of sentences to support Evidence Based Medicine" by Su Nam Kim et al. (2011). For medical abstracts, they classify sentences into around 6 classes. Along with the features for each sentence, they also take the sequence of sentences into account.

Read the section: Conditional random fields (on page number 4)

CRFs are undirected graphical models in which each vertex represents a random variable whose distribution is to be inferred, and each edge represents a dependency between two random variables. In our case the sentences in an abstract are represented by vertices, and the edges represent the relationship between sentences. CRFs have the advantage that they both model sequential effects and support the use of a large number of features; they have also been shown to perform comparatively well in other sentence-classification tasks [3, 4]

Also read the papers cited there as references 3 and 4, from which they took ideas for features.
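
To make the idea concrete, here is a rough sketch of how a sequence of sentences could be turned into per-sentence feature dicts and passed to python-crfsuite. The feature names (`first_token`, `prev_first_token`, `rel_position`, and so on), the helper functions and the training parameters are my own illustrative choices, not features taken from the paper. The `train` helper assumes `python-crfsuite` is installed and only uses `Trainer.append`, `Trainer.set_params` and `Trainer.train`.

```python
import re


def _tokens(sentence):
    """Lowercased word and punctuation tokens of one sentence."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def sentence_features(sentences, i):
    """Feature dict for sentence i, including cues from its neighbours.

    All feature names here are hypothetical examples.
    """
    tokens = _tokens(sentences[i])
    feats = {
        "bias": 1.0,
        "first_token": tokens[0] if tokens else "",
        "last_token": tokens[-1] if tokens else "",
        "n_tokens": float(len(tokens)),
        # Relative position of the sentence within the document (0.0 to 1.0).
        "rel_position": i / max(len(sentences) - 1, 1),
    }
    # Bag-of-words indicators for the sentence itself.
    for tok in set(tokens):
        feats["has:" + tok] = True

    # Context features: this is where the sequence of sentences comes in.
    if i == 0:
        feats["BOS"] = True
    else:
        prev = _tokens(sentences[i - 1])
        feats["prev_first_token"] = prev[0] if prev else ""
    if i == len(sentences) - 1:
        feats["EOS"] = True
    else:
        nxt = _tokens(sentences[i + 1])
        feats["next_last_token"] = nxt[-1] if nxt else ""
    return feats


def document_to_features(sentences):
    """One feature dict per sentence; one document is one CRF sequence."""
    return [sentence_features(sentences, i) for i in range(len(sentences))]


def train(documents, label_sequences, model_path):
    """Train a CRF where each item in a sequence is a whole sentence.

    Assumes python-crfsuite is installed; parameter values are arbitrary.
    """
    import pycrfsuite

    trainer = pycrfsuite.Trainer(verbose=False)
    for sentences, labels in zip(documents, label_sequences):
        trainer.append(document_to_features(sentences), labels)
    trainer.set_params({"c1": 0.1, "c2": 0.01, "max_iterations": 100})
    trainer.train(model_path)
```

Here `documents` would be a list of documents, each a list of sentence strings, and `label_sequences` the matching lists of labels such as `["GREETING", "QUESTION", "COMMAND"]`. If your sentences are independent of each other rather than part of a sequence, the CRF's transition features add little, and an ordinary classifier over the same per-sentence features may be the simpler choice.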