messense / crfs-rs

Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs)
MIT License

Wildly differing results compared to crfsuite-rs #16

Open decahedron1 opened 1 year ago

decahedron1 commented 1 year ago

I'm using crfs for part-of-speech tagging and I noticed the output is very different compared to crfsuite-rs.

With the input "The quick brown fox jumped over the lazy dog", the two libraries produce these tags:

crfsuite-rs: DT JJ NN RB VBN IN DT JJ NN
crfs: DT JJ NN JJ NN IN DT JJ NN
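
To make the disagreement easier to see, here is a minimal sketch that lines the two tag sequences above up against the tokens and reports only the positions that differ. The helper name `diff_tags` is hypothetical and not part of either crate; the tags are copied verbatim from the two outputs above.

```rust
/// Hypothetical helper (not part of crfs or crfsuite-rs): returns
/// (index, token, left_tag, right_tag) for every position where the
/// two tag sequences disagree.
fn diff_tags<'a>(
    tokens: &[&'a str],
    left: &[&'a str],
    right: &[&'a str],
) -> Vec<(usize, &'a str, &'a str, &'a str)> {
    tokens
        .iter()
        .zip(left)
        .zip(right)
        .enumerate()
        .filter(|(_, ((_, l), r))| l != r)
        .map(|(i, ((t, l), r))| (i, *t, *l, *r))
        .collect()
}

fn main() {
    let tokens: Vec<&str> = "The quick brown fox jumped over the lazy dog"
        .split_ascii_whitespace()
        .collect();
    // Tag sequences as reported in this issue.
    let crfsuite_rs: Vec<&str> = "DT JJ NN RB VBN IN DT JJ NN"
        .split_ascii_whitespace()
        .collect();
    let crfs: Vec<&str> = "DT JJ NN JJ NN IN DT JJ NN"
        .split_ascii_whitespace()
        .collect();

    for (i, token, a, b) in diff_tags(&tokens, &crfsuite_rs, &crfs) {
        println!("token {i} ({token}): crfsuite-rs={a}, crfs={b}");
    }
}
```

Only the tokens at zero-based positions 3 ("fox") and 4 ("jumped") differ; the other seven tags are identical.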

Tagging code using crfs
Model: pos.crf
Test script:

use std::fs;

use crfs::Model;
use pos_test::PartOfSpeechTagger;

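fn main() {
    // Read the serialized model file into memory.
    let model_bytes = fs::read("./pos.crf").unwrap();
    // Parse the model from the raw bytes.
    let model = Model::new(&model_bytes).unwrap();
    let mut tagger = PartOfSpeechTagger::new(&model, Default::default()).unwrap();
    let sentence = "The quick brown fox jumped over the lazy dog.";
    // Tokenize on ASCII whitespace; punctuation stays attached, so the last token is "dog.".
    let tokens: Vec<&str> = sentence.split_ascii_whitespace().collect();
    let results = tagger.tag(&tokens).unwrap();
    // Print the predicted tags on one line, separated by spaces.
    for part in results {
        print!("{part} ");
    }
    println!();
}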
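
One detail that may be worth ruling out when comparing the two libraries: the sentence in the test script ends with a period, while the input quoted earlier in this report does not. The sketch below only illustrates what `split_ascii_whitespace` does with that period. The `tokenize` helper is hypothetical, and nothing here is claimed to be the cause of the differing tags.

```rust
/// Hypothetical helper mirroring the tokenization used in the test script.
fn tokenize(sentence: &str) -> Vec<&str> {
    sentence.split_ascii_whitespace().collect()
}

fn main() {
    let with_period = tokenize("The quick brown fox jumped over the lazy dog.");
    let without_period = tokenize("The quick brown fox jumped over the lazy dog");

    // Both inputs yield nine tokens, so the tag counts still line up...
    assert_eq!(with_period.len(), without_period.len());

    // ...but the final token differs: "dog." versus "dog".
    println!("{:?} vs {:?}", with_period.last(), without_period.last());
}
```

Feeding exactly the same token list to both taggers removes this variable from the comparison.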