greyblake / whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/
https://whatlang.org/
MIT License
970 stars 109 forks source link

New API #5

Closed greyblake closed 7 years ago

greyblake commented 7 years ago

Here is the proposal for a new API. The main advantage of it is that it's easier to use, and the desired result can be obtained with one line without limitation to pass additional options (whitelist/blaclist).

let text = "Bla bla bla";
let whitelist = [Lang::Epo, Lang::Spa];

// get Option<Result>
let result = whatlang::new(text).detect();

// get only language, Option<Lang>
let lang = whatlang::new(text).detect_lang();

// detect only script, Option<Script>
let lang = whatlang::new(text).detect_script();

// with whitelist specified (same syntax for black list)
let result = whatlang::new(text).whitelist(&whitelist).detect()
greyblake commented 7 years ago

@Dr-Emann I'd like to know your opinion on this. Does it make sense for you?)

Dr-Emann commented 7 years ago

I think I like it, mostly. I'm not sure about whatlang::new(). functions named new are usually attached to a type, not as free functions. I'd expect something like WhatLang::new(). I'm also not sure about the Result name, in rust, Result in libraries is almost always just a type alias for a specific type of std::result::Result

You could even stick with the Query type, and do whatlang::Query::new(text).detect() (where users could pick to use Query).

I'm also somewhat drawn to an API closer to regexes, where you define the query (in this case, whitelists/blacklists), then you run that query over the text (so the query isn't tied to the search text at all. Something like this:

let text = "Bla bla bla";
let whitelist = [Lang::Epo, Lang::Spa];

let query = whatlang::Query::new();

// returns whatlang::Info, which contains script: Option<Script> and lang: Option<Lang>
let whatlang::Info { script, lang } = query.detect(text);

let query = query.whitelist(&whitelist);
let script = query.detect_script(text);

// Can also provide a shorthand free function, which is shorthand for Query::new().detect_*() without a white/blacklist
let lang_info = whatlang::detect(text);
let lang = whatlang::detect_lang(text);
greyblake commented 7 years ago

@Dr-Emann

I'm also not sure about the Result name

I had the same concern, but could not pick a good proper name for this. You suggestion with Info looks pretty good, thanks!

You could even stick with the Query type, and do whatlang::Query::new(text).detect()

I thought of this, it seems like ti would be easy to use, but I don't like that Query takes responsibility of detection.

Looking at your code example, it looks good, except the fact, that Query does, what it is not supposed to do. I got analogy in mind with http client (there is a client(query) and request(text)).

I think, what about introducing Detector, that would be configurable and perform detection for given text?


use whatlang;

let detector = Detector::new();
let info = detector.detect("Some text here").unwrap();

// alternative constructors with blacklist/whitelist
const BLACKLIST : &'static [Lang] = &[Lang::Spa, Lang::Fra];
let detector = Detector::with_blacklist(BLACKLIST)

// detecting language avoiding Info structure
let lang = detector.detect_lang("Another text").unwrap();
Dr-Emann commented 7 years ago

I like the name detector. I think it reads well, both on its own, and as whatlang::Detector.

I was going to complain about how to name a constructor that sets both a blacklist and whitelist (Detector::with_blacklist_whitelist()?), but then I realized that it really doesn't make sense to set both.

greyblake commented 7 years ago

with_ can be is a good name for an alternative constructor, as I larned from here

In any case Detector may have set_whitelist and set_blacklist methods.

Cool! Thanks for consulting)

greyblake commented 7 years ago

The new API is already implemented.

greyblake commented 7 years ago

The new API is already implemented.