spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License
11.41k stars, 654 forks

Is it possible to add more languages? #81

Closed orzilca closed 6 years ago

orzilca commented 8 years ago

I'd like to be able to run NLP on languages other than English (for example: Spanish, German, French).

is it possible without changing the core source code?

spencermountain commented 8 years ago

hi Or! yep! i've just started-out a french version, but you can see i desperately need some help, as it really needs a detailed knowledge of the language, and I suck at it.

there's been some discussion about how much of nlp-compromise can be re-used cross-language, and i think the answer is probably almost nothing. But a similar, rule-based attempt, with similar organisation & goals, is absolutely something we should do. Gimme a few days to stub-in a basic architecture for the french one, and we can look at what needs to be solved. I agree that French, German, and Spanish are probably the best to start. cheers

orzilca commented 8 years ago

good and bad news :)

I was sure integrating new languages would be possible under the same code base..

orzilca commented 8 years ago

hey @spencermountain

is it possible for the nlp to detect whether the language is english or not? could be helpful..

spencermountain commented 8 years ago

yep, would just be some kind of % coverage with the lexicon i guess? wanna do it?
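The "% coverage with the lexicon" idea could be sketched roughly like this. This is only an illustration: the tiny `englishLexicon`, the function names, and the 0.5 threshold are all assumptions made up for the example, not part of compromise.

```javascript
// Hypothetical mini-lexicon; a real check would use the library's full word list.
const englishLexicon = new Set([
  'the', 'is', 'a', 'of', 'and', 'to', 'in', 'it', 'you', 'that', 'cat', 'on', 'mat',
]);

// Lowercase the text and split it into word-like tokens (letters and apostrophes).
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}']+/gu) || [];
}

// Fraction of tokens that appear in the lexicon, between 0 and 1.
function lexiconCoverage(text, lexicon) {
  const tokens = tokenize(text);
  if (tokens.length === 0) return 0;
  const known = tokens.filter((t) => lexicon.has(t)).length;
  return known / tokens.length;
}

// The 0.5 threshold is an arbitrary assumption and would need tuning on real text.
function looksEnglish(text, lexicon = englishLexicon, threshold = 0.5) {
  return lexiconCoverage(text, lexicon) >= threshold;
}

console.log(looksEnglish('The cat is on the mat'));    // true
console.log(looksEnglish('Le chat est sur le tapis')); // false
```

Short inputs and words shared between languages would make a check like this unreliable, so it is probably best treated as a rough hint.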

danielgindi commented 8 years ago

I would try to implement Hebrew for this wonderful library, if it were ready for multiple languages.

There are interfaces that are pretty much cross-language, and some that are not. Apostrophes, for example, are something distinctive to English alone. In Hebrew, there's a similar but different thing, where apostrophes and double-quotes are used to shorten words or multiple words. i.e:

So @spencermountain if you'll prepare the library for accepting more languages, I could help with Hebrew. We should also have a contributors file for people responsible for different languages.

danielgindi commented 8 years ago

Btw, in many languages there's a concept of male/female and singular/plural for most nouns and adjectives. A car could be female in one language and male in another, and it could be described by the color "blue" with either a female or male ending. The word "you" in English would have separate words for singular/plural/male/female representations in many languages.
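To make the point concrete, here is a minimal sketch of what gender and number could look like as lexicon data. The object shapes and the `agree` helper are hypothetical, invented for this example only; the French and Spanish forms are just sample entries.

```javascript
// Illustrative data only; not taken from any compromise lexicon.
const nouns = {
  fr: { voiture: { gender: 'f' } }, // "car" is feminine in French
  es: { coche: { gender: 'm' } },   // "car" is masculine in Spanish
};

// Adjective forms keyed by gender, then by number.
const adjectives = {
  fr: { blue: { m: { sg: 'bleu', pl: 'bleus' }, f: { sg: 'bleue', pl: 'bleues' } } },
  es: { blue: { m: { sg: 'azul', pl: 'azules' }, f: { sg: 'azul', pl: 'azules' } } },
};

// Pick the adjective form that agrees with the noun's gender and the given number.
// Returns null when the noun or adjective is not in the sample data.
function agree(lang, adjective, noun, number = 'sg') {
  const entry = nouns[lang] && nouns[lang][noun];
  const forms = adjectives[lang] && adjectives[lang][adjective];
  if (!entry || !forms) return null;
  return forms[entry.gender][number];
}

console.log(agree('fr', 'blue', 'voiture')); // bleue
console.log(agree('es', 'blue', 'coche'));   // azul
```

An English-only lexicon never needs the gender key, which is one reason the data model may not carry over unchanged to other languages.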

spencermountain commented 8 years ago

ah, thanks daniel. Yeah, let me 'round the bases' with the english version first ;) I look forward to working with you

danielgindi commented 8 years ago

Okay let me know when you're ready :-)

lynxaegon commented 8 years ago

@orzilca you could use this https://github.com/wooorm/franc

ghost commented 7 years ago

@spencermountain Is the French version of nlp_compromise still up to date? Can I run some tests with this version?

I'm French, so I have a good knowledge of this language ^^

spencermountain commented 7 years ago

hey @D711, no, but a lot of the work on the v7 branch can be copied+pasted, once it's ready

leoseccia commented 7 years ago

I think making verb conjugation compatible with Latin languages (Italian, French, Spanish, etc) would help with some of the issues seen in the English conjugation of the present tense for some irregular verbs: be, do and possibly others...
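One way to read this suggestion: a conjugation table indexed by person, as Romance languages need, also covers the English irregulars. The sketch below is an assumption about how that might look, not compromise's actual conjugator; the `irregularPresent` table, the person labels, and `conjugatePresent` are hypothetical names.

```javascript
// Irregular present-tense forms, keyed by person. Sample entries only.
const irregularPresent = {
  be: { first: 'am', second: 'are', third: 'is', plural: 'are' },
  do: { first: 'do', second: 'do', third: 'does', plural: 'do' },
  have: { first: 'have', second: 'have', third: 'has', plural: 'have' },
};

// person is one of 'first', 'second', 'third' (all singular) or 'plural'.
function conjugatePresent(infinitive, person) {
  // Irregular verbs are looked up directly in the table.
  const irregular = irregularPresent[infinitive];
  if (irregular) return irregular[person];

  // Regular English verbs only change in the third person singular.
  if (person !== 'third') return infinitive;
  if (/(s|sh|ch|x|z|o)$/.test(infinitive)) return infinitive + 'es';
  if (/[^aeiou]y$/.test(infinitive)) return infinitive.slice(0, -1) + 'ies';
  return infinitive + 's';
}

console.log(conjugatePresent('be', 'first'));    // am
console.log(conjugatePresent('do', 'third'));    // does
console.log(conjugatePresent('watch', 'third')); // watches
```

For Italian, French, or Spanish, every verb would need distinct forms per person, so the table lookup would become the main path and the fallback would be replaced by per-language ending rules.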

phoet commented 7 years ago

digging through some issues, it looks like multi-language support is not yet included. has anyone seen a similar library that deals with german texts? i've seen issues related to that here, but it's from a deleted github user 😢

spencermountain commented 7 years ago

hey Peter, no, unfortunately I haven't, though would love for one to exist. spacy is supposed to work well in german, I've heard. p.s. burn-notice is cool!

phoet commented 7 years ago

@spencermountain i came across an announcement for a german spacy version, but i would like to build something simple that does not need to talk to a remote server. compromise looked pretty neat in that regard!

btw, i've built burn-notice while i was in ottawa 👋

spencermountain commented 7 years ago

yeah, a big part of me wants to just start a german fork, and see if it picks-up some traction. I reckon it would. I haven't done that yet just because the english version keeps changing so fast. That's seeming like an increasingly lame excuse. I don't speak german, and got scared-off by gendered nouns - but given how frequent your situation is, kompromiss really oughta get going.

spencermountain commented 7 years ago

oh, and those contractions. Oh geez. ;) happy to work on something with you

phoet commented 7 years ago

@spencermountain if you provide the skeleton to support multi-language stuff, i can fill in the gaps. i'm sure that we can get an 80% version rolling shortly.

spencermountain commented 7 years ago

hey, i got something started :boom: - https://github.com/nlp-compromise/de-compromise

if you clone that repo, do npm install, then node scratch.js, you'll see it run. [image]

only the basic tokenization and tagging stuff is ported over. take a look around, there's no nlp-stuff working yet.

i've given you write-access, go crazy :de:

spencermountain commented 7 years ago

um, I speak zero-german. maybe i should mention that. :balloon:

spencermountain commented 6 years ago

got a workable demo of de-compromise, would love some help. gonna move this discussion over there

inglesuniversal commented 1 year ago

Hello, awesome script... Any update on the Spanish feature?

phoet commented 1 year ago

@spencermountain oops, sorry for not circling back to you here. looks like i completely missed the notifications back then 🙏