Closed: orzilca closed this issue 6 years ago
hi Or! yep! i've just started-out a french version, but you can see i desperately need some help, as it really needs a detailed knowledge of the language, and I suck at it.
there's been some discussion about how much of nlp-compromise can be re-used cross-language, and i think the answer is probably almost nothing. But a similar, rule-based attempt, with similar organisation & goals, is absolutely something we should do. Gimme a few days to stub-in a basic architecture for the french one, and we can look at what needs to be solved. I agree that French, German, and Spanish are probably the best to start. cheers
good and bad news :)
I was sure integrating new languages would be possible under the same code base..
hey @spencermountain
is it possible for the nlp to detect if the language is english or not? could be helpful..
yep, would just be some kind of % coverage with the lexicon i guess? wanna do it?
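A rough sketch of what the "% coverage with the lexicon" idea could look like. This is not code from the library: `englishLexicon`, `lexiconCoverage`, `looksEnglish`, and the 0.3 threshold are all made up for illustration, and a real check would use the full lexicon rather than a handful of words.

```javascript
// Hypothetical mini-lexicon; a real check would use the library's full word list.
const englishLexicon = new Set([
  'the', 'a', 'is', 'are', 'of', 'and', 'to', 'in', 'it',
  'you', 'this', 'that', 'i', 'was', 'for', 'on', 'with',
]);

// Split text into lowercase word tokens (Unicode-aware, so accented letters stay inside words).
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}']+/gu) || [];
}

// Fraction of tokens that appear in the lexicon, between 0 and 1.
function lexiconCoverage(text, lexicon) {
  const words = tokenize(text);
  if (words.length === 0) return 0;
  const known = words.filter((w) => lexicon.has(w)).length;
  return known / words.length;
}

// The threshold is an arbitrary assumption and would need tuning on real text.
function looksEnglish(text, threshold = 0.3) {
  return lexiconCoverage(text, englishLexicon) >= threshold;
}

console.log(looksEnglish('this is the house of the king'));
console.log(looksEnglish('ceci est la maison du roi'));
```

Short inputs will be noisy with this approach, since one or two unknown words swing the ratio a lot.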
I would try to implement Hebrew for this wonderful library. If it was ready for multi-language.
There are interfaces that are pretty much cross-language, and some that are not. Apostrophes, for example, are something distinctive to English alone. In Hebrew, there's a similar but different thing, where apostrophes and double-quotes are used to shorten words or multiple words. i.e:
So @spencermountain if you'll prepare the library for accepting more languages, I could help with Hebrew. We should also have a contributors file for people responsible for different languages.
Btw, In most languages in the world there's a concept of male/female and singular/plural for most nouns and adjectives. A car could be female in one language and male in another, and it could be described by the color "blue" with either a female or male stemming. The word "you" in English would have separate words for singular/plural/male/female representations in many languages.
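To make the gender/number point concrete, here is a minimal sketch of how a per-language lexicon might carry that information. The `nouns` and `bleu` structures and the `agree` helper are hypothetical and not part of the library; the only facts assumed are that "voiture" is feminine in French, "coche" is masculine in Spanish, and French "bleu" has the four forms shown.

```javascript
// Hypothetical lexicon entries: the same concept ("car") carries a different gender per language.
const nouns = {
  fr: { voiture: { gender: 'f' } }, // feminine in French
  es: { coche: { gender: 'm' } },   // masculine in Spanish
};

// The French adjective "bleu" (blue), indexed by gender and then number.
const bleu = {
  m: { sg: 'bleu', pl: 'bleus' },
  f: { sg: 'bleue', pl: 'bleues' },
};

// Pick the adjective form that agrees with the noun's gender and number.
function agree(adjective, gender, number) {
  return adjective[gender][number];
}

const gender = nouns.fr.voiture.gender;
console.log('voiture ' + agree(bleu, gender, 'sg'));  // singular: voiture bleue
console.log('voitures ' + agree(bleu, gender, 'pl')); // plural: voitures bleues
```

English needs none of this, which is one reason the English tagging rules are unlikely to carry over directly.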
ah, thanks daniel. Yeah, let me 'round the bases' with the english version first ;) I look forward to working with you
Okay let me know when you're ready :-)
@orzilca you could use this https://github.com/wooorm/franc
@spencermountain Is the french version of nlp_compromise still up to date? Can i run some tests with this version?
I'm french so i have a good knowledge of this language ^^
hey @D711 , no but a lot of the work on the v7 branch can be copied+pasted, once it's ready
I think making verb conjugation compatible with Latin languages (Italian, French, Spanish, etc) would help with some of the issues seen in the English conjugation of the present tense for some irregular verbs: be, do and possibly others...
digging through some issues, it looks like multi-language support is not yet included. has anyone seen a similar library that deals with german texts? i've seen issues related to that here, but it's from a deleted github user 😢
hey Peter no, unfortunately I haven't, though would love for one to exist. spacy is supposed to work well in german, I've heard. p.s. burn-notice is cool!
@spencermountain i came across an announcement for a german spacy version, but i would like to build something simple that does not need to talk to a remote server. compromise looked pretty neat in that regard!
btw, i've built burn-notice while i was in ottawa 👋
yeah, a big part of me wants to just start a german fork, and see if it picks-up some traction. I reckon it would.
I haven't done that yet just because the english version keeps changing so fast. That's seeming like an increasingly lame excuse.
I don't speak german, and got scared-off by gendered nouns - but given how frequent your situation is, kompromiss really oughta get going.
oh, and those contractions. Oh geez. ;) happy to work on something with you
@spencermountain if you provide the skeleton to support multi-language stuff, i can fill in the gaps. i'm sure that we can get a 80% version rolling shortly.
hey, i got something started :boom: - https://github.com/nlp-compromise/de-compromise
if you clone that repo, do npm install, then node scratch.js, you'll see
only the basic tokenization and tagging stuff is ported over. take a look around, there's no nlp-stuff working yet.
i've given you write-access, go crazy :de:
um, I speak zero-german. maybe i should mention that. :balloon:
got a workable demo of de-compromise, would love some help. gonna move this discussion over there
Hello, awesome script... Any update with Spanish feature?
@spencermountain oops, sorry for not circling back to you here. looks like i completely missed any notifications on it back then 🙏
I'd like to be able to run nlp on languages other than english (for ex: spanish, german, french)
is it possible without changing the core source code?