machinalis / iepy

Information Extraction in Python
BSD 3-Clause "New" or "Revised" License

Builtin relations #85

Open iScienceLuvr opened 9 years ago

iScienceLuvr commented 9 years ago

Are there builtin relations in iepy? If not, some should be added... I recommend using the relations from ConceptNet.

rafacarrascosa commented 9 years ago

It could be a good thing! Perhaps as an example/kickstart.... Unfortunately we don't have the manpower right now to do it, but if you are willing to push something in this direction we have annotated corpora that could be used and we could guide you through it...

iScienceLuvr commented 9 years ago

@rafacarrascosa where are these corpora? Also, can iepy be used on any sentence, even on question sentences?

paulhoule commented 9 years ago

You've got to remember that NLP systems compete with the old IR systems that use primitive tricks like tf*idf. Those work well enough for a range of problems and are easy to apply, so it takes a good amount of focused effort to get NLP systems to do better than bag-of-words.
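For reference, the tf*idf weighting over a bag of words mentioned above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and the plain `tf * log(N / df)` form with no smoothing; the function name `tf_idf` and the sample documents are made up for the example and have nothing to do with iepy.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (plain tf * idf, no smoothing)."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Made-up sample documents, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
scores = tf_idf(docs)
```

Word order is ignored entirely: a term scores high when it is frequent in one document and rare across the collection, which is why "cat" outweighs "the" in the first document even though "the" occurs twice.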

If you want to make a system which is useful and "knocks their socks off" I think the thing to do is pick a small number of relations in some domain and create a focused corpus for that. If you spread the effort too thin you will get something that sucks like the Alchemy API.

Question sentences are no problem.

It would be cool to see a fully stacked up open source NLP system complete with trained models. One of the very few ones out there is

http://ctakes.apache.org/

iScienceLuvr commented 9 years ago

@paulhoule check out http://ProjetPP.github.io. They have developed an OpenNLP system that is probably better than WolframAlpha in some respects. They did this with relation extraction, and their corpus wasn't very specific; it is open domain. I am looking into doing something like that.

@rafacarrascosa I haven't found any of the corpora that you mentioned. What corpora do you use for testing? And is there an automatic way of finding relations, or do I have to define all of them?

iScienceLuvr commented 9 years ago

@rafacarrascosa have you seen my message? I certainly want to do this project when I have time.

rafacarrascosa commented 9 years ago

@iScienceLuvr Yes! I saw them. Unfortunately I'm overloaded with work and I want to give you a proper answer :S I'll get back to you on Monday when things are quiet-ish again.

iScienceLuvr commented 9 years ago

@rafacarrascosa sure, can you send me a short answer now, and a longer one later?

iScienceLuvr commented 9 years ago

May I also ask, how does IEPY deal with pronouns?

rafacarrascosa commented 9 years ago

@iScienceLuvr WRT Corpora: The corpora are not public, but last time I checked there was interest in doing something useful with them. We have tagged corpora for:

WRT Question sentences: Afaik there should be no problem, perhaps only a slight reduction in the NLP preprocessing quality.

WRT Automatic relation: There is a lot on that subject on the web, but nothing implemented into iepy. With automatic relations you end up having to disambiguate which relations mean the same thing.
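To make the disambiguation problem concrete: automatically extracted relations tend to arrive as free-text phrases, and several phrases can express one relation. The sketch below is only an illustration, assuming a hand-written synonym table; `CANONICAL`, `group_by_relation` and the sample triples are hypothetical and are not part of iepy.

```python
from collections import defaultdict

# Hypothetical, hand-written synonym table (not an iepy feature).
CANONICAL = {
    "was born in": "birthplace",
    "is a native of": "birthplace",
    "hails from": "birthplace",
    "works for": "employer",
    "is employed by": "employer",
}


def group_by_relation(triples):
    """Group (subject, phrase, object) triples under a canonical relation name."""
    grouped = defaultdict(list)
    for subj, phrase, obj in triples:
        normalized = phrase.lower().strip()
        # Phrases missing from the table keep their normalized text as the key.
        key = CANONICAL.get(normalized, normalized)
        grouped[key].append((subj, obj))
    return dict(grouped)


# Made-up triples, for illustration only.
triples = [
    ("Alice", "was born in", "Paris"),
    ("Bob", "is a native of", "Paris"),
    ("Carol", "works for", "Acme"),
    ("Dave", "is friends with", "Erin"),
]
grouped = group_by_relation(triples)
```

The table is the hard part: in an open-domain setting it has to be built or learned, which is the extra work that comes with automatically discovered relations.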

WRT Pronouns: IEPY uses the pronoun resolution that comes bundled into Stanford's CoreNLP. So 'he', 'she' and so on are usually correctly identified as the referred entity.

iScienceLuvr commented 9 years ago

@rafacarrascosa so I can't get the corpora? I wanted some more features...

rafacarrascosa commented 9 years ago

@iScienceLuvr no, no, you got me wrong. The corpus is not public now, but perhaps it could be public if something good can come out of it (it depends on my bosses)... so, if you expand on what you would do, I could make an argument for the bosses

iScienceLuvr commented 9 years ago

@rafacarrascosa this is just a hobby, that is all

iScienceLuvr commented 8 years ago

@rafacarrascosa Hello again... it has been a long time... I just wanted to ask whether there are plans to make the corpora public? If any help is needed with developing built-in relations, I could help, but I am pretty busy. I noticed I misunderstood your previous questions. I plan to add some basic relations from ConceptNet (if you have some specific ones in mind, please tell me)... I plan to train the corpora of course...

iScienceLuvr commented 8 years ago

@rafacarrascosa have you been able to see this message?

rafacarrascosa commented 8 years ago

Hi @iScienceLuvr, yes, I saw it but I was too busy until now. I've just taken your inquiry to my bosses to try and make a public release of the corpora we have... I'll let you know when I have an answer.

iScienceLuvr commented 8 years ago

@rafacarrascosa thanks!

rafacarrascosa commented 8 years ago

@iScienceLuvr I have a partial answer: We at Machinalis want to make them public, but since they are works derived from other corpora we have to check the licensing issues in more detail. I.e., we are willing to make them fully public but we have to check the original licenses to avoid copyright issues.

In the meantime, I have permission to share with you some corpora we have, as long as you make fair use of them until we check the licenses.

If you send me an email at rcarrascosa@machinalis.com I'll give you the links privately.

Cheers!