Closed SindhuBairavi closed 7 years ago
Hi there.
The csv contains EvidenceCandidate ids. Each EvidenceCandidate contains: a text segment, a left entity occurrence, a right entity occurrence and a relation.
To see everything together you can use the TerminalEvidenceFormatter (import it with: from iepy.extraction.terminal import TerminalEvidenceFormatter).
You can see an example of this formatter in use here: https://github.com/machinalis/iepy/blob/develop/iepy/instantiation/rules_verifier.py
If I don't want to use the UI, but want to just generate the list of sentences/segments which the rules have marked as evidence candidates, then how to fetch the segments?
Here I go again.
As I tried to explain before, an iepy runner will return EvidenceCandidates, each of which is a piece of text with some important pieces highlighted (the EntityOccurrences). In some cases, the same piece of text may be part of several different EvidenceCandidates. For example, consider the following text:
"Peter was born in 1916, he married Anna in 1930, and died in 1950"
If you have a relation "Person" - "Date", you would have the following 6 EvidenceCandidates:
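The count of six comes from pairing each of the two persons with each of the three dates. A minimal sketch of that enumeration, using plain strings instead of real iepy objects (the variable names are illustrative only, not part of iepy):

```python
from itertools import product

# Entity occurrences found in the example sentence, as plain strings.
# In iepy these would be entity occurrence objects, not strings.
persons = ["Peter", "Anna"]
dates = ["1916", "1930", "1950"]

# One candidate per (person, date) pair, all sharing the same text.
candidates = list(product(persons, dates))

for left, right in candidates:
    print(f"{left} - {right}")
```

Every pair points at the same sentence, so the sentence alone does not tell you which candidate you are looking at.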
That's why I still insist that you may need not only the "sentence or text" but all the information.
Moreover, the TerminalEvidenceFormatter I mentioned before is a tool that prints a piece of text to standard output, highlighting the corresponding entity occurrences in different colors. If that is not exactly what you want, you could adapt this piece of code to your needs: the function "colored_text" here https://github.com/machinalis/iepy/blob/develop/iepy/extraction/terminal.py#L141:L166
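If you only need plain text rather than terminal colors, something along these lines could be a starting point. This is a rough sketch and not iepy code: the Occurrence class and mark_evidence function are hypothetical stand-ins, and it assumes you already have the segment as a list of tokens, with each occurrence given as token-level start/end offsets (end exclusive) relative to that list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Occurrence:
    # Hypothetical stand-in for an entity occurrence.
    # Offsets are token indexes into the segment, end is exclusive.
    start: int
    end: int


def mark_evidence(tokens: List[str], left: Occurrence, right: Occurrence) -> str:
    """Return the segment text with both occurrences wrapped in markers."""
    marked = list(tokens)  # copy, so the caller's token list is untouched
    for occ, (open_, close) in ((left, ("[[", "]]")), (right, ("<<", ">>"))):
        marked[occ.start] = open_ + marked[occ.start]
        marked[occ.end - 1] = marked[occ.end - 1] + close
    # Joining with spaces is a simplification: original spacing is lost.
    return " ".join(marked)


tokens = "Peter was born in 1916 , he married Anna in 1930 , and died in 1950".split()
print(mark_evidence(tokens, Occurrence(0, 1), Occurrence(4, 5)))
```

Because the offsets index into the stored token list, the text can be rebuilt from those tokens directly, without running a tokeniser again.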
Hope it helps
On Thu, Dec 8, 2016 at 7:50 AM, Sindhu Bairavi notifications@github.com wrote:
If I don't want to use the UI, but want to just generate the list of sentences/segments which the rules have marked as evidence candidates, then how to fetch the segments?
-- Javier Mansilla - Technical Leader www.machinalis.com
When I use rules to fetch evidence candidates, the output only contains a segment id and true/false. How do I fetch the segment text from the content? I understand that I can use the offsets, but these offsets are not character-level; they seem to be token-level. Which word tokeniser/segmenter is used, so that I can replicate it? Otherwise the offsets may not match.
Any help will be appreciated! Thanks!