proycon / babelente

BabelEnte: Entity Extractor and Translator using BabelFy and Babelnet.org

Failure on coverage computation #7

Closed Irishx closed 6 years ago

Irishx commented 6 years ago

Traceback (most recent call last):
  File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/bin/babelente", line 11, in <module>
    load_entry_point('BabelEnte==0.1.2', 'console_scripts', 'babelente')()
  File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/lib/python3.6/site-packages/babelente/babelente.py", line 255, in main
    evaluation = evaluate(sourceentities, targetentities, sourcelines, targetlines)
  File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/lib/python3.6/site-packages/babelente/babelente.py", line 167, in evaluate
    coverage = compute_coverage_line(sourcelines[linenr], linenr, sourceentities)
  File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/lib/python3.6/site-packages/babelente/babelente.py", line 94, in compute_coverage_line
    charmask[i] = 1
IndexError: index 103 is out of bounds for axis 0 with size 103

proycon commented 6 years ago

Something is wrong with the coverage computation; I changed it to temporarily output warnings rather than fail on it.
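A minimal sketch of what "warn instead of fail" could look like. It assumes each entity is a dict carrying a line number under 'linenr', a line-relative start under 'offset' and its surface form under 'text'; these key names are borrowed from the tracebacks in this thread, and the real compute_coverage_line may be structured differently. An IndexError like the one above is what you would expect when an entity's span runs past the end of its line, so the sketch clips such spans to the line and emits a warning.

```python
import warnings


def compute_coverage_line(line, linenr, entities):
    """Fraction of the characters of `line` covered by entities found on it.

    Hypothetical sketch: each entity is assumed to be a dict with 'linenr',
    'offset' (start position relative to the line) and 'text'.
    """
    charmask = [0] * len(line)  # plain list here instead of a numpy array
    for entity in entities:
        if entity['linenr'] != linenr:
            continue  # entity belongs to another line
        start = entity['offset']
        end = start + len(entity['text'])  # exclusive end
        if start < 0 or end > len(line):
            # Warn and clip to the line instead of failing with an IndexError
            warnings.warn(
                f"entity {entity['text']!r} at offset {start} exceeds line {linenr} "
                f"(length {len(line)})"
            )
            start = min(max(start, 0), len(line))
            end = min(end, len(line))
        for i in range(start, end):
            charmask[i] = 1
    return sum(charmask) / len(line) if line else 0.0


line = "In the midst of a blossoming tree of life"  # line 69 from the log in this thread
print(compute_coverage_line(line, 69, [{'linenr': 69, 'offset': 37, 'text': 'life OK'}]))
# about 0.098 (4 of 41 characters covered), plus a warning
```

Clipping keeps the part of the entity that does fall on the line; simply skipping the entity would be an equally defensible choice for a temporary workaround.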

proycon commented 6 years ago

Should be fixed now.

Irishx commented 6 years ago

Nope!

Serious error: the message seems to suggest that it erroneously takes the first word of the next line ("OK" after "life")?

(Babelente) c004749:DE irishendrickx$ python3 /Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py -t it -T DE.prep.de.sentences -S DE.prep.en.sentences --apikey "" > DE.prep.de.sentences.babelente.wiki.out
Extracting source entities...
--ERROR--
Line #69: In the midst of a blossoming tree of life
Got 'life', expected: 'life OK'
---
CHUNK #1 ENTITY #81. Ran query for firstlinenr=52, lastlinenr=129  text=[ David Attenborough ] How could one species turn into another ?
[ Richard Dawkins ] How is it that we find ourselves surrounded by such complexity , such elegance ?
[ Bill Nye ] The genes of you and me
They 're all made of DNA
We 're all made of the same chemicals
DNA - we 're all made of DNA
Natural Selection That is the key
[ Dawkins ] We are surrounded by endless forms Most beautiful , most wonderful
Evolution - the greatest show on Earth
There is grandeur in this view of life ( most beautiful , most wonderful )
Evolution - the greatest show on Earth
[ Attenborough ] The history of life can be thought of ss a many branched tree
The five kingdoms of life were established early on
Bacteria Protists - amoeba like creatures
Fungi , Plants , and Animals
In the midst of a blossoming tree of life
Perched on one tiny twig
In the midst of a blossoming tree of life
OK , time for a quick analysis session . This should n't take long ,
but hopefully turns on the light bulb in your head .
It 's pretty obvious we have a lot of social problems in the world .
The questions asked , however , regarding how to solve those issues ,
are nearly always the wrong questions .
So , let 's take a look at this and see if you notice a pattern .
Here is a simple chart .
We will list the Social Problem , and then go through the questions .
First , let 's take a look at world hunger .
Please , keep in mind , that the United States is part of the world ,
and we have hunger issues as well .
In fact , we have all of the issues I am going to list .
Nothing drives me more crazy than when people shout :
What about American problems ? as if we 're not all interconnected these days .
We do n't live in a bubble anymore people , wake up !
Anyway , do we have the technology to solve World Hunger ?
My video , ' Our Technical Reality ' , showcases this .
People love your website .
It has great articles , photos , and video .
But going through all that content to findwhat you 're looking for can take a long time .
So how do you make a great website even better ?
Simple : Google Custom Search .
[ VO ] It 's a Google search box on your sitethat delivers fast and relevant search results
letting visitors quickly and concisely searchyour website .
You can search one website or even a specifictopic across multiple websites .
Use it for free with ads , or try out the premiumservice .
For all types of websites , whether they ' rebig , medium or small .
Help your visitors find the information theyneed easier and faster
with Google Custom Search .
Coursera provides free access to world-class education
offered by the top universities .
Renowned university professors are working with Coursera
to make high-quality courses , in a wide range
of disciplines , available to people all around the world .
Classes offered on Coursera are designed
to help you master the material .
And you 'll be able to learn at your own pace
work on your own schedule ,
test your knowledge , and reinforce concepts
through interactive exercises .
Unlike traditional hour-long lectures ,
our courses break down complex ideas
into short video segments that are easy to digest .
... now , we 're going to talk about text processing ...
Each course provides lectures , homework assignments ,
and deadlines .
To help you stay on track , we 'll regularly check in ,
ask questions to make sure you understand
the material and feel comfortable moving forward .
Every year in April we can see something unusual in the streets of the Netherlands :
twelve-year-old kids on their bikes in numbered high visibility jackets .
They are taking a test .
And to reach that school most will ride their bicycles .
Which is clear from this school ’ s bicycle parking lot .
Up to 15 kilometers one way is no exception .
So they were taught the rules of conduct in traffic .
The tests are long tradition in the Netherlands .
These children were taking the test in 1935 .
Children are being taught about traffic from an early age .
For very young children traffic is included in their plays .
Entity: {'start': 867, 'end': 873, 'text': 'life OK', 'isEntity': True, 'tokenFragment': {'start': 171, 'end': 172}, 'charFragment': {'start': 867, 'end': 873}, 'babelSynsetID': 'bn:02765607n', 'DBpediaURL': 'http://dbpedia.org/resource/Life_OK', 'BabelNetURL': 'http://babelnet.org/rdf/s02765607n', 'score': 1.0, 'coherenceScore': 0.014204545454545454, 'globalScore': 5.330425621692871e-06, 'source': 'BABELFY'}
Offsetmap: {52: (0, 64), 53: (65, 165), 54: (166, 202), 55: (203, 227), 56: (228, 265), 57: (266, 294), 58: (295, 328), 59: (329, 407), 60: (408, 446), 61: (447, 521), 62: (522, 560), 63: (561, 639), 64: (640, 691), 65: (692, 733), 66: (734, 762), 67: (763, 804), 68: (805, 829), 69: (830, 871), 70: (872, 940), 71: (941, 993), 72: (994, 1062), 73: (1063, 1132), 74: (1133, 1172), 75: (1173, 1238), 76: (1239, 1263), 77: (1264, 1333), 78: (1334, 1378), 79: (1379, 1448), 80: (1449, 1484), 81: (1485, 1541), 82: (1542, 1595), 83: (1596, 1675), 84: (1676, 1729), 85: (1730, 1788), 86: (1789, 1844), 87: (1845, 1871), 88: (1872, 1916), 89: (1917, 2006), 90: (2007, 2055), 91: (2056, 2087), 92: (2088, 2179), 93: (2180, 2239), 94: (2240, 2317), 95: (2318, 2376), 96: (2377, 2445), 97: (2446, 2512), 98: (2513, 2540), 99: (2541, 2595), 100: (2596, 2629), 101: (2630, 2686), 102: (2687, 2733), 103: (2734, 2793), 104: (2794, 2834), 105: (2835, 2868), 106: (2869, 2914), 107: (2915, 2942), 108: (2943, 2987), 109: (2988, 3019), 110: (3020, 3059), 111: (3060, 3096), 112: (3097, 3148), 113: (3149, 3205), 114: (3206, 3260), 115: (3261, 3276), 116: (3277, 3332), 117: (3333, 3374), 118: (3375, 3425), 119: (3426, 3510), 120: (3511, 3584), 121: (3585, 3609), 122: (3610, 3666), 123: (3667, 3724), 124: (3725, 3770), 125: (3771, 3824), 126: (3825, 3874), 127: (3875, 3920), 128: (3921, 3980), 129: (3981, 4041)}
Traceback (most recent call last):
  File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 337, in <module>
    main()
  File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 323, in main
    sourceentities = [ entity for  entity in findentities(sourcelines, args.sourcelang, args) if entity['isEntity'] and 'babelSynsetID' in entity ] #with sanity check
  File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 323, in <listcomp>
    sourceentities = [ entity for  entity in findentities(sourcelines, args.sourcelang, args) if entity['isEntity'] and 'babelSynsetID' in entity ] #with sanity check
  File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 106, in findentities
    raise e
  File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 100, in findentities
    entity['linenr'], entity['offset'] = resolveoffset(offsetmap, entity['start'], lines, entity['text'])
  File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 58, in resolveoffset
    raise ValueError("Resolved offset does not match text " + str(offset) + "; minoffset=" + str(minoffset) + ", maxoffset=" + str(maxoffset) + ", lines=" + str(len(offsetmap)) )
ValueError: Resolved offset does not match text 37; minoffset=0, maxoffset=829, lines=78
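The numbers in the log are consistent with that reading. Line 69 occupies characters 830 to 871 of the chunk (871 looks like an exclusive end, with the newline in between, since line 70 starts at 872), while the entity's charFragment runs from 867 to 873. 'life OK' is seven characters long, so BabelFy's end offset appears to be inclusive. The entity therefore starts 37 characters into line 69, which is the offset in the ValueError, and ends on line 70. The sketch below maps a chunk-level span to its first and last line with bisect; resolve_span is a hypothetical helper, and the offsetmap layout it assumes is inferred from the log rather than taken from the babelente source.

```python
from bisect import bisect_right


def resolve_span(offsetmap, start, end):
    """Return (first_linenr, last_linenr) for a chunk-level character span.

    Assumed (inferred from the log): offsetmap maps linenr -> (linestart, lineend)
    with lineend exclusive, and `end` is an inclusive offset.
    """
    linenrs = sorted(offsetmap)
    starts = [offsetmap[nr][0] for nr in linenrs]

    def line_of(offset):
        # Rightmost line whose start offset is <= offset
        idx = bisect_right(starts, offset) - 1
        if idx < 0:
            raise ValueError(f"offset {offset} lies before the first line of the chunk")
        return linenrs[idx]

    return line_of(start), line_of(end)


# Excerpt of the offsetmap printed in the log
offsetmap = {68: (805, 829), 69: (830, 871), 70: (872, 940)}
first, last = resolve_span(offsetmap, 867, 873)  # the 'life OK' entity
print(first, last, 867 - offsetmap[first][0])    # 69 70 37
```

A result with first != last means the entity crosses a line boundary, which is the case that appears to make resolveoffset fail: it can only compare the entity text against a single line.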

proycon commented 6 years ago

It looks like BabelFy also detects entities that span multiple lines, which I hadn't expected, but it is a logical artefact of our chunking approach. I assume we don't need this behaviour, since we already have sentence fragments neatly placed one per line? If so, I can make the system ignore such results.
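If such results are to be ignored, one option is to drop every entity whose character span is not contained in a single line before any offsets are resolved. This is a sketch only: drop_multiline_entities is a hypothetical helper, where it would hook into findentities is an assumption, and it relies on the offsetmap layout seen in the log ((start, exclusive end) per line) together with BabelFy's apparently inclusive charFragment end.

```python
def drop_multiline_entities(entities, offsetmap):
    """Keep only entities whose character span lies within a single line.

    Hypothetical helper. Assumes offsetmap maps linenr -> (linestart, lineend)
    with lineend exclusive, and that charFragment['end'] is inclusive.
    """
    kept = []
    for entity in entities:
        start = entity['charFragment']['start']
        end = entity['charFragment']['end']
        # The whole span must fit inside one line's [linestart, lineend) range
        if any(linestart <= start and end < lineend
               for linestart, lineend in offsetmap.values()):
            kept.append(entity)
    return kept


offsetmap = {68: (805, 829), 69: (830, 871), 70: (872, 940)}
entities = [
    {'text': 'life OK', 'charFragment': {'start': 867, 'end': 873}},  # crosses lines 69 and 70
    {'text': 'life', 'charFragment': {'start': 867, 'end': 870}},     # hypothetical, within line 69
]
print([e['text'] for e in drop_multiline_entities(entities, offsetmap)])  # ['life']
```

A linear scan over the lines is cheap for chunks of roughly eighty lines like the one in the log; for much larger chunks, a bisect over the sorted line starts would avoid checking every line for every entity.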