Closed Irishx closed 6 years ago
Something is wrong with the coverage computation; I changed it to temporarily output warnings rather than fail on it.
Should be fixed now.
nope!
Serious error: the message seems to suggest that it erroneously takes the first word of the next line ("OK") along with 'life'?
(Babelente) c004749:DE irishendrickx$ python3 /Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py -t it -T DE.prep.de.sentences -S DE.prep.en.sentences --apikey "" > DE.prep.de.sentences.babelente.wiki.out
Extracting source entities...
--ERROR--
Line #69: In the midst of a blossoming tree of life
Got 'life', expected: 'life OK'
---
CHUNK #1 ENTITY #81. Ran query for firstlinenr=52, lastlinenr=129 text=[ David Attenborough ] How could one species turn into another ?
[ Richard Dawkins ] How is it that we find ourselves surrounded by such complexity , such elegance ?
[ Bill Nye ] The genes of you and me
They 're all made of DNA
We 're all made of the same chemicals
DNA - we 're all made of DNA
Natural Selection That is the key
[ Dawkins ] We are surrounded by endless forms Most beautiful , most wonderful
Evolution - the greatest show on Earth
There is grandeur in this view of life ( most beautiful , most wonderful )
Evolution - the greatest show on Earth
[ Attenborough ] The history of life can be thought of ss a many branched tree
The five kingdoms of life were established early on
Bacteria Protists - amoeba like creatures
Fungi , Plants , and Animals
In the midst of a blossoming tree of life
Perched on one tiny twig
In the midst of a blossoming tree of life
OK , time for a quick analysis session . This should n't take long ,
but hopefully turns on the light bulb in your head .
It 's pretty obvious we have a lot of social problems in the world .
The questions asked , however , regarding how to solve those issues ,
are nearly always the wrong questions .
So , let 's take a look at this and see if you notice a pattern .
Here is a simple chart .
We will list the Social Problem , and then go through the questions .
First , let 's take a look at world hunger .
Please , keep in mind , that the United States is part of the world ,
and we have hunger issues as well .
In fact , we have all of the issues I am going to list .
Nothing drives me more crazy than when people shout :
What about American problems ? as if we 're not all interconnected these days .
We do n't live in a bubble anymore people , wake up !
Anyway , do we have the technology to solve World Hunger ?
My video , ' Our Technical Reality ' , showcases this .
People love your website .
It has great articles , photos , and video .
But going through all that content to findwhat you 're looking for can take a long time .
So how do you make a great website even better ?
Simple : Google Custom Search .
[ VO ] It 's a Google search box on your sitethat delivers fast and relevant search results
letting visitors quickly and concisely searchyour website .
You can search one website or even a specifictopic across multiple websites .
Use it for free with ads , or try out the premiumservice .
For all types of websites , whether they ' rebig , medium or small .
Help your visitors find the information theyneed easier and faster
with Google Custom Search .
Coursera provides free access to world-class education
offered by the top universities .
Renowned university professors are working with Coursera
to make high-quality courses , in a wide range
of disciplines , available to people all around the world .
Classes offered on Coursera are designed
to help you master the material .
And you 'll be able to learn at your own pace
work on your own schedule ,
test your knowledge , and reinforce concepts
through interactive exercises .
Unlike traditional hour-long lectures ,
our courses break down complex ideas
into short video segments that are easy to digest .
... now , we 're going to talk about text processing ...
Each course provides lectures , homework assignments ,
and deadlines .
To help you stay on track , we 'll regularly check in ,
ask questions to make sure you understand
the material and feel comfortable moving forward .
Every year in April we can see something unusual in the streets of the Netherlands :
twelve-year-old kids on their bikes in numbered high visibility jackets .
They are taking a test .
And to reach that school most will ride their bicycles .
Which is clear from this school ’ s bicycle parking lot .
Up to 15 kilometers one way is no exception .
So they were taught the rules of conduct in traffic .
The tests are long tradition in the Netherlands .
These children were taking the test in 1935 .
Children are being taught about traffic from an early age .
For very young children traffic is included in their plays .
Entity: {'start': 867, 'end': 873, 'text': 'life OK', 'isEntity': True, 'tokenFragment': {'start': 171, 'end': 172}, 'charFragment': {'start': 867, 'end': 873}, 'babelSynsetID': 'bn:02765607n', 'DBpediaURL': 'http://dbpedia.org/resource/Life_OK', 'BabelNetURL': 'http://babelnet.org/rdf/s02765607n', 'score': 1.0, 'coherenceScore': 0.014204545454545454, 'globalScore': 5.330425621692871e-06, 'source': 'BABELFY'}
Offsetmap: {52: (0, 64), 53: (65, 165), 54: (166, 202), 55: (203, 227), 56: (228, 265), 57: (266, 294), 58: (295, 328), 59: (329, 407), 60: (408, 446), 61: (447, 521), 62: (522, 560), 63: (561, 639), 64: (640, 691), 65: (692, 733), 66: (734, 762), 67: (763, 804), 68: (805, 829), 69: (830, 871), 70: (872, 940), 71: (941, 993), 72: (994, 1062), 73: (1063, 1132), 74: (1133, 1172), 75: (1173, 1238), 76: (1239, 1263), 77: (1264, 1333), 78: (1334, 1378), 79: (1379, 1448), 80: (1449, 1484), 81: (1485, 1541), 82: (1542, 1595), 83: (1596, 1675), 84: (1676, 1729), 85: (1730, 1788), 86: (1789, 1844), 87: (1845, 1871), 88: (1872, 1916), 89: (1917, 2006), 90: (2007, 2055), 91: (2056, 2087), 92: (2088, 2179), 93: (2180, 2239), 94: (2240, 2317), 95: (2318, 2376), 96: (2377, 2445), 97: (2446, 2512), 98: (2513, 2540), 99: (2541, 2595), 100: (2596, 2629), 101: (2630, 2686), 102: (2687, 2733), 103: (2734, 2793), 104: (2794, 2834), 105: (2835, 2868), 106: (2869, 2914), 107: (2915, 2942), 108: (2943, 2987), 109: (2988, 3019), 110: (3020, 3059), 111: (3060, 3096), 112: (3097, 3148), 113: (3149, 3205), 114: (3206, 3260), 115: (3261, 3276), 116: (3277, 3332), 117: (3333, 3374), 118: (3375, 3425), 119: (3426, 3510), 120: (3511, 3584), 121: (3585, 3609), 122: (3610, 3666), 123: (3667, 3724), 124: (3725, 3770), 125: (3771, 3824), 126: (3825, 3874), 127: (3875, 3920), 128: (3921, 3980), 129: (3981, 4041)}
Traceback (most recent call last):
File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 337, in <module>
main()
File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 323, in main
sourceentities = [ entity for entity in findentities(sourcelines, args.sourcelang, args) if entity['isEntity'] and 'babelSynsetID' in entity ] #with sanity check
File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 323, in <listcomp>
sourceentities = [ entity for entity in findentities(sourcelines, args.sourcelang, args) if entity['isEntity'] and 'babelSynsetID' in entity ] #with sanity check
File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 106, in findentities
raise e
File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 100, in findentities
entity['linenr'], entity['offset'] = resolveoffset(offsetmap, entity['start'], lines, entity['text'])
File "/Users/irishendrickx/Work/TraMOOC/Babelnet/babelente/babelente/babelente.py", line 58, in resolveoffset
raise ValueError("Resolved offset does not match text " + str(offset) + "; minoffset=" + str(minoffset) + ", maxoffset=" + str(maxoffset) + ", lines=" + str(len(offsetmap)) )
ValueError: Resolved offset does not match text 37; minoffset=0, maxoffset=829, lines=78
It looks like Babelfy also detects entities that span multiple lines, which I hadn't expected but is a logical artefact of our chunking approach. I assume we don't need this behaviour, since we already have sentence fragments nicely put on separate lines? If so, I can make the system ignore such results.
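A minimal sketch of how such results could be ignored, not the actual babelente code. It assumes the offset map holds `linenr: (begin, end)` pairs with an exclusive `end`, and that the entity's `end` is inclusive, which is what the log above suggests (line 69 covers 830-871, while the entity 'life OK' runs from 867 to 873). The helper names `line_for_span` and `drop_multiline_entities` are hypothetical.

```python
def line_for_span(offsetmap, start, end):
    """Return the line number whose range fully contains [start, end], or None.

    Assumption: offsetmap maps line number -> (begin, end_exclusive) and the
    entity's 'end' offset is inclusive, as inferred from the log output.
    """
    for linenr, (begin, stop) in offsetmap.items():
        if begin <= start and end < stop:
            return linenr
    return None


def drop_multiline_entities(entities, offsetmap):
    """Keep only the entities that fit within a single line of the chunk."""
    return [
        entity for entity in entities
        if line_for_span(offsetmap, entity['start'], entity['end']) is not None
    ]


# Excerpt of the offset map printed in the error output above.
offsetmap = {68: (805, 829), 69: (830, 871), 70: (872, 940)}

entities = [
    # Taken from the log: starts on line 69 and runs into line 70.
    {'start': 867, 'end': 873, 'text': 'life OK'},
    # Hypothetical single-line entity, for contrast.
    {'start': 867, 'end': 870, 'text': 'life'},
]

kept = drop_multiline_entities(entities, offsetmap)
```

With this data, only the hypothetical 'life' entity survives; 'life OK' is dropped because no single line range contains both offsets.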
Traceback (most recent call last):
File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/bin/babelente", line 11, in <module>
load_entry_point('BabelEnte==0.1.2', 'console_scripts', 'babelente')()
File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/lib/python3.6/site-packages/babelente/babelente.py", line 255, in main
evaluation = evaluate(sourceentities, targetentities, sourcelines, targetlines)
File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/lib/python3.6/site-packages/babelente/babelente.py", line 167, in evaluate
coverage = compute_coverage_line(sourcelines[linenr], linenr, sourceentities)
File "/Users/irishendrickx/Work/TraMOOC/Virtualenvs/Babelente/lib/python3.6/site-packages/babelente/babelente.py", line 94, in compute_coverage_line
charmask[i] = 1
IndexError: index 103 is out of bounds for axis 0 with size 103