nert-nlp / cgel

CGEL trees.
Creative Commons Attribution 4.0 International

Typos in all-examples/pagified #109

Closed danflick closed 2 months ago

danflick commented 4 months ago

While parsing the examples in all-examples/pagified using the English Resource Grammar, I found a number of example sentences in that file where the text does not match what is in the CGEL book:

(1) typo on p. #206| iii These ideas are to be found throught his later work. "throught" => "throughout"
(2) typo on p. #320| ii a. She praised him for him sincerity. b. She praised his sincerity. "for him sincerity." => "for his sincerity."
(3) typo on p. #446| [13] a man my age, shoes this size, the results last year, houses this side of the lake [Missing example number "i" before "a man my age, ..."]
(4) missing line present in book [note corrected book typo "hours" => "hour"]:

446| [13] ii fifty miles an hour, a salary of [$20,000 a year], ten dollars a head

(5) typo on p. #561| iii The company has faced a serious of major, major setbacks. "serious" => "series"
(6) typo on p. #693| iv Her philosphy mark was 70%. "philosphy" => "philosophy"
(7) typo on p. #693| ii a. This case is over 20 kilos. b. This case weighs over $20. "weighs over $20." => "weighs over 20 kilos."
(8) typo on p. #695| b. The Company was founded on 1 Jancuary, 1978. "Jancuary" => "January"
(9) typo on p. #750| ii a. I'm going to the party, even if Kim b. I'd be going to the party, even if [second occurrence of "even if" => "even if Kim"]
(10) typo on p. #814| b. ?As far as I can recall, I have purchased food at the drive-through window of afast-food restaurant on no street in this city. "afast-food" => "a fast-food"
(11) typo on p. #1044| [26] i They refuse to support the UN's expenses of maintaining the UN Emergency Force in the Middles East as a buffer between Egypt and Israel, the UN troops in the Congo, [which expenses are not covered by the regular budget]. "Middles East" => "Middle East"
(12) typo on p. #1083| iv Whoi have they shortlisted __i in addition to Kim. ) "in addition to Kim." => "in addition to Kim?"
(13) typo on p. #1134| iv He went so/as far as to compare the proposal to a tax on shunshine. "shunshine" => "sunshine"
(14) unwanted added spaces separating single quotes around "culture" in

1256| [22] i To discuss melodrama, then, is to raise questions about ' culture ' itself and the categories and oppositions by which we conceptualise it.

 ' culture ' => 'culture' [note that the mark preceding "culture" is really a left single quote, unloved by Github]

(15) missing word "to" before "try" on p. #1271| [10] Jilli intends [i try [i to mediate between them]]. "intends [i try" => "intends [i to try"
(16) missing hyphen on p. #1356| [17] i A recent newspaper report ... about onehalf of one per cent ..." "onehalf" => "one-half"
(17) missing hyphen on p. #1356| ii Professionally a lawyer, that is to say associated with dignity, reserve, discipline, with much that is essentially middleclass, he is compelled by an impossible love to exhibit himself dressed up, disguised – that is, paradoxically, revealed – as a child, and, worse, as a whore masquerading as a child. "middleclass" => "middle-class"
(18) typo on p. #1375| ii The ratings no doubt will show that some small number of Americans failed to escape and ended up watching the twohour NBC 'World Premiere Movie'. [A premiere it may be, but new it's not.] "twohour" => "two-hour"
(19) two typos on p. #1375| [8] i 'In the early days, our productions were cheap and cheerful,' says producer John Weaver of Londonbased Keefco. 'We'd go into a sevenlight studio, shoot the band in one afternoon and edit as we went along. The client would walk out with a tape that day.' [Today's tapes may still be cheerful, but cheap they are not.] "Londonbased" => "London-based" "sevenlight" => "seven-light"
(20) deleted space on p. #1376| iii In the VIP section of the commissary at 20th CenturyFox, the studio ... "20th CenturyFox" => "20th Century Fox"
(21) typo on p. #1385| [1] i George, can you do me a favour? [Up in my room, on the nightstand, is a pinkishreddish envelope that has to go out immediately.] "pinkishreddish" => "pinkish-reddish"
(22) typo on p. #1385| iv Arrested were Nathan Johnson, 23, of New York, and his brother, Victor Johnson, 32, a 15year Army veteran. "15year" => "15-year"
(23) typo on p. #1400| [31] i In addition to interestrate risk, there is the added risk that when interest rates fall, mortgages will be prepaid, thereby reducing the Portfolio's future income stream. "interestrate" => "interest-rate"
(24) typo on p. #1442| ii He saw Kim get/be mauled by my brothers dog. "brothers dog" => "brother's dog"
(25) typo on p. #1459| iii Ed always get his kids to help in the kitchen: why don't you __? "get" => "gets"
(26) typo on p. #1474| [35] The brother who left his estate to charity will be remembered longer than the one who left it sent to his children. "left it sent to" => "left it to"
(27) missing line after #1478| iv a. [Without the support of her mother,] b. [Without the support of Ann's mother,] Ann would not have survived. she would not have survived.
(28) typo on p. #1483| iii Let's get on with it. ... Now your're in for it! "your're" => "you're"
(29) extra left quote on p. #1546| [7] i Stories of 'Renamo 'slave camps' compete with those of Frelimo 'work camps'. Refugees fear both armies. But [such] is the nature of war in Africa. "of 'Renamo 'slave" => "of Renamo 'slave" [remove first left quote, and note that the one before "slave" is really a left quote]

nschneid commented 4 months ago

Thanks Dan, I will look into these. However, you should NOT have to parse the pagified file—a much cleaner file is https://github.com/nert-nlp/cgel/blob/main/all-examples/cge01-17Ex.yaml

nschneid commented 4 months ago

(documentation at https://github.com/nert-nlp/cgel/blob/main/all-examples/README.md)

danflick commented 4 months ago

Hi Nathan,

The .yaml file is indeed cleaner and much easier to work with, but alas it is missing a lot of examples from the book that are more faithfully preserved in the 'pagified' file, which is what led me to work directly with that file. (I was also sad to see that the .yaml file systematically replaced the left quote mark used in the book with a straight quote mark throughout.) Below are some examples of mismatches between the .yaml file and the book; comments on the first 500 pages are meant to be exhaustive, while the remainder are illustrative rather than complete. I hope these are helpful.

Dan

p.23 ex. 5 missing phrasal examples
p.53 ex. 2 missing in yaml
p.107 ex. 46i "tomorrow,you" => "tomorrow, you"
p.112 ex. 58ia [missing two of the five sentences in the group] "Have I enough tea?" "I have got enough tea."
p.112 ex. 58iia [missing two of the five sentences in the group] "I haven't to read it all." "Have I to read it all?"
p.130 ex. 16 [missing two of the three sentences in the group] "1435: Congress of Arras: Burgundians withdraw support from England, in favour of France." "1436: Albert I becomes Emperor – the first Habsburg Emperor."
p.238 ex. 5ii [missing this example] "What did you buy?"
p.246 ex. 2 [missing these examples] "Pat overlooked the error." "The error was overlooked (by Pat)."
p.257 ex. 14i [missing four of the six examples] "He went mad." "He went to hospital." "She stayed calm." "She stayed inside."
p.257 ex. 14ii [missing four of the six examples] "They got me angry." "They got me to the shore." "They left me unmoved." "They left me in the waiting-room."
p.258 ex. 15i [missing four of the six examples] "Kim seemed angry." "Kim seemed at the back of the queue." "They sounded strange." "They sounded in a cave."
p.258 ex. 15ii [missing four of the six examples] "She made him happy." "She made him onto the platform." "This rendered it useless." "This rendered it in the wastebin."
p.265 ex. 41 [missing two of the four examples] "They made him anxious/treasurer." "They created her a life peer."
p.277 ex. 16 [missing the useful sentences in this example]
p.286 ex. 44 [missing the useful sentences in this example]
p.296 ex. 1 [missing the useful sentences in this example]
p.297 ex. 2 [missing the useful sentences in this example]
p.297 ex. 3 [missing the useful sentences in this example]
p.300 ex. 15e [missing this sentence]
p.338 ex. 15 [missing the useful sentences in this example]
p.349 ex. 35ii [missing this sentence]
p.359 ex. 5iib' [missing "He hadn't eaten any of the meat."]
p.360 ex. 7iib' [missing "He hadn't eaten any of the pies."]
p.360 ex. 8iib' [missing "He hadn't eaten either of the pies."]
p.408 ex. 18 [missing two of the four examples] "He was going at 50 miles an hour." "It costs $20 a yard/person."
p.467 ex. 41iii "... as good as Kim's ." => "...as good as Kim's." [extra space]
p.499 ex. 50 [missing these three examples]

p.521 ex. 6i "fiancee." => "fiancée."
p.575 ex. 4ii [items b-c are not like b-e in the book]
p.775 ex. 1 "... with a Ph. D. ..." => "... with a Ph.D. ..." [extra space]
p.776 ex. 5i "... without a Ph. D. ..." => "... without a Ph.D. ..."
p.776 ex. 5ii "... her Ph. D. ..." => "... her Ph.D. ..."
p.834 ex. 29iia Why did you help him? someone like George? => Why did you help someone like George? [also in file "pagified", errors:

834| ii a. Why did you help him? b. Why would you lift a finger to help

  "Why did you help him?" => "Why did you help"

@834| someone like George? => [should be two copies to complete preceding line] someone like George? someone like George?]
p.1066 ex. 21i "dreadfullyii." => "dreadfully."
p.1101 ex. 5iib "Cezanne." => "Cézanne." [accent]
p.1102 ex. 7ii "Cezanne" => "Cézanne"
p.1144 ex. 21iii "pre- marital" => "pre-marital"
p.1457 ex. 17 "Saute it" => "Sauté it" [accent]
p.1476 ex. 39ii [Missing the last three paragraphs of this example]
It is remarkable that Bland reached the North Magnetic Pole only 12 months after major heart surgery. This was in February 1998 when he and four British men pulled sledges across the frozen Arctic Sea for 650 kilometres. Two years before, he and others had sailed to the South Magnetic Pole in an 18 metre sloop. During this trip he risked his life by diving overboard into the icy waters to cut a line free from the yacht's propeller. On his return, Bland began to plan a visit to the North Magnetic Pole.

nschneid commented 4 months ago

Thanks for itemizing these issues. I’ve fixed the first batch: these were due to errors in the .docx files, including a widespread problem in Ch. 16 where an unusual character (U+2010) was substituted for the standard hyphen. Additional instances beyond the ones you listed:

p. 1373 co-operative
p. 1375 off-the-wall
p. 1376 passers-by
p. 1380 collective-bargaining
p. 1388 low-income, high-rises
p. 1389 fifty-foot
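For readers who want to check their own copy of the text for the same problem, here is a minimal sketch of how U+2010 (HYPHEN) could be located and replaced with the ASCII hyphen-minus (U+002D). It is not the script used in this repository; the function names and the sample strings are hypothetical, and it assumes the U+2010 character is still present in the text being checked rather than having been dropped during extraction.

```python
HYPHEN = "\u2010"        # Unicode HYPHEN, the character reported in the .docx files
HYPHEN_MINUS = "\u002D"  # ASCII hyphen-minus, the "standard hyphen"


def find_unicode_hyphens(lines):
    """Return (line_number, column, line) for every U+2010 occurrence (1-based)."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ch == HYPHEN:
                hits.append((lineno, col, line))
    return hits


def normalize_hyphens(text):
    """Replace every U+2010 with the ASCII hyphen-minus."""
    return text.replace(HYPHEN, HYPHEN_MINUS)


# Hypothetical sample lines: the first contains U+2010, the second does not.
sample = ["a co\u2010operative venture", "off-the-wall humour"]
for lineno, col, line in find_unicode_hyphens(sample):
    print(f"line {lineno}, column {col}: {line!r}")
print([normalize_hyphens(s) for s in sample])
```

Reporting the hits before replacing them makes it easier to review each case by hand, which matters if some occurrences of U+2010 turn out to be intentional.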

nschneid commented 4 months ago

Accent issues: fixed.

Spacing issues:

p.107 ex. 46i "tomorrow,you" => "tomorrow, you"
p.467 ex. 41iii "... as good as Kim's ." => "...as good as Kim's." [extra space]
p.775 ex. 1 "... with a Ph. D. ..." => "... with a Ph.D. ..." [extra space]
p.776 ex. 5i "... without a Ph. D. ..." => "... without a Ph.D. ..."
p.776 ex. 5ii "... her Ph. D. ..." => "... her Ph.D. ..."
p.834 ex. 29iia "Why did you help him? someone like George?" => "Why did you help someone like George?" [also in file "pagified", errors:

834| ii a. Why did you help him? b. Why would you lift a finger to help

  "Why did you help him?" => "Why did you help"

@834 someone like George? => [should be two copies to complete preceding line] someone like George? someone like George?]
p.1066 ex. 21i "dreadfullyii." => "dreadfully."
p.1144 ex. 21iii "pre- marital" => "pre-marital"

Fixed all of these.
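Spacing problems of the kinds listed above could in principle be flagged automatically. The following is only a rough sketch with hypothetical pattern names, not part of the repository's scripts. The patterns are heuristics that will produce false positives (for example, a suspended hyphen as in "pre- and post-war"), so any hits would still need checking against the book.

```python
import re

# Heuristic patterns, one per kind of spacing issue reported in this thread.
CHECKS = {
    # e.g. "tomorrow,you"
    "no space after comma": re.compile(r",(?=[A-Za-z])"),
    # e.g. "Kim's ." (an ellipsis such as " ..." is deliberately not matched)
    "space before sentence-final punctuation": re.compile(r"\s+[.?!](?=\s|$)"),
    # e.g. "pre- marital"
    "space after hyphen inside a word": re.compile(r"(?<=[A-Za-z])- (?=[a-z])"),
    # e.g. "Ph. D."
    "space inside abbreviation": re.compile(r"\b[A-Z][a-z]?\. (?=[A-Z]\.)"),
}


def spacing_issues(text):
    """Return the names of all checks that match somewhere in `text`."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]


for example in ["tomorrow,you", "as good as Kim's .", "with a Ph. D. in", "pre- marital"]:
    print(example, "->", spacing_issues(example))
```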

TODO: Missing content issues:

p.112 ex. 58ia [missing two of the five sentences in the group] "Have I enough tea?" "I have got enough tea."
p.112 ex. 58iia [missing two of the five sentences in the group] "I haven't to read it all." "Have I to read it all?"
p.130 ex. 16 [missing two of the three sentences in the group] "1435: Congress of Arras: Burgundians withdraw support from England, in favour of France." "1436: Albert I becomes Emperor – the first Habsburg Emperor."
p.238 ex. 5ii [missing this example] "What did you buy?"
p.246 ex. 2 [missing these examples] "Pat overlooked the error." "The error was overlooked (by Pat)."
p.257 ex. 14i [missing four of the six examples] "He went mad." "He went to hospital." "She stayed calm." "She stayed inside."
p.257 ex. 14ii [missing four of the six examples] "They got me angry." "They got me to the shore." "They left me unmoved." "They left me in the waiting-room."
p.258 ex. 15i [missing four of the six examples] "Kim seemed angry." "Kim seemed at the back of the queue." "They sounded strange." "They sounded in a cave."
p.258 ex. 15ii [missing four of the six examples] "She made him happy." "She made him onto the platform." "This rendered it useless." "This rendered it in the wastebin."
p.265 ex. 41 [missing two of the four examples] "They made him anxious/treasurer." "They created her a life peer."
p.277 ex. 16 [missing the useful sentences in this example]
p.286 ex. 44 [missing the useful sentences in this example]
p.296 ex. 1 [missing the useful sentences in this example]
p.297 ex. 2 [missing the useful sentences in this example]
p.297 ex. 3 [missing the useful sentences in this example]
p.300 ex. 15e [missing this sentence]
p.338 ex. 15 [missing the useful sentences in this example]
p.349 ex. 35ii [missing this sentence]
p.359 ex. 5iib' [missing "He hadn't eaten any of the meat."]
p.360 ex. 7iib' [missing "He hadn't eaten any of the pies."]
p.360 ex. 8iib' [missing "He hadn't eaten either of the pies."]
p.408 ex. 18 [missing two of the four examples] "He was going at 50 miles an hour." "It costs $20 a yard/person."
p.499 ex. 50 [missing these three examples]
p.575 ex. 4ii [items b-c are not like b-e in the book]

nschneid commented 4 months ago

Fixed the easy cases. Some of the missing sentences are due to limitations in processing special layouts. Will need to work more on the scripts.

danflick commented 4 months ago

Thanks for the quick improvements. I found another pair of similar errors in 'pagified' and 'cge01-17Ex.yaml', likely the last additional ones I'll bother you with, since I have now sent all of the sentences through the English Resource Grammar and treebanked the 95% that the grammar can analyze. I also extracted the numbered examples from your cge18Ex.docx and cge19-20Ex.docx, since I'm also interested in derivational lexical rules and in punctuation, so the short list below includes a few errors from those files.

Additional errors in 'cge01-17Ex.yaml' and in 'pagified':
p.464 ex. 27i lowercase "junko" should be uppercase "Junko"
p.916 ex. 46iii b. [What exactly] do you mean?. [delete the spurious final period, not in the book]

Errors in cge19-20Ex.docx:
Ch.20 p.29 ex.83ii "raise again" => "rise again"
Ch.20 p.33 ex.5i "did not want have to" => "did not want to have to"
Ch.20 p.41 ex.4iii "Indianopolis" => "Indianapolis"

Maybe of interest, I also found the following five apparent errors in examples in the book itself, which I've reported to Geoffrey Pullum, who may consider them for his Errata web page (http://www.lel.ed.ac.uk/~gpullum/cgelerrata.html):

(a) missing word on p.1017 ex.18.1.b The bank manager is be congratulated on this initiative. "is be congratulated" => "is to be congratulated"
(b) typo on p.1359 ex.28.i The tourists, most of them foreigners, had been hoarded onto a cattle truck. "hoarded" => "herded"
(c) missing word on p.1556 ex.21.i They moved from [the old Treasury Buildings] to Government House ballroom across the road; American military authorities were in [the former place]. "to Government House ballroom" => "to the Government House ballroom"
(d) typo on p.1563 ex.15.ii She lost her job soon after her father died. She was still in her fifities at this time, much too young for retirement. "in her fifities" => "in her fifties"
(e) missing word on p.1736 ex.4.ii.b He told the press his reason: he did not want have to renegotiate his contract, but he did not give any explanation to the team owners. "... did not want have to ..." => "... did not want to have to ..."

nschneid commented 4 months ago

Regarding (c), I found the source text, part of the ACE corpus:

Back in Western Australia I called at the Education Department to argue my way free from the teaching profession. The Department had moved from the old Treasury Buildings to Government House Ballroom across the Terrace; American military authorities were in the former place.

So maybe "Ballroom" should be capitalized instead of adding "the"?

danflick commented 4 months ago

Good find. I'll be interested to see what Pullum says, but restoring the capitalization as it was in the original text would be simplest, yes.

nschneid commented 3 months ago

Progress update:

danflick commented 3 months ago

I reported this example to Geoffrey Pullum, who elected on the CGEL Errata web page (http://www.lel.ed.ac.uk/~gpullum/cgelerrata.html) to correct the example by adding the article "the" rather than restoring the capitalization of "ballroom".

nschneid commented 2 months ago

I believe all these examples are now fixed. Thanks @danflick for the thorough report!

danflick commented 2 months ago

Great work, thanks, Nathan.

Dan

On Sun, Jul 21, 2024 at 10:41 PM Nathan Schneider @.***> wrote:

I believe all these examples are now fixed. Thanks @danflick https://github.com/danflick for the thorough report!
