websterParser / WebsterParser

Convert Webster's Unabridged 1913 dictionary into a more usable format
GNU General Public License v3.0

Some lines in definitions get lost in the output files #29

Closed dpcpnry closed 2 years ago

dpcpnry commented 3 years ago

Edit: After many hours of trial and error on my phone with Termux (I don't have a computer with me these days), I managed to convert this new format to Stardict. See the next comment.

During the process, I noted the following issue.

It seems that some lines in definitions get lost in the output files (for example, in the template/dict.xml file).

For example, with the word "happy", in srcFiles/CIDE.H, its original entry is:

<p><ent>Happy</ent><br/
<hw>Hap"py</hw> <pr>(h<acr/p"p<ycr/)</pr>, <pos>a.</pos> <amorph>[<pos>Compar.</pos> <adjf>Happier</adjf> <pr>(-p<icr/*<etil/r)</pr>; <pos>superl.</pos> <adjf>Happiest</adjf>.]</amorph> <ety>[From <er>Hap</er> chance.]</ety> <sn>1.</sn> <def>Favored by hap, luck, or fortune; lucky; fortunate; successful; prosperous; satisfying desire; <as>as, a <ex>happy</ex> expedient; a <ex>happy</ex> effort; a <ex>happy</ex> venture; a <ex>happy</ex> omen.</as></def><br/
[<source>1913 Webster</source>]</p>

<p><q>Chymists have been more <qex>happy</qex> in finding experiments than the causes of them.</q> <rj><qau>Boyle.</qau></rj><br/
[<source>1913 Webster</source>]</p>

<p><sn>2.</sn> <def>Experiencing the effect of favorable fortune; having the feeling arising from the consciousness of well-being or of enjoyment; enjoying good of any kind, as peace, tranquillity, comfort; contented; joyous; <as>as, <ex>happy</ex> hours, <ex>happy</ex> thoughts</as>.</def><br/
[<source>1913 Webster</source>]</p>

<p><q><qex>Happy</qex> is that people, whose God is the Lord.</q> <rj><qau>Ps. cxliv. 15.</qau></rj><br/
[<source>1913 Webster</source>]</p>

<p><q>The learned is <qex>happy</qex> Nature to explore,<br/
The fool is <qex>happy</qex> that he knows no more.</q> <rj><qau>Pope.</qau></rj><br/
[<source>1913 Webster</source>]</p>

<p><sn>3.</sn> <def>Dexterous; ready; apt; felicitous.</def><br/
[<source>1913 Webster</source>]</p>

<p><q>One gentleman is <qex>happy</qex> at a reply, another excels in a in a rejoinder.</q> <rj><qau>Swift.</qau></rj><br/
[<source>1913 Webster</source>]</p>

<p><cs><col><b>Happy family</b></col>, <cd>a collection of animals of different and hostile propensities living peaceably together in one cage. Used ironically of conventional alliances of persons who are in fact mutually repugnant.</cd> -- <col><b>Happy-go-lucky</b></col>, <cd>trusting to hap or luck; improvident; easy-going.</cd> <ldquo/<xex>Happy-go-lucky</xex> carelessness.<rdquo/  <rj><au>W. Black.</au></rj></cs><br/
[<source>1913 Webster</source>]</p>

The output is missing the following lines, even though they carry the same [1913 Webster] source tag as the lines that are kept:

<p><q>One gentleman is <qex>happy</qex> at a reply, another excels in a in a rejoinder.</q> <rj><qau>Swift.</qau></rj><br/
[<source>1913 Webster</source>]</p>

<p><cs><col><b>Happy family</b></col>, <cd>a collection of animals of different and hostile propensities living peaceably together in one cage. Used ironically of conventional alliances of persons who are in fact mutually repugnant.</cd> -- <col><b>Happy-go-lucky</b></col>, <cd>trusting to hap or luck; improvident; easy-going.</cd> <ldquo/<xex>Happy-go-lucky</xex> carelessness.<rdquo/  <rj><au>W. Black.</au></rj></cs><br/
[<source>1913 Webster</source>]</p>

See the attached pictures to compare with the old-format Stardict dictionary data from Jsomers' link.

Websters1913-new-format2

Websters1913-new-format

dpcpnry commented 3 years ago

Code and steps to create the Stardict dictionary data above:

https://gist.github.com/dpcpnry/df8b0722b0274aa999d01328c893fe38

nickwynja commented 2 years ago

I've spent some time discovering the source of this problem. I don't have a fix yet but I believe these definitions are getting dropped in the process of filtering out non-Webster definitions. Even though these definitions have the correct source, they are getting dropped. You can confirm this by running with ONLYWEBSTER = false.
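For anyone who wants to check the source tags independently of the parser, here is a minimal Python sketch. It is not WebsterParser's code and makes no claim about how the project's filter works. The helper name `webster_paragraphs`, the two regular expressions, and the third sample paragraph (with a `PJC` source tag) are assumptions made up for illustration. It only shows that a per-paragraph check on `[<source>1913 Webster</source>]` keeps the Swift quotation from the "happy" entry.

```python
import re

# Hypothetical helpers for illustration; not taken from WebsterParser.
P_BLOCK = re.compile(r"<p>(.*?)</p>", re.DOTALL)          # one <p>...</p> block
SOURCE = re.compile(r"\[<source>(.*?)</source>\]")        # the bracketed source tag


def webster_paragraphs(entry, wanted="1913 Webster"):
    """Return the contents of the <p> blocks whose source tag equals `wanted`."""
    kept = []
    for block in P_BLOCK.findall(entry):
        if wanted in SOURCE.findall(block):
            kept.append(block)
    return kept


# The first two paragraphs are copied from the "happy" entry quoted above.
# The third paragraph is invented to stand in for a non-Webster source.
sample = (
    "<p><sn>3.</sn> <def>Dexterous; ready; apt; felicitous.</def><br/\n"
    "[<source>1913 Webster</source>]</p>\n\n"
    "<p><q>One gentleman is <qex>happy</qex> at a reply, another excels in a "
    "in a rejoinder.</q> <rj><qau>Swift.</qau></rj><br/\n"
    "[<source>1913 Webster</source>]</p>\n\n"
    "<p><def>An invented paragraph from another source.</def><br/\n"
    "[<source>PJC</source>]</p>\n"
)

kept = webster_paragraphs(sample)
print(len(kept))  # number of paragraphs tagged 1913 Webster
```

Running the same check over the full entry in srcFiles/CIDE.H and comparing the count with what ends up in the output file should show which paragraphs the filter drops.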

jeffbyrnes commented 2 years ago

The filtering is very fragile, so it’s probably ever-so-slightly different for these lines vs the ones that are kept.

nickwynja commented 2 years ago

@jeffbyrnes @dpcpnry I've opened a PR that should address this if you'd like to test it out.

jeffbyrnes commented 2 years ago

I saw, thanks! I’m moving today, so will probably get a chance to review this weekend.

jeffbyrnes commented 2 years ago

Fixed via 3375fe69