nigelhorne / ged2site

Create a family tree website from a Gedcom file
https://genealogy.nigelhorne.com
GNU General Public License v2.0

Data::Text can kill ged2site? ("attempt to add consecutive punctuation") #112

Closed: jhannah closed this issue 3 weeks ago.

jhannah commented 1 month ago

Huh. Weird.

✗ ./ged2site -cFdh 'Jay Weston Hannah' -l ~/Desktop/jay_new.ged
...
Data::Text: attempt to add consecutive punctuation           ]  36% [1529/4182]
    Current = 'The twin brother of Jane E. and the 2nd of 3 children of <a href="?page=people&entry=I2599">James Hamilton</a> and <a href="?page=people&entry=I2600">Myrtle Combs</a>, <b>Harry</b>is  the third cousin once-removed on the father's side of <a href="?page=people&home=1">Jay Hannah</a> and was born on Jul 25, 1933 along with his twin sister Jane E.' added at 3510 of ./ged2site
    Append = '.' at ./ged2site line 3510.
- program exits -

Looks like this is a "feature" of Data::Text?

$ perl -MData::Text -e 'my $t1 = Data::Text->new("Jane E.")->append(".")'
Data::Text: attempt to add consecutive punctuation
    Current = 'Jane E.' added at 70 of /Users/jhannah/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/Data/Text.pm
    Append = '.' at -e line 1.

Which kills ged2site given some input data? uhh... seems like a bad feature?
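To make the reported behaviour concrete, here is a minimal Perl sketch of a guard that acts like the one in the error above. It is an assumed approximation for illustration, not the Data::Text source. `append_checked` is a hypothetical name, and the punctuation set `[.,;:]` is a guess at what counts as "punctuation".

```perl
use strict;
use warnings;

# Illustration only: an ASSUMED approximation of the guard, not Data::Text's
# code. Assumed rule: die when the current text already ends in . , ; or :
# and the string being appended starts with one of them.
sub append_checked {
    my ($current, $append) = @_;
    if ($current =~ /[.,;:]\s*\z/ && $append =~ /\A\s*[.,;:]/) {
        die "attempt to add consecutive punctuation\n",
            "    Current = '$current'\n",
            "    Append = '$append'\n";
    }
    return $current . $append;
}

# A plain name is fine...
my $ok = append_checked('Jane', '.');
print "$ok\n";

# ...but a name ending in an initial already carries a full stop.
my $err = eval { append_checked('Jane E.', '.'); 1 } ? '' : $@;
if ($err) { print "died: $err" } else { print "no error\n" }
```

Under this assumed rule, any phrase that ends in an abbreviation (an initial, "Co.", "Twp.") trips the guard as soon as a sentence-ending full stop is appended.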

nigelhorne commented 1 month ago

That feature is deliberate and correct, it catches bugs in my code which creates consecutive punctuation. I'll look and see if I can track what's happening.

jhannah commented 1 month ago

👍 Jane works fine, but if the phrase happens to end with a period (e.g. Jane E.) it explodes.

If I hack it to this:

$bio_dt->append(conjunction(map { my $tmp = $_->as_string(); $tmp =~ s/\.+$//; return $tmp } @phrases))->append('.');

The program keeps going past that error. The output is:

$ ack 'The twin brother of Jane E. and the 2nd of 3 children' static-site
static-site/Harry-Hamilton-1933.html
<p>The twin brother of Jane E. and the 2nd of 3 children of <a href="James-Hamilton-1901.html">James Hamilton</a> and <a href="Myrtle-Combs-1898.html">Myrtle Combs</a>, <b>Harry</b> is the third cousin once-removed on the father's side of <a href="Jay-Hannah-1975.html">Jay Hannah</a>

Which I assume is correct. :)

jhannah commented 1 month ago

Apparently still broken at ae15cb157a6ba02341ab01033b2e6e8c2a9d3df8. It no longer gets as far as Jane (36% [1529/4182]); it now dies on Jennifer (6% [ 291/4176]). 😄

$ ./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged
...
Data::Text: attempt to add consecutive punctuation           ]   6% [ 291/4176]
    Current = 'The husband of <a href="?page=people&entry=I2836">Linda Lee (Perry) Franzman</a> (the third cousin on the father's side of <a href="?page=people&home=1">Jay Hannah</a>) and had 2 children, Jeffrey A. and Jennifer L.' added at 7519 of ./ged2site
    Append = '.' at ./ged2site line 7519.

Here are my ugly hacks to get it to not explode: https://github.com/nigelhorne/ged2site/compare/master...jhannah:ged2site:112-hack?expand=1

nigelhorne commented 1 month ago

Though your hack will certainly work, it's fixing the symptom rather than the problem. If possible (and sometimes it isn't) I'd rather find out why two full stops are being added than add both and then take one away. So I'm going to see if I can reproduce what you're seeing then I'll be in a place where I can fix it.
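Both failures so far have the same shape: the text ends in an initial (Jane E., Jennifer L.), so it already carries a full stop before ged2site appends its own. One possible way to avoid "add both, then take one away" is to decide at the point where the sentence is closed whether a full stop is needed at all. This is a minimal sketch, assuming that a trailing `.` always means the sentence is already terminated. `end_sentence` is a hypothetical helper, not part of ged2site or Data::Text, and the thread does not show how ged2site was eventually fixed.

```perl
use strict;
use warnings;

# Hypothetical helper (not ged2site or Data::Text): add a full stop only if
# the text does not already end in one. A sentence that ends in an
# abbreviation such as "Jennifer L." takes no second full stop.
sub end_sentence {
    my ($text) = @_;
    return $text =~ /\.\z/ ? $text : "$text.";
}

my $with_initial = end_sentence('and had 2 children, Jeffrey A. and Jennifer L.');
my $plain        = end_sentence('He died on May 15, 1889');
print "$with_initial\n";    # unchanged: already ends in "L."
print "$plain\n";           # gains exactly one full stop
```

This follows the usual style rule that a sentence ending in an abbreviation takes a single full stop.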

jhannah commented 1 month ago

FYI, at master 7ad9ccbc53d3b40f9d309fe580dbfb0b9d5fd569 the error has now moved to line 7533 (used to be 7519).

Data::Text: attempt to add consecutive punctuation           ]   6% [ 291/4176]
    Current = 'The husband of <a href="?page=people&entry=I2836">Linda Lee (Perry) Franzman</a> (the third cousin on the father's side of <a href="?page=people&home=1">Jay Hannah</a>) and had 2 children, Jeffrey A. and Jennifer L.' added at 7533 of ./ged2site
    Append = '.' at ./ged2site line 7533.
./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged  100.86s user 19.81s system 94% cpu 2:08.17 total

nigelhorne commented 1 month ago

I've been able to reproduce this with a test gedcom that I have.

jhannah commented 1 month ago

FYI at master 85763dae7c20a63a8de4375988de27c3a16eb9ae

./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged
...
[====   7589 of ./ged2site                                   ]   7% [ 316/4176]
    1085 of ./ged2site
BUG: string not set at ./ged2site line 13904, <GEN0> line 2582.
-- program exits --

jhannah commented 1 month ago

FYI at master f5bcc0658cd730a95c2213f43a802c63de454e90

Data::Text: attempt to add consecutive punctuation           ]  29% [1230/4176]
    Current = 'The child of <a href="?page=people&entry=I1140">Rolland Franzman</a> and <a href="?page=people&entry=I2782">Martha Smallberger</a>, <b>Richard</b>was  the second cousin once-removed on the father's side of <a href="?page=people&home=1">Jay Hannah</a> Hewas married twice (to <a href="?page=people&entry=I2830">Isabelle Ware</a> (possibly not married to her) and <a href="?page=people&entry=I2835">Delores Linder</a> (possibly not married to her)). He had 5 children: Linda Lee (Perry), Cheryl Lee (Perry), Richard Harlo , Jr. and Thomas Alan with Delores Irene; and Martha Ann with Isabelle C.' added at 7572 of ./ged2site
    Append = '.' at ./ged2site line 7572.
./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged  473.44s user 81.62s system 94% cpu 9:49.05 total

jhannah commented 1 month ago

Recapping my branches in my fork:

  • 112-hacks-round2 seems to run to completion on my big GEDCOM. It's a 1-liner hack.
  • 115-bday-of-living-is-private does my custom birthday behavior.
  • 112-and-115 is a merge of the above two.

nigelhorne commented 1 month ago

Could you generate a context diff, please? I'll take a look.

jhannah commented 1 month ago

112-hacks-round2 is just this one line change:

diff --git a/ged2site b/ged2site
index 91130b66..afa8e550 100755
--- a/ged2site
+++ b/ged2site
@@ -7569,7 +7569,7 @@ sub print_person
                push @phrases, $phrase;
        }
        if(scalar(@phrases)) {
-               $bio_dt->append(conjunction(map { $_->as_string() } @phrases))->append('.');
+               $bio_dt->append(conjunction(map { my $tmp = $_->as_string(); $tmp =~ s/\.+$//; $tmp } @phrases))->append('.');
                $phrase = undef;
                @phrases = ();
        }
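The same workaround as a standalone sketch. `simple_conjunction` is a hypothetical stand-in for ged2site's `conjunction()`, whose real formatting may differ. Using `s///r` (Perl 5.14 and later) returns an edited copy, so the temporary variable is not needed. Note that the diff ends the `map` block with `$tmp`, not `return $tmp`: inside `map`, `return` exits the enclosing subroutine (here `print_person`) instead of yielding a value. The earlier one-liner used `return $tmp`, which may explain why the bio quoted with it stops short after "Jay Hannah".

```perl
use strict;
use warnings;

# Hypothetical stand-in for ged2site's conjunction(); the real function may
# format lists differently. Joins as "a", "a and b" or "a, b and c".
sub simple_conjunction {
    my @items = @_;
    return '' unless @items;
    return $items[0] if @items == 1;
    my $last = pop @items;
    return join(', ', @items) . " and $last";
}

my @phrases = (
    'was born in Parke Co., Indiana on Nov 9, 1846',
    'died in Parke Co.',    # ends in an abbreviation, so it already has a '.'
);

# Strip trailing full stops from every phrase, then add exactly one.
# s///r leaves @phrases untouched and returns the modified copy.
my $sentence = simple_conjunction(map { s/\.+\z//r } @phrases) . '.';
print "$sentence\n";
```

One trade-off of stripping every phrase: a phrase that ends in an abbreviation but is not the last in the list also loses its full stop, because only one is added back at the very end.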
jhannah commented 1 month ago

FYI hanging off of master e01ee88d2af2b1a36041ab79092436530919a536 I'm getting this:

./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged
-- runs for 23m, then dies: --
Data::Text: attempt to add consecutive punctuation==         ]  86% [3596/4176]
    Current = 'The 11th of 12 children of <a href="?page=people&entry=I125">William Stark</a> and <a href="?page=people&entry=I126">Rebecca Ragsdale</a>, <b>Matilda</b>was  the four times great-aunt of <a href="?page=people&home=1">Jay Hannah</a> and was born in Parke Co., Indiana on Nov 9, 1846.<p>She died on Feb 18, 1849 in Parke Co.' added at 7573 of ./ged2site
    Append = '.' at ./ged2site line 7573.

With my hacks in place, here's my new site (work in progress) ❤️ http://jays.net/genealogy/static-site/I1265.html

jhannah commented 1 month ago

FYI on master df7ab4a5f9e297d2e03d173d6ea833e529f68172 I'm now hitting this:

Data::Text: attempt to add consecutive punctuation           ]  10% [ 444/4176]
    Current = 'The 4th of 9 children of <a href="?page=people&entry=I1011">Robert Bunker</a> and <a href="?page=people&entry=I1012">Mira Dillingham</a>, <b>Jonathan</b>was  the three times great-uncle of <a href="?page=people&home=1">Jay Hannah</a>, was born in Randolph Co., Indiana on Dec 3, 1838 and was married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)). He had 8 children: infant son, Mattie and Robert Henry with Mary Jane; and Katie Ann, George William, James Warren, Laura B and Julia with Julia Ann.<p>He died on May 15, 1889 in Jackson Twp.' added at 7584 of ./ged2site
    Append = '.' at ./ged2site line 7584.
./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged  130.93s user 30.80s system 90% cpu 2:59.25 total

nigelhorne commented 3 weeks ago

Is this still happening with your gedcom file?

jhannah commented 3 weeks ago

On master 3f471cefa469f51651157ecd06cedd1936de71a6 it looks like the same error; it has just moved to line 7571.

✗ time ./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged
Data::Text: attempt to add consecutive punctuation           ]  10% [ 444/4180]
    Current = 'The 4th of 9 children of <a href="?page=people&entry=I1011">Robert Bunker</a> and <a href="?page=people&entry=I1012">Mira Dillingham</a>, <b>Jonathan</b>was  the three times great-uncle of <a href="?page=people&home=1">Jay Hannah</a>, was born in Randolph Co., Indiana on Dec 3, 1838 and was married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)). He had 8 children: Katie Ann, George William, James Warren, Laura B and Julia with Julia Ann; and infant son, Mattie and Robert Henry with Mary Jane.<p>He died on May 15, 1889 in Jackson Twp.' added at 7571 of ./ged2site
    Append = '.' at ./ged2site line 7571.
./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged  142.21s user 29.65s system 90% cpu 3:10.76 total

(I've been using my branch 112-hacks-round2 to get around this one.)

nigelhorne commented 3 weeks ago

Do you have a Gedcom snippet that I can use to reproduce that?

jhannah commented 3 weeks ago

Looks like I can recreate it with that dude and his 2 wives:

0 HEAD
1 SOUR GEDitCOM II
0 @I1703@ INDI
1 NAME Mary Jane /Seaton/
1 SEX F
1 BIRT
2 DATE 28 DEC 1852
2 PLAC nr. Cincinnati, Ohio
1 DEAT
2 DATE 31 AUG 1924
2 PLAC Arkansas City, Cowley Co., Kansas
1 NOTE @X269@
1 FAMS @F682@
0 @I1001@ INDI
1 NAME Julia Ann /Collins/
1 SEX F
1 BIRT
2 DATE 18 SEP 1847
2 PLAC Ohio Co. Indiana
1 DEAT
2 DATE 4 OCT 1875
2 PLAC Jackson Twp., Henry Co., IA
1 FAMS @F378@
0 @I1600@ INDI
1 NAME Jonathan Smith /Bunker/
1 SEX M
1 BIRT
2 DATE 3 DEC 1838
2 PLAC Randolph Co., Indiana
1 DEAT
2 DATE 15 MAY 1889
2 PLAC Jackson Twp., Henry Co., IA
1 FAMS @F378@
1 FAMS @F682@
0 @F682@ FAM
1 HUSB @I1600@
1 WIFE @I1703@
1 MARR
2 DATE 29 FEB 1880
2 PLAC Jackson Twp., Henry Co., IA
0 @F378@ FAM
1 HUSB @I1600@
1 WIFE @I1001@
1 MARR
2 DATE 24 MAR 1867
2 PLAC Henry Co., Iowa, USA

./ged2site -cFdlh 'Jonathan Smith Bunker' ~/src/private/genealogy/jay_small.ged
Data::Text: attempt to add consecutive punctuation===========] 100% [3/3]
    Current = 'was born in Randolph Co., Indiana on Dec 3, 1838. Hewas married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)).<p>He died on May 15, 1889 in Jackson Twp.' added at 7571 of ./ged2site
    Append = '.' at ./ged2site line 7571.
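For anyone following the reproduction: each GEDCOM line has the form `level [@xref@] tag [value]`, and Jonathan's two `FAMS` links (to `@F378@` and `@F682@`) are presumably what produce "married twice". Below is a minimal, illustrative Perl parser for a trimmed copy of the snippet. It ignores `CONT`/`CONC`, character sets and everything else a real file needs; a real program would use a library such as the CPAN `Gedcom` module.

```perl
use strict;
use warnings;

# Trimmed copy of the snippet above; <<'END' prevents @...@ interpolation.
my $gedcom = <<'END';
0 @I1600@ INDI
1 NAME Jonathan Smith /Bunker/
1 FAMS @F378@
1 FAMS @F682@
0 @F682@ FAM
1 HUSB @I1600@
1 WIFE @I1703@
0 @F378@ FAM
1 HUSB @I1600@
1 WIFE @I1001@
END

# Illustrative only: parse "level [@xref@] tag [value]" and keep level-1
# tags under the enclosing level-0 record.
my (%records, $current);
for my $line (split /\n/, $gedcom) {
    my ($level, $xref, $tag, $value) =
        $line =~ /\A(\d+)\s+(?:(\@[^\@]+\@)\s+)?(\w+)(?:\s+(.*))?\z/ or next;
    if ($level == 0) {
        $current = defined $xref ? $xref : $tag;
        $records{$current} = { type => $tag };
    } elsif ($level == 1) {
        push @{ $records{$current}{$tag} }, $value;
    }
}

# Each FAMS link is a family in which the person is a spouse.
my @families = @{ $records{'@I1600@'}{FAMS} };
my @wives    = map { $records{$_}{WIFE}[0] } @families;
printf "%d families: %s\n", scalar @families, join(', ', @wives);
```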
nigelhorne commented 3 weeks ago

Confirmed. That helps, thanks.

$ GMAP_KEY= perl -MDevel::Hide=Geo::libpostal ./ged2site -cFd ~/gedcoms/issue_121.1.ged 
Devel::Hide hides Geo/libpostal.pm
Data::Text: attempt to add consecutive punctuation===========] 100% [3/3]
    Current = '<b>Jonathan Bunker</b> was born in Randolph Co., Indiana on Dec 3, 1838. Hewas married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)).<p>He died on May 15, 1889 in Jackson Twp.' added at 7578 of ./ged2site
    Append = '.' at ./ged2site line 7578.

jhannah commented 3 weeks ago

You could throw all your publicly shareable ~/gedcoms/ (like that one) into t/gedcoms/ and we could have a t/no_die.t that asserts that ged2site can process all of those without dying? The more tests the better. :)
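A minimal sketch of what such a `t/no_die.t` could look like with Test::More. The `t/gedcoms/` layout is only the one proposed in this comment and does not exist yet. The `-cFd` flags are copied from the commands in this issue; whether they are the right ones for a test run is an assumption.

```perl
#!/usr/bin/env perl
# Hypothetical t/no_die.t: run ged2site over every GEDCOM in t/gedcoms/
# (assumed layout) and check that each run exits with status 0.
use strict;
use warnings;
use Test::More;

my @gedcoms = sort { $a cmp $b } glob('t/gedcoms/*.ged');
plan skip_all => 'no GEDCOM files in t/gedcoms' unless @gedcoms;

for my $ged (@gedcoms) {
    # $^X is the perl running this test, so ged2site runs under the same perl.
    # -cFd is copied from the commands in this issue (an assumption).
    my $status = system($^X, './ged2site', '-cFd', $ged);
    is($status, 0, "ged2site processes $ged without dying");
}

done_testing();
```

A failing `is` reports the file name, so a new problem GEDCOM shows up in `prove -l t/` output without having to wait for a full run over a large personal tree.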

jhannah commented 3 weeks ago

master 839a2e457d5ae3d8ebbad92b6469427ed52a8930 Woot! 🎉

./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged  1507.94s user 299.94s system 92% cpu 32:26.70 total