Closed damonbrodie closed 6 years ago
I've forked your repository and added new logic that can optionally handle the issues in files produced by MyHeritage and Ancestry (it is disabled by default). I've also started documenting the methods in the README. Once I've finished that, I'll submit a PR. If that is accepted, I'd like to push the develop branch to master and publish the updated module.
PR created. I've got more README updates to make, but I thought I would kick off the PR now so that it can be reviewed.
Thank you @nomadyow! :) I reviewed the PR and merged it into master. Test files will be added at a later stage to make sure that the parser works as expected.
It seems that this is one of the few actively developed GEDCOM parsers in Python these days (2018). Ancestry seems to produce GEDCOM files that break the parsing:
The lines in question are:
Notice the carriage return in the TEXT data that puts the next line "Born in Western Head..." onto a line by itself.
I believe that this breaks the gedcom format (though I have not researched this extensively in the spec). That being said, Ancestry is one of the largest genealogy providers and I think it would be ideal to have a parser that can parse the output from this provider.
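To make the idea concrete, here is a minimal sketch of one possible workaround. It is not code from this library: `repair_broken_lines` and `GEDCOM_LINE` are hypothetical names, and the sample records in the usage note are made up for illustration. It assumes that every valid GEDCOM line starts with a level number followed by an optional `@XREF@` pointer and a tag, and that any line which does not fit that shape is the tail of the previous line's value.

```python
import re

# A well-formed GEDCOM line: level number, optional @XREF@ pointer, then a tag.
GEDCOM_LINE = re.compile(r"^\s*\d+\s+(@[^@]+@\s+)?\w+")


def repair_broken_lines(lines):
    """Merge lines that lack a level number into the preceding line.

    Hypothetical helper: it only illustrates the pre-processing idea and
    does not reflect how this parser actually handles such files.
    """
    repaired = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if GEDCOM_LINE.match(line) or not repaired:
            # Looks like a normal GEDCOM line (or there is nothing to merge into).
            repaired.append(line)
        elif line.strip():
            # Stray text: treat it as a continuation of the previous value.
            repaired[-1] = repaired[-1] + " " + line.strip()
    return repaired
```

For example, with made-up input, `repair_broken_lines(["4 TEXT First part", "second part", "1 SEX M"])` folds the stray second line back into the `TEXT` value. Joining with a space discards the original line break; emitting a `CONT` line at the next level would preserve it. The heuristic also misfires when the stray text itself begins with a number followed by a word, so it would probably need to stay opt-in.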
I'm wondering if there is any interest in handling this use case here. If so, I can try to work up a patch and submit a PR.
I think there is a need to have a gedcom parser that can read "real world" gedcom files.