joeyaurel / python-gedcom

Python module for parsing, analyzing, and manipulating GEDCOM files
https://gedcom.joeyaurel.dev
GNU General Public License v2.0
Words for structure of Gedcom file #23

Open andersbel opened 5 years ago

andersbel commented 5 years ago

It seems the names of classes and methods were updated in version 1.0.0, and I think it is good to use Element for parts of a Gedcom file. This distinguishes Records, which are about individuals or marriages, from Elements, which represent parts of a Gedcom file. But why keep the names Child and Parent for Elements that are connected to a particular Element? For a genealogy library like this, these two words mean quite particular things, and not necessarily the structure of a gedcom. One option would be to use Sub and Super, e.g. Element, SubElement and SuperElement. I would like to hear your thoughts on this. Great software, by the way!

joeyaurel commented 5 years ago

That's a good point! And you're right. It is a bit confusing to use the terms "child" and "parent" both for child/parent elements and for children/parents within the scope of genealogy. "sub" and "super" sound much better for GEDCOM elements. I will implement that in version 1.1.0 :) Thank you!

prism44 commented 5 years ago

I found the use of Elements and elements very confusing. I understand that a gedcom line is an element for the purposes of this software. We then have the class Element to describe and work with each line. Then all the information for a particular individual is called an Element "object", and the same goes for a particular marriage.

I think going up a level and abstracting the terminology would go a long way. Introducing XML or tree-like terminology is confusing for what is loosely a flat data source trying to represent a "tree". Originally, each ged line was a record to be transmitted. Calling the "individual and its associated information" a record is confusing as well. Coupling the original concept with modern OOD is where I think some work needs to be done.

I did notice the effort to that end in the definition of the derived classes "individual" and "family".

andersbel commented 5 years ago

I think it is good to distinguish between the structure of a gedcom file and the structure of relationships between people. A gedcom can represent records of individuals, marriages and parent/child relations. Or at least records in the form of information pointers to physical records such as church records. But a gedcom file is not built up by records, it is built up by lines that carry information. Element is one suitable word for such lines and groups of lines. Some groups of lines represent records for individuals. Other groups represent families.

What are called family trees are not trees in the sense of a mathematical graph. Biological family relationships can be represented by a directed graph that contains cycles once edge direction is ignored. Directed because parents have children. Cyclic in that sense because relatives do have children with each other, so a descendant can be reached from the same ancestor by more than one path. Most often not close relatives, but cousin marriages are common in some parts of the world.
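A small sketch of the kind of graph described here, using an invented family with one cousin marriage. Edges point from parent to child. All names (`PARENT_TO_CHILDREN`, `count_paths`, `has_directed_cycle`) are made up for illustration and are not part of python-gedcom.

```python
# Hypothetical family: G1 and G2 are grandparents of the cousins C1 and C2,
# who have a child K together. Each edge points from a parent to a child.
PARENT_TO_CHILDREN = {
    "G1": ["A", "B"],
    "G2": ["A", "B"],
    "A": ["C1"],
    "X": ["C1"],
    "B": ["C2"],
    "Y": ["C2"],
    "C1": ["K"],
    "C2": ["K"],
}


def count_paths(graph, src, dst):
    """Count the directed paths from src to dst (graph must have no directed cycle)."""
    if src == dst:
        return 1
    return sum(count_paths(graph, child, dst) for child in graph.get(src, []))


def has_directed_cycle(graph):
    """Depth-first search that reports whether any directed cycle exists."""
    in_progress, done = set(), set()

    def visit(node):
        in_progress.add(node)
        for child in graph.get(node, []):
            if child in in_progress:
                return True
            if child not in done and visit(child):
                return True
        in_progress.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(graph))


# In a tree there is at most one path between two nodes. Here there are two
# from G1 to K (via A and C1, and via B and C2), so this is not a tree.
paths_g1_to_k = count_paths(PARENT_TO_CHILDREN, "G1", "K")
directed_cycle = has_directed_cycle(PARENT_TO_CHILDREN)
```

The two paths from `G1` to `K` form a loop only when the edge direction is ignored; following the parent-to-child direction never leads back to the starting person.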

The structure of a gedcom, however, is more like a mathematical tree. It is directed and there is a root element, so it should be a prime example of a rooted tree. This is a common data structure in computer science.
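To make the rooted-tree view concrete, here is a minimal sketch that turns the level numbers at the start of each gedcom line into a tree. It does not use python-gedcom; `Node`, `build_tree` and the sample lines are hypothetical, and the parsing is simplified (it only splits off the leading level number).

```python
class Node:
    """One gedcom line plus the lines nested beneath it (hypothetical name)."""

    def __init__(self, level, text, parent=None):
        self.level = level
        self.text = text
        self.parent = parent
        self.children = []


def build_tree(lines):
    """Build a rooted tree from gedcom lines using their level numbers."""
    root = Node(-1, "ROOT")  # artificial root above all level-0 lines
    stack = [root]           # path from the root to the most recent line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        level_text, _, rest = line.partition(" ")
        level = int(level_text)
        # Pop until the top of the stack is one level above this line.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(level, rest, parent=stack[-1])
        stack[-1].children.append(node)
        stack.append(node)
    return root


SAMPLE_LINES = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
]

tree = build_tree(SAMPLE_LINES)
date_node = tree.children[0].children[1].children[0]
```

Every node except the artificial root has exactly one `parent`, which is what makes the structure a rooted tree rather than a general graph.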

prism44 commented 5 years ago

My point exactly! "But a gedcom file is not built up by records, it is built up by lines that carry information."

However, I disagree with "Element is one suitable word for such lines and groups of lines."

If you want to call a gedcom line an Element, great. We part ways in calling a group of gedcom lines an Element. They are not the same thing. There is a reason we have the word "sentence" and the word "paragraph".

One is a collection of the other. So it is in the gedcom file. A "line" is the fundamental building block of the gedcom file. A "record" is a collection of "lines" and describes an individual or a family or an object, etc.
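A rough sketch of that distinction, assuming a "record" is a level-0 line together with every line that follows it up to the next level-0 line. `split_records` and the sample lines are hypothetical and are not taken from python-gedcom.

```python
def split_records(lines):
    """Group gedcom lines into records; each record starts at a level-0 line."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        level = line.split(" ", 1)[0]
        if level == "0":
            records.append([line])     # a new record begins here
        elif records:
            records[-1].append(line)   # the line belongs to the current record
    return records


SAMPLE_LINES = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
]

records = split_records(SAMPLE_LINES)
```

Here a line is a plain string and a record is a list of such strings, so the two concepts have different types as well as different names.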

To use the term "element" to describe a "line" is fine. To use the same term to describe a "record" is not.

That's why we have "vector" and "list" in Python. They are similar but their underlying data structures are different and so we call them different names.

andersbel commented 5 years ago

In my mind Nick made a good design decision when introducing wording for parts of a gedcom file. And I maintain that Element is one suitable word for such parts. Feel free to present an alternative word.

There are vectors and subvectors which are parts of a vector. Lists and sublists. For example, in a database there are bytes and bits, and some particular sets of bytes represent a record. So why not elements and subelements of a gedcom? And the smallest subelement is a line. Then some particular groups of elements represent a record, either of an individual or a family.