poseidon-framework / poseidon-schema

An archaeogenetic genotype data organisation file format

[Review recommendation] Introduce an explicit .janno column for sex chromosome aneuploidies #81

Open nevrome opened 2 months ago

nevrome commented 2 months ago

This recommendation was raised in the review of the Poseidon paper.

nevrome commented 2 months ago

From the wiki article I gather that there are autosomal and non-autosomal (gonosomal) aneuploidies. A single human can carry several aneuploidies at once, including combinations of autosomal and gonosomal ones. Naturally one individual can only have one aneuploidy involving the sex chromosomes, but cases with double or multiple autosomal aneuploidies have been observed. Mosaic forms, where the aneuploidy affects only some of the individual's cells, also allow a higher life expectancy with otherwise more lethal aneuploidies.

I think we could add four columns: Gonosomal_Aneuploidy + Gonosomal_Aneuploidy_Notes and Autosomal_Aneuploidy + Autosomal_Aneuploidy_Notes. The two *_Notes columns would just be the usual free text fields, while Gonosomal_Aneuploidy and Autosomal_Aneuploidy could be choice fields with the most common aneuploidies hardcoded and an Other option to catch any others. Autosomal_Aneuploidy should probably be a list column, potentially even called Autosomal_Aneuploidies.

For Gonosomal_Aneuploidy I think the following choices may be sufficient: X, XX, XXX, XXXX, XXXXX, XY, XXY, XXXY, XXXXY, XYY, XXYY, XYYY, XYYYY

And for Autosomal_Aneuploidy: Trisomy8, Trisomy9, Trisomy13, Trisomy18, Trisomy21, Trisomy22
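To make the proposal more concrete, here is a minimal Python sketch of how the two choice columns could be checked. It is not taken from any Poseidon tool: the function name `validate_aneuploidy_columns` is hypothetical, and it assumes that a missing value is written as `n/a` and that list columns separate their entries with `;`.

```python
# Sketch only: the column names and choices come from this thread,
# everything else (function name, "n/a", ";") is an assumption.
MISSING = "n/a"        # assumed missing-value marker
LIST_SEPARATOR = ";"   # assumed separator for list columns

GONOSOMAL_CHOICES = frozenset({
    "X", "XX", "XXX", "XXXX", "XXXXX",
    "XY", "XXY", "XXXY", "XXXXY",
    "XYY", "XXYY", "XYYY", "XYYYY",
    "Other",
})

AUTOSOMAL_CHOICES = frozenset({
    "Trisomy8", "Trisomy9", "Trisomy13",
    "Trisomy18", "Trisomy21", "Trisomy22",
    "Other",
})


def validate_aneuploidy_columns(row: dict[str, str]) -> list[str]:
    """Return a list of problems found in one row (empty if the row is fine)."""
    problems = []

    # Gonosomal_Aneuploidy: single choice, at most one value per individual.
    gonosomal = row.get("Gonosomal_Aneuploidy", MISSING).strip()
    if gonosomal != MISSING and gonosomal not in GONOSOMAL_CHOICES:
        problems.append(f"Gonosomal_Aneuploidy: unknown value {gonosomal!r}")

    # Autosomal_Aneuploidy: list column, several values are allowed.
    autosomal = row.get("Autosomal_Aneuploidy", MISSING).strip()
    if autosomal != MISSING:
        for entry in autosomal.split(LIST_SEPARATOR):
            entry = entry.strip()
            if entry not in AUTOSOMAL_CHOICES:
                problems.append(f"Autosomal_Aneuploidy: unknown value {entry!r}")

    return problems
```

A row with `Gonosomal_Aneuploidy` set to `XXY` and `Autosomal_Aneuploidy` set to `Trisomy21;Trisomy18` would pass, while a value outside the hardcoded lists would be reported unless it is entered as `Other` and described in the corresponding notes column.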

stschiff commented 2 months ago

Hmm, yes, thanks for this well-structured overview. To me this feels like slightly too much. I would go for only the XX, XY, XXY and friends, and call it "Sex_Karyotype"? I like adding a note-field. So I think two fields are enough here. The autosomal trisomies are so incredibly rare that I would suggest they could go into the general Notes field?

nevrome commented 2 months ago

Well...

Throughout the world, the overall prevalence of DS [Down syndrome] is 10 per 10,000 live births, although in recent years this figure has been increasing. - Weijerman & de Winter 2010

Point is: It costs us nearly nothing to specify them now, and when the rare cases arise we're ready. Some of the rarer Sex_Karyotypes only happen in 1 of 100,000 live births. But given the growth of ancient genomic data we should get a case for most of them sooner or later.

stschiff commented 1 month ago

How about a single new column Aneuploidy which can take various choices, like "Trisomy21", "XXY" and the like, perhaps even allowing multiple? We would predefine that list and, as usual, also define "Other" (to cater for non-humans) and then also add Aneuploidy_Note?
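A single multi-value Aneuploidy column could be read roughly as in the following Python sketch. This is only an illustration: the function name `parse_aneuploidy` is hypothetical, the predefined list is an example assembled from values mentioned in this thread rather than an agreed specification, and `n/a` as missing value and `;` as list separator are assumptions.

```python
# Sketch only: an example list of predefined choices, not an agreed specification.
ANEUPLOIDY_CHOICES = frozenset({
    "X", "XXX", "XXY", "XYY", "XXYY",
    "Trisomy13", "Trisomy18", "Trisomy21",
    "Other",  # catch-all, e.g. for non-human organisms
})

MISSING = "n/a"  # assumed missing-value marker


def parse_aneuploidy(cell: str) -> list[str]:
    """Split one Aneuploidy cell into its entries and reject unknown values."""
    cell = cell.strip()
    if cell in ("", MISSING):
        return []
    # Assumed list syntax: entries separated by ";".
    values = [value.strip() for value in cell.split(";")]
    unknown = [value for value in values if value not in ANEUPLOIDY_CHOICES]
    if unknown:
        raise ValueError(f"unknown Aneuploidy value(s): {', '.join(unknown)}")
    return values
```

With this layout an individual with both a gonosomal and an autosomal aneuploidy is entered as `XXY;Trisomy21` in one cell, and anything outside the predefined list is entered as `Other` and explained in Aneuploidy_Note.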