tskit-dev / tsinfer

Infer a tree sequence from genetic variation data.
GNU General Public License v3.0

Overwriting md schema #929

Closed hyanwong closed 3 months ago

hyanwong commented 3 months ago

We've had problems in the past when taking tree sequences that have already been inferred and then trying to infer them again: if the metadata (md) schema name already exists, we bomb out, here:

https://github.com/tskit-dev/tsinfer/blob/1d45c0c8122d0680cee2fcfd4b23c7dbbcb7a497/tsinfer/inference.py#L118

I suggest that we add a description to each md schema field (e.g. for "sample_data_id", "sample_data_time", etc.). If we detect that the description in the existing metadata also matches, we then overwrite the field with a warning. The justification is that if the description matches (and it would presumably contain the word "tsinfer"), then we are simply stomping on our own data.
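
To make the suggestion concrete, here is a rough sketch of the check I have in mind. It works on a plain dict standing in for the schema and is not the current tsinfer code: the function name `add_property`, the description text, and the `"number"` type are all made up for illustration.

```python
import copy
import warnings


def add_property(schema, name, definition):
    """Return a copy of `schema` with `name` set under "properties".

    Hypothetical helper: an existing field is only overwritten when both
    the old and the new definition carry the same description.
    """
    schema = copy.deepcopy(schema)
    properties = schema.setdefault("properties", {})
    existing = properties.get(name)
    if existing is not None:
        description = definition.get("description")
        # No description, or a different one: the field is not ours to replace.
        if description is None or existing.get("description") != description:
            raise ValueError(f"Metadata field {name!r} already exists in the schema")
        # Matching description: we are stomping on our own data, so just warn.
        warnings.warn(f"Overwriting existing metadata field {name!r}", stacklevel=2)
    properties[name] = definition
    return schema


# Illustrative definition; the description text is an assumption.
definition = {"type": "number", "description": "Sample time recorded by tsinfer"}
schema = add_property({"codec": "json"}, "sample_data_time", definition)
# A second pass with the same description warns instead of raising.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    schema = add_property(schema, "sample_data_time", definition)
```

Requiring the description to be present, rather than treating two missing descriptions as a match, means a field of the same name added by someone else is never silently replaced.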