nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

relax schema enforcement of `generated_by` in node-data files #1476

Open jameshadfield opened 1 month ago

jameshadfield commented 1 month ago

The generated_by field of node-data JSONs is useful for encoding metadata (meta-metadata?). However, we currently enforce a strict structure on it when reading files, because we use this field to check for compatible augur versions. Concretely, the following is invalid:

{
  "generated_by": {
    "person": "james"
  },
...
  File "/Users/naboo/github/nextstrain/augur/augur/util_support/node_data_file.py", line 23, in __init__
    self.validate()
  File "/Users/naboo/github/nextstrain/augur/augur/util_support/node_data_file.py", line 97, in validate
    if self.is_generated_by_incompatible_augur:
  File "/Users/naboo/github/nextstrain/augur/augur/util_support/node_data_file.py", line 49, in is_generated_by_incompatible_augur
    compatible_version = is_augur_version_compatible(
  File "/Users/naboo/github/nextstrain/augur/augur/__version__.py", line 22, in is_augur_version_compatible
    this_version = packaging.version.parse(version)
...
TypeError: expected string or bytes-like object

We should relax this check and, for example, only test for version incompatibility when generated_by.program is "augur". The code currently looks like this:

https://github.com/nextstrain/augur/blob/1694a3fe1f2e4271849120ee08b29cb8286dc1fa/augur/util_support/node_data_file.py#L44-L53

so all we need is to calculate compatible_version only when the generated_by_augur boolean is true.
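To make the proposal concrete, here is a minimal standalone sketch of the relaxed check. It is not the code in node_data_file.py: the function names, the `CURRENT_VERSION` value, and the "same major version" compatibility rule are all assumptions made for illustration, and plain string splitting stands in for packaging.version.parse so the example has no dependencies.

```python
from typing import Any, Mapping, Optional

# Hypothetical stand-in for the running augur version (assumption).
CURRENT_VERSION = "24.0.0"


def _major(version: Any) -> Optional[int]:
    """Return the leading integer of a dotted version string, or None if unusable."""
    if not isinstance(version, str):
        return None
    try:
        return int(version.split(".")[0])
    except ValueError:
        return None


def is_generated_by_incompatible_augur(
    attrs: Mapping[str, Any], current_version: str = CURRENT_VERSION
) -> bool:
    """Flag a node-data file only if it claims to come from an incompatible augur."""
    generated_by = attrs.get("generated_by")

    # Free-form or missing metadata: nothing to check, so never flag it.
    if not isinstance(generated_by, Mapping):
        return False
    if generated_by.get("program") != "augur":
        return False

    # Only now is the version looked at. Assumed policy: a file that says it
    # was made by augur but has no parseable version is treated as incompatible.
    file_major = _major(generated_by.get("version"))
    if file_major is None:
        return True

    # Assumed compatibility rule: major versions must match.
    return file_major != _major(current_version)
```

With this shape, `{"generated_by": {"person": "james"}}` passes validation because the version is never parsed unless `program` is "augur". How to treat an augur-generated file with a missing version is a separate policy decision; the sketch picks the strict option.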