corneliusroemer closed this issue 1 week ago.
Hi @corneliusroemer,
Thanks for your report.
"It seems plausible that this addition causes the backwards-incompatible breakage for dataformat when that field is present in the content received from the server."
That is correct.
We recommend updating both dataformat and datasets together to avoid this error.
Best, Eric
I think there's a misunderstanding: this error is not due to a mismatch between datasets and dataformat. Both are at the same version. The error appears when not updating to 16.29.0 at all.
It seems server-side changes applied at the same time as the release of 16.29.0 broke dataformat for everyone who is not using the latest version.
It caused build failures in all Nextstrain ingest jobs, Pathoplexus, etc.
It's a real breaking change for everyone who doesn't update to the latest version the second it's out.
Maybe I didn't make this clear enough in my issue report.
The way this should work, I think, is that the client (datasets) tells the server which version it has. If the version is older than 16.29.0, the client does not get the new field, so it doesn't error on the unexpected usaState field.
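Here is a minimal sketch of the version-gating idea in Python. It assumes the client sends its version string with each request. It also assumes the server keeps a table recording the first client version that understands each field. `FIELD_ADDED_IN`, `parse_version` and `filter_for_client` are hypothetical names. Nothing here reflects how the NCBI service is actually implemented.

```python
# Hypothetical sketch: the server strips fields that the requesting client
# version does not know about. FIELD_ADDED_IN is an assumed table, not part
# of any real NCBI API.
from typing import Any, Dict, Tuple

Version = Tuple[int, int, int]

# Assumed: each optional field mapped to the first client version that understands it.
FIELD_ADDED_IN: Dict[str, Version] = {
    "usaState": (16, 29, 0),
}


def parse_version(text: str) -> Version:
    """Turn a string such as '16.26.2' into a comparable tuple (16, 26, 2)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def filter_for_client(record: Dict[str, Any], client_version: str) -> Dict[str, Any]:
    """Drop top-level fields that were introduced after the client's version."""
    version = parse_version(client_version)
    return {
        key: value
        for key, value in record.items()
        # Fields missing from the table count as always known (version 0.0.0).
        if FIELD_ADDED_IN.get(key, (0, 0, 0)) <= version
    }
```

Under this rule, a 16.26.2 client would receive records without usaState, and 16.29.0 or later would get the field. The sketch only inspects top-level keys. Nested fields would need a recursive version of the same check.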
If you expect to add more fields in the future that would cause similar breakage for past versions it might make sense to adjust the dataformat command to not error when it gets unexpected fields.
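The sketch below contrasts two decoders. The strict one raises on unknown fields, which is the behaviour the reported `unknown field "usaState"` error suggests. The lenient one warns and skips them. `KNOWN_FIELDS`, its field names, `decode` and `UnknownFieldError` are all hypothetical and only illustrate the pattern. This is not dataformat's real schema or code.

```python
# Hypothetical sketch: strict vs lenient decoding of one JSON Lines record.
# KNOWN_FIELDS is an assumed, illustrative schema.
import json
import sys
from typing import Any, Dict, FrozenSet

KNOWN_FIELDS: FrozenSet[str] = frozenset({"accession", "virus", "releaseDate"})


class UnknownFieldError(ValueError):
    """Raised in strict mode when a record has a field outside the schema."""


def decode(line: str, strict: bool = False) -> Dict[str, Any]:
    """Parse one record, keeping only fields listed in KNOWN_FIELDS."""
    record = json.loads(line)
    unknown = sorted(set(record) - KNOWN_FIELDS)
    if unknown:
        if strict:
            # Strict mode: fail on the first unexpected field.
            raise UnknownFieldError(f'unknown field "{unknown[0]}"')
        # Lenient mode: report the extras on stderr and carry on.
        print(f"warning: ignoring unknown fields {unknown}", file=sys.stderr)
    return {key: value for key, value in record.items() if key in KNOWN_FIELDS}
```

The lenient path keeps older clients working when the server adds fields. The trade-off is that a genuinely misspelled field would only produce a warning instead of an error.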
A manual safeguard users can apply to prevent this type of error in the future is to avoid dataformat, as it appears prone to this breakage. IIUC, in our case dataformat only turns a JSONL file into a TSV, so it shouldn't be hard to replace it with a manual script that doesn't suffer from the type of bug encountered here.
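Here is a minimal sketch of such a replacement script, assuming the input is JSON Lines on stdin. It writes a TSV with an explicitly chosen column list, so any field not listed (such as a newly added usaState) is simply ignored. The dotted column paths in `COLUMNS` are illustrative assumptions, not the real datasets report schema. The helper names are hypothetical.

```python
# Hypothetical replacement for a "JSONL -> TSV" step. Only the columns listed
# in COLUMNS are written, so new server-side fields cannot break it.
# The column paths below are illustrative, not the real schema.
import csv
import json
import sys
from typing import Any, Dict, Iterable, List, TextIO

COLUMNS: List[str] = ["accession", "virus.name", "location.country"]


def get_path(record: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path like 'virus.name' into nested dicts; None if absent."""
    value: Any = record
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def jsonl_to_tsv(lines: Iterable[str], out: TextIO, columns: List[str] = COLUMNS) -> None:
    """Write a header row, then one TSV row per non-blank JSON line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        cells = [get_path(record, column) for column in columns]
        # Missing values become empty cells rather than the string "None".
        writer.writerow(["" if cell is None else cell for cell in cells])


if __name__ == "__main__":
    jsonl_to_tsv(sys.stdin, sys.stdout)
```

Pinning the column list also fixes the column order, which means a downstream step never depends on whatever order the server's fields arrive in.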
Hi @corneliusroemer,
Thanks for the clarification. You are correct: the new field is a breaking change when downloading a package with datasets and then using dataformat to generate a table.
While we consider how to avoid such breaking changes when we add new fields in the future, here are a couple of options for working around this problem:
1. Use the --force flag with dataformat to ignore the error. You could build this into your pipeline to ignore the error in the future. Please note that when new fields are added, the order of columns in the table could change.
2. Generate the table from datasets summary instead of a downloaded package:
   datasets summary virus genome taxon 3048448 --as-json-lines | dataformat tsv virus-genome
   This avoids triggering the error because datasets summary does not pick up the new field.

Thanks again for your feedback.
Best, Eric
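The --force workaround comes with a caveat: new fields may shift column positions in the TSV. One way to guard against that is to select columns by header name rather than by position. Below is a minimal Python sketch of that post-processing step. `select_columns` is a hypothetical helper, and the header names you pass in are up to your pipeline. The sketch does not assume dataformat's actual column headers.

```python
# Hypothetical post-processing for the --force workaround: pick columns by
# header name so that reordered or extra columns do not affect the output.
import csv
import sys
from typing import List, TextIO


def select_columns(src: TextIO, dst: TextIO, wanted: List[str]) -> None:
    """Copy only the `wanted` columns, in that order, from a TSV stream."""
    reader = csv.DictReader(src, delimiter="\t")
    header = reader.fieldnames or []  # reads the header row lazily
    missing = [name for name in wanted if name not in header]
    if missing:
        # Fail loudly if an expected column disappeared.
        raise KeyError(f"columns not found in header: {missing}")
    writer = csv.DictWriter(
        dst,
        fieldnames=wanted,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop any columns not in `wanted`
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    # Usage: python select_columns.py "Col A" "Col B" < in.tsv > out.tsv
    select_columns(sys.stdin, sys.stdout, sys.argv[1:])
```

Raising on missing columns is a deliberate choice. A removed field should still stop the pipeline, while an added one should not.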
Describe the bug
Version 16.26.2 was working well until around 9pm UTC on August 17 (around 12 hours ago at time of writing).
Since then, running dataformat on the output of datasets download results in an error:
dataformat doesn't recognize this input ... Error: unknown field "usaState"
To Reproduce
Steps to reproduce the behavior:
Logs:
Workaround
Upgrading to the latest version (16.29.0) seems to fix this. A look at its release notes shows an entry:
It seems plausible that this addition causes the backwards-incompatible breakage for dataformat when that field is present in the content received from the server.