corneliusroemer closed this issue 2 years ago.
Interestingly, when reading in a metadata file, we seem to accept either `name` or `strain` as the ID column, but `export` suddenly no longer accepts `name`. That's strange. Should we remove support for `name`, or make `export` accept `name` so that it is in line with `metadata_file.py`? See:
https://github.com/nextstrain/augur/blob/4b71e7d2f35c680c08488f691672bb60e24f5258/augur/util_support/metadata_file.py#L6-L12
We do support searching for multiple arbitrary strain ID columns when reading in metadata with the `read_metadata` function in the `io` module. This function returns a data frame indexed by the first requested ID column that exists in the input. As a result, the calling code can consume the data frame without needing to know the name of the ID column.
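To illustrate the "first requested ID column wins" behaviour described above, here is a minimal stdlib-only sketch. It is not augur's implementation: the real `io.read_metadata` returns a pandas data frame, whereas this hypothetical `read_metadata_by_first_id` returns a plain dict keyed by ID. The `("strain", "name")` priority order and the tab delimiter are assumptions taken from this discussion.

```python
import csv
import io


def read_metadata_by_first_id(handle, id_columns=("strain", "name"), delimiter="\t"):
    """Hypothetical helper: key metadata records by the first ID column present."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    columns = reader.fieldnames or []
    # Walk the requested ID columns in priority order and keep the first match.
    id_column = next((column for column in id_columns if column in columns), None)
    if id_column is None:
        raise ValueError(f"no ID column among {id_columns} in {columns}")
    # Key every record by its ID so callers never need to know the column name.
    return {row[id_column]: row for row in reader}


metadata = read_metadata_by_first_id(io.StringIO("name\tdate\nA/1\t2020-01-01\n"))
print(metadata["A/1"]["date"])  # 2020-01-01
```

The caller looks records up by ID alone, so a file with a `name` column works the same way as one with a `strain` column.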
An alternate solution to #906 is to use `io.read_metadata` in the `export` module instead of the current call to `utils.read_metadata`. We could cast the data frame to a dict to avoid changing other code in the module, or we could update the logic in `parse_node_data_and_metadata` to use the data frame. We should really deprecate the `utils.read_metadata` function anyway, since `io.read_metadata` was written to eventually replace it.
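The "cast the data frame to a dict" option could look roughly like the sketch below. It assumes pandas is installed and that the rest of `export` only needs `{id: {column: value}}` lookups; whether that matches the exact shape `utils.read_metadata` produced is an assumption, not something confirmed here. The `pd.read_csv` plus `set_index` lines are a stand-in for `io.read_metadata` (note that `io` below is Python's stdlib module, not augur's), and `metadata_frame_to_dict` is a hypothetical name.

```python
import io

import pandas as pd


def metadata_frame_to_dict(metadata: pd.DataFrame) -> dict:
    """Hypothetical adapter: turn an ID-indexed frame into {id: {column: value}}."""
    # orient="index" uses the frame's index (the ID column) as the outer keys.
    return metadata.to_dict(orient="index")


tsv = "name\tdate\tcountry\nA/1\t2020-01-01\tUSA\n"
# Stand-in for io.read_metadata: index by the first ID column that is present.
frame = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str)
id_column = next(column for column in ("strain", "name") if column in frame.columns)
frame = frame.set_index(id_column)

records = metadata_frame_to_dict(frame)
```

This keeps the change local to where metadata is loaded; updating `parse_node_data_and_metadata` to use the data frame directly would avoid the conversion but touches more code.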
A lot of users seem to get the following type of error:
https://discussion.nextstrain.org/t/error-in-job-3-exporting-data-files-for-for-auspice/493/4
It's a common discussion topic on our forum and in emails we receive at hello@nextstrain.org.
I think it would help users a lot if we raised a more informative error so that they know directly how to fix the problem.
Also, we don't seem to have documented the requirement that the metadata must contain a column called `strain` with the strain names. Both issues should be addressed.
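A more informative error could name the expected column, list the columns that were actually found, and say how to fix the file. The sketch below is one possible shape for such a check, not augur's code: `find_strain_column` is a hypothetical helper, and defaulting to `("strain",)` mirrors the current `export` requirement discussed above.

```python
def find_strain_column(columns, id_columns=("strain",)):
    """Hypothetical check: return the ID column or explain how to fix the metadata."""
    for candidate in id_columns:
        if candidate in columns:
            return candidate
    expected = ", ".join(repr(column) for column in id_columns)
    found = ", ".join(repr(column) for column in columns) or "(none)"
    raise ValueError(
        f"The metadata is missing a column of strain names. Expected one of: {expected}. "
        f"Found columns: {found}. Rename the column that holds your strain names "
        f"(matching the names in the tree) to {id_columns[0]!r} and rerun the command."
    )


try:
    find_strain_column(["name", "date"])
except ValueError as error:
    print(error)
```

Passing `id_columns=("strain", "name")` would make the same check accept `name`, which is the other option raised at the top of this issue.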