nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

`augur curate passthru` can add double quotations #1312

Closed: joverlee521 closed this issue 1 week ago

joverlee521 commented 10 months ago

Current Behavior

When using the `--metadata` input, field values that contain double quotes can gain additional double quotes in the output.

Since the metadata is read through `csv.DictReader`, we can probably tweak this behavior through the `csv.Dialect` attributes:

https://github.com/nextstrain/augur/blob/961cb0042c6744cff3925dd97251187e4532a082/augur/io/metadata.py#L183
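For illustration, here is a minimal sketch using only the standard-library `csv` module. It shows how one `csv.Dialect` attribute, `doublequote`, changes the way a quoted field is read. The single-field, tab-separated row and the helper name `parse_field` are hypothetical. This is not augur's code.

```python
import csv
import io

# Hypothetical one-field TSV row written with CSV-style quoting:
# outer quotes wrap the field and each internal quote is doubled.
ROW = '"SRC VB ""Vector"", Molecular Biology of Genomes"\n'


def parse_field(doublequote: bool) -> str:
    """Parse ROW as TSV with the given doublequote setting; return its only field."""
    reader = csv.reader(io.StringIO(ROW), delimiter="\t", doublequote=doublequote)
    return next(reader)[0]


print(parse_field(doublequote=True))   # SRC VB "Vector", Molecular Biology of Genomes
print(parse_field(doublequote=False))  # SRC VB "Vector"", Molecular Biology of Genomes"
```

With `doublequote=True`, each `""` pair collapses to a single quote. With `doublequote=False`, the reader treats the second quote as the end of the quoted part of the field and keeps every later quote as a literal character.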

Additional context

This was first observed in https://github.com/nextstrain/monkeypox/pull/179

tsibley commented 9 months ago

The escaped double quotes ("" for internal quotes) are more correct, no?

joverlee521 commented 9 months ago

> The escaped double quotes ("" for internal quotes) are more correct, no?

Hmm, yes. I guess the issue comes up when running through `augur curate` multiple times: the double quotes keep accumulating with each pass. Here is an example of a string with internal quotes going through curate 3 times (the original input, then the output of each pass):

SRC VB "Vector", Molecular Biology of Genomes
"SRC VB ""Vector"", Molecular Biology of Genomes"
"SRC VB ""Vector"""", Molecular Biology of Genomes"""
"SRC VB ""Vector"""""""", Molecular Biology of Genomes"""""""
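A read/write pass whose reader and writer disagree on `doublequote` reproduces the growth pattern above. The sketch below rests on an assumption and does not describe augur's actual settings. A hypothetical `curate_pass` reads a one-field TSV line with `doublequote=False`, then writes it back with the `csv` defaults (`doublequote=True`, `QUOTE_MINIMAL`). Run three times on the original string, it prints exactly the three lines shown above.

```python
import csv
import io


def curate_pass(tsv_line: str) -> str:
    """Hypothetical stand-in for one read/write pass over a one-field TSV line.

    Assumption: the reader ignores doubled quotes (doublequote=False) while
    the writer uses the csv defaults (doublequote=True, QUOTE_MINIMAL).
    """
    reader = csv.reader(io.StringIO(tsv_line), delimiter="\t", doublequote=False)
    value = next(reader)[0]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([value])  # quotes the field and doubles each internal quote
    return out.getvalue().rstrip("\n")


line = 'SRC VB "Vector", Molecular Biology of Genomes'
for _ in range(3):
    line = curate_pass(line)
    print(line)  # each pass adds more quotes than the last
```

On every pass the reader keeps the writer's escaping quotes as literal characters, and the writer then escapes those characters again.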

tsibley commented 9 months ago

OH! Yeah, that's a misconfiguration of the parser/producer then.
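Here is a minimal sketch of a matched configuration, assuming the reader and writer both use the `csv` defaults (`doublequote=True`). The first pass quotes the field once, and every later pass returns the same line, so repeated runs stop accumulating quotes. `roundtrip` is a hypothetical helper, not an augur function.

```python
import csv
import io


def roundtrip(tsv_line: str) -> str:
    """Hypothetical pass where reader and writer share the csv default dialect."""
    reader = csv.reader(io.StringIO(tsv_line), delimiter="\t")  # doublequote=True
    value = next(reader)[0]

    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerow([value])
    return out.getvalue().rstrip("\n")


first = roundtrip('SRC VB "Vector", Molecular Biology of Genomes')
second = roundtrip(first)
print(first)            # "SRC VB ""Vector"", Molecular Biology of Genomes"
print(second == first)  # True
```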