yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

public tree: Apostrophes in clade names #339

theosanderson closed this issue 1 year ago

theosanderson commented 1 year ago

(This is probably mostly a public-tree issue, but I guess there's a possibility it could also impact UShER in terms of Newick export.)

I noticed that the Cov2Tree Taxonium Tools pipeline got upset yesterday. The error relates to TreeSwift complaining that the Newick file it gets (from the protobuf) is not what it considers a valid Newick file.

Looking into it, my current feeling is that these

CoteD'Ivoire/MTMC_14_044/2021|OQ788229.1|2021-10-11):1,
CoteD'Ivoire/MTMC_14_133/2021|OQ788230.1|2021-10-15):1,
CoteD'Ivoire/MTMC_06_272/2022|OQ788228.1|2022-03-14,

are the issue, because TreeSwift treats apostrophes as quote characters that wrap literal names. These names look like recent additions.

Would it be possible to avoid those? No prob if not, I can add some code to handle this.
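For anyone handling this on the consuming side, here is a minimal sketch of quoting a label before it is written to Newick. It assumes the common Newick convention that a quoted label is wrapped in apostrophes and that a literal apostrophe inside it is written as two apostrophes. The function name `quote_newick_label` and the set of characters that trigger quoting are illustrative assumptions, not part of UShER, TreeSwift, or Taxonium Tools, and individual parsers may differ in what they accept.

```python
import re

# Characters that usually force a Newick label to be quoted.
# This set is an assumption based on common Newick conventions.
_NEEDS_QUOTING = re.compile(r"[\s()\[\]':;,]")


def quote_newick_label(label: str) -> str:
    """Return the label in a form a quote-aware Newick parser can read."""
    if not _NEEDS_QUOTING.search(label):
        return label  # plain labels are left untouched
    # Inside a quoted label, a literal apostrophe is doubled.
    return "'" + label.replace("'", "''") + "'"


name = "CoteD'Ivoire/MTMC_14_044/2021|OQ788229.1|2021-10-11"
quoted = quote_newick_label(name)
print(quoted)
```

Quoting keeps the original name recoverable, at the cost of relying on every downstream parser to understand the doubled-apostrophe escape.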

AngieHinrichs commented 1 year ago

Sorry about that -- Yes, I've been stripping apostrophes from GISAID sequence names for a long time, but they're pretty new (2023-04-16) in INSDC sequence names. Will fix...
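The stripping approach mentioned here could look roughly like the sketch below. This is not the actual public-tree build code, which is not shown in this thread; `strip_apostrophes` is a hypothetical helper name, and the sample names are the ones quoted in the report above.

```python
def strip_apostrophes(name: str) -> str:
    """Remove every apostrophe so the name is safe as an unquoted Newick label."""
    return name.replace("'", "")


names = [
    "CoteD'Ivoire/MTMC_14_133/2021|OQ788230.1|2021-10-15",
    "CoteD'Ivoire/MTMC_06_272/2022|OQ788228.1|2022-03-14",
]
cleaned = [strip_apostrophes(n) for n in names]
for n in cleaned:
    print(n)
```

Stripping is simpler than quoting and works with parsers that do not support quoted labels, but the cleaned name no longer matches the name in the source database exactly.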

theosanderson commented 1 year ago

Thanks @AngieHinrichs!

AngieHinrichs commented 1 year ago

Sorry, I didn't fix this in time for today's (2023-04-17) build, but it should be fixed in tomorrow's build. I will check tomorrow to make sure.

theosanderson commented 1 year ago

Thanks again!