nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

Cannot add new attributes to each node #132

Closed DasuberVetLeonidas closed 4 years ago

DasuberVetLeonidas commented 6 years ago

Hi, thanks for the software. It's been fun to use on new virus species. However, for our study we would like to add host as an attribute to each virus node. So we modified the *.prepare.py file and added 'host' to 'header_fields'. It is correctly picked up by the software and written into the prepared JSON file. However, during processing, process.py doesn't seem to pick up the host attribute and instead fills it in with the default root value.

So we would like to know if there is a proper way to add new attributes to each node, so that we can color the virus strains not only by the default classes but also by host, town, sex, etc.

jameshadfield commented 6 years ago

That sounds about right. I would suggest following the WNV template, which produces https://nextstrain.org/WNV/NA?c=host. Specifically, for prepare, host is defined both in the header_fields (https://github.com/nextstrain/augur/blob/master/builds/WNV/wnv.prepare.py#L29) and in the colors (https://github.com/nextstrain/augur/blob/master/builds/WNV/wnv.prepare.py#L40). Then in process, the color options are set again (https://github.com/nextstrain/augur/blob/master/builds/WNV/wnv.process.py#L30). (For what it's worth, we're currently updating how augur works to make this more intuitive.)
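As a rough illustration of the "three places" above, here is a minimal sketch that checks whether an attribute is named in each of them. The dictionaries are simplified, hypothetical stand-ins for the config in a *.prepare.py and *.process.py pair. The key names (`header_fields`, `colors`, `auspice`, `color_options`) and the shape of each color option entry are assumptions modeled on the linked WNV build, and `missing_places` is an illustrative helper, not part of augur.

```python
# Hypothetical, simplified stand-ins for the config dicts of a build.
# Key names and entry shapes are assumptions modeled on the WNV build scripts.
prepare_config = {
    # Maps positions in the FASTA header to attribute names.
    "header_fields": {0: "strain", 1: "accession", 2: "date", 3: "host", 4: "country"},
    # Attributes that should receive color definitions.
    "colors": ["country", "host"],
}

process_config = {
    "auspice": {
        "color_options": {
            "country": {"key": "country", "legendTitle": "Country",
                        "menuItem": "country", "type": "discrete"},
            "host": {"key": "host", "legendTitle": "Host",
                     "menuItem": "host", "type": "discrete"},
        }
    }
}


def missing_places(attribute, prepare_config, process_config):
    """Return the config locations that do not mention `attribute`.

    Illustrative helper only; it is not an augur function.
    """
    missing = []
    if attribute not in prepare_config.get("header_fields", {}).values():
        missing.append("prepare header_fields")
    if attribute not in prepare_config.get("colors", []):
        missing.append("prepare colors")
    color_options = process_config.get("auspice", {}).get("color_options", {})
    if attribute not in color_options:
        missing.append("process color_options")
    return missing


# "host" is present everywhere; "sex" has not been added anywhere yet.
print(missing_places("host", prepare_config, process_config))
print(missing_places("sex", prepare_config, process_config))
```

An empty list only means the attribute is named in all three places; it does not show that process actually carries the values through to the output.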

tsibley commented 6 years ago

@jameshadfield For what it's worth, I'm guessing this is the code @DasuberVetLeonidas is working on: https://github.com/DasuberVetLeonidas/hendravirus. It does look like they've included host in those three places, so I wonder if that's not enough or if they got it working after all…

DasuberVetLeonidas commented 6 years ago

Thank you very much for your response, and for making this amazing software suite as well. Yes, the hendravirus build we are working on is at https://github.com/DasuberVetLeonidas/hendravirus

And yes, I have added 'host' to "header_fields", as well as to "colors" and the "colors.tsv" file. If you check the hendra.json in the prepared folder, it appears that the host information was read correctly. However, it seems that the host information is dropped while running hendra.process.py.
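One way to confirm that an attribute such as host really is present in a prepared JSON file is to search the whole document for that key, without assuming anything about the file's layout. The sketch below does this with a recursive walk. The sample data is made up for illustration and does not reproduce the actual structure of hendra.json, and `collect_values` is a hypothetical helper, not an augur function.

```python
import json


def collect_values(node, key):
    """Recursively collect every value stored under `key` in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending in case the key also appears deeper down.
            found.extend(collect_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


# Made-up sample; the real prepared JSON may be laid out differently.
sample = json.loads(
    '{"sequences": {"A": {"attributes": {"host": "horse"}},'
    ' "B": {"attributes": {"host": "bat"}}}}'
)

hosts = collect_values(sample, "host")
print(sorted(hosts))
```

For a real file, the same function could be applied to the result of `json.load` on the opened file. Running it on both the prepared and the processed output would show at which step the values disappear.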

Hope this helps you understand the reason for this unexpected behaviour. I am currently working with the Australian Animal Health Laboratory as part of the clinical placement for my veterinary degree, and there are many more viral genomes that will need modelling in the future.

huddlej commented 4 years ago

@DasuberVetLeonidas Since augur's command line interface has changed substantially in the last two years, I'm going to close this issue as outdated. However, if you are still working with Nextstrain for your pathogen builds, please check out the latest tutorial for creating a virus build.

If you have upgraded your build pipelines to use the latest augur code and you still encounter the same issue described here, please open a new issue in this augur repository and we'll help you figure it out. Thank you!