jmcbroome / autolin

Repository containing a method for automatically identifying pathogen lineages from a phylogeny.
14 stars 1 forks source link

Adding pango naming function and access to rest of code #2

Closed jonnylydaa closed 1 year ago

jonnylydaa commented 1 year ago

cleaned up code. pango_lineage_naming function seems to be working correctly with incrementing the name of a lineage if it violates pango lineage naming guidelines. generate_yaml_report is not working currently but I laid out the current concept for the implementation and where it is right now. generate_yaml_report is commented out in the main function since it's not working but pango_lineage_naming is working in main.

jmcbroome commented 1 year ago

@jonnylydaa hey jonny, thanks for opening this. A few code review points:

1: You have merge conflicts you need to resolve before I can merge this! 2: The heavy use of comments is appreciated, but its just unnecessary at some points (dont need to tell me that you're about to append to a list before you call .append()!)

Also, I would like to see that yaml report function- what's the specific issue you're having with it? I notice you're using a flat dictionary with a list and trying to track columns by index- I would highly recommend using a nested dictionary structure instead, especially since users are very likely to pass metadata with scrambled column orders at some point when some columns are included or excluded. Also, for dates, I recommend using the builtin datetime library (https://docs.python.org/3/library/datetime.html). It's a bit obtuse but much, much better than trying to write your own code for handling date formatting. It's imported into generate_lineage_report.py and you may want to copy some of the boilerplate I used there.