seattleflu / augur-build

the (in development) augur build for understanding influenza dynamics in Seattle
https://seattleflu.org

Finding Genetic Distance to Closest Sample #25

Closed miparedes closed 5 years ago

miparedes commented 5 years ago

This PR addresses #21

This PR adds a new script, finding_lowest_genomic_distance.py, under /analyses. For each Seattle sample, it finds the lowest genetic distance to the strain in the entire dataset that the sample is genetically closest to, then produces a PNG file with a histogram of the results. It is currently set up as an independent script that must be called from the command line (with the inputs --metadata and --alignment specified), but it can be modified to be included in the Snakefile if needed. The analysis currently compares Seattle strains against all strains in the dataset; let me know if you'd prefer the script to compare Seattle only with Seattle.
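For readers who have not opened the script, the closest-sample idea can be sketched roughly as follows. This is a minimal illustration and not the code in finding_lowest_genomic_distance.py: the function names `hamming_distance` and `closest_distances`, the choice of a Hamming distance that skips gaps and Ns, and the toy strain names are all assumptions made for the example.

```python
def hamming_distance(seq_a, seq_b, ignored="-N"):
    """Count sites where two aligned sequences differ, skipping gaps and Ns.

    The distance measure is an assumption for this sketch; the real
    script may define genetic distance differently.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


def closest_distances(alignment, focal_strains):
    """Map each focal strain to (closest other strain, distance).

    `alignment` is a dict of strain name -> aligned sequence. Ties are
    resolved in favor of the strain encountered first.
    """
    result = {}
    for focal in focal_strains:
        best = None
        for other, seq in alignment.items():
            if other == focal:
                continue  # never compare a sample with itself
            distance = hamming_distance(alignment[focal], seq)
            if best is None or distance < best[1]:
                best = (other, distance)
        result[focal] = best
    return result


# Toy alignment with hypothetical strain names.
alignment = {
    "Seattle/1": "ACGTACGT",
    "Seattle/2": "ACGTACGA",
    "Texas/1": "ACGAACGA",
    "Texas/2": "TTTTACGT",
}
nearest = closest_distances(alignment, ["Seattle/1", "Seattle/2"])
for strain, (neighbor, distance) in nearest.items():
    print(strain, neighbor, distance, sep="\t")
```

Restricting the comparison to Seattle versus Seattle would only require passing an alignment that contains Seattle strains alone; the all-pairs loop is quadratic in the number of samples, which is why the script can take a while on the full dataset.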

trvrb commented 5 years ago

@miparedes ---

This is great! I just pushed a couple of small changes for stylistic and linting reasons. Could you update this to take --output-figure and --output-table? --output-figure is the file to write the PNG to, and --output-table is the file to write a TSV of distances to. Thank you!

miparedes commented 5 years ago

@trvrb

Sorry for the double commit! I'm still trying to figure GitHub out...

The second commit should have the updated script, which now accepts --output-figure and --output-table. I made both of them optional in case you only want to call one or the other. I also updated the plotting function to output a histogram instead of a bar chart. Let me know if there was something wrong with the git commit and I'll fix it promptly!
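As a rough sketch of what that command-line interface might look like with argparse: the four flag names come from this thread, but the help text, the `build_parser` function, the file names, and the choice to make --metadata and --alignment required are assumptions for illustration, not a copy of the script.

```python
import argparse


def build_parser():
    """Build a parser with two inputs and two optional outputs."""
    parser = argparse.ArgumentParser(
        description="Find the lowest genetic distance for each Seattle sample."
    )
    parser.add_argument("--metadata", required=True, help="metadata file")
    parser.add_argument("--alignment", required=True, help="alignment file")
    # Both outputs default to None, so either can be left off.
    parser.add_argument("--output-figure", help="PNG file to write the histogram to")
    parser.add_argument("--output-table", help="TSV file to write the distances to")
    return parser


# Hypothetical invocation that asks only for the table.
args = build_parser().parse_args(
    [
        "--metadata", "metadata.tsv",
        "--alignment", "aligned.fasta",
        "--output-table", "distances.tsv",
    ]
)

if args.output_figure:
    print("would write figure to", args.output_figure)
if args.output_table:
    print("would write table to", args.output_table)
```

argparse converts the dashes in --output-figure to an underscore, so the value is read as `args.output_figure`.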

miparedes commented 5 years ago

@trvrb

The newest commit has all the stylistic changes as requested.

trvrb commented 5 years ago

Great! This is still running, but everything looks in place. One further thing to think about: with these sorts of long-running scripts, it's useful to have a sense of progress. You can make a simple progress bar the way I did here: https://github.com/seattleflu/augur-build/blob/master/scripts/connected_components.py#L69

Print a - for each 100th virus, or whatever. Or print a - when you hit 1/10th of the samples, the next at 2/10ths of the samples, etc.
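The "one - per tenth of the samples" variant could be sketched like this. It is an illustration of the idea only, not the code at the linked line of connected_components.py; `progress_marks` and `process_all` are hypothetical names, and writing to a stream argument is an assumption that keeps the example easy to check.

```python
import io
import sys


def progress_marks(total, steps=10):
    """Return the 1-based item counts at which a progress mark is due."""
    return {max(1, (total * k) // steps) for k in range(1, steps + 1)}


def process_all(items, stream=sys.stderr):
    """Loop over items, writing a '-' each time another tenth is done."""
    marks = progress_marks(len(items))
    for count, item in enumerate(items, start=1):
        # ... the per-sample distance computation would go here ...
        if count in marks:
            stream.write("-")
            stream.flush()  # show the mark immediately, not at exit
    stream.write("\n")


# Capture the bar in a buffer so the demo output is easy to inspect.
buffer = io.StringIO()
process_all(list(range(50)), stream=buffer)
print(buffer.getvalue(), end="")
```

Writing to stderr by default keeps the progress marks out of anything the script prints to stdout. The simpler "every 100th virus" variant replaces the set lookup with `if count % 100 == 0`.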

trvrb commented 5 years ago

Everything looks good. Thanks @miparedes!