nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License

Move subsampling by proximity into augur #816

Open sidneymbell opened 2 years ago

sidneymbell commented 2 years ago

Context

The ability to subsample using a focal set and then background context selected by genetic relatedness is SUPER useful. We'd like to be able to do this for other pathogens.

Description

Move the subsampling logic that assigns proximity scores by genetic relatedness out of ncov and into augur proper.
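
For readers unfamiliar with the idea, here is a minimal sketch of proximity-based subsampling. It is not the ncov implementation and not an augur API. It assumes aligned sequences of equal length, uses a plain Hamming distance as the measure of genetic relatedness, and scores each context sequence by its smallest distance to any focal sequence. The function names (`hamming`, `proximity_scores`, `pick_background`) and the strain names are hypothetical.

```python
from typing import Dict, List

# Characters treated as missing data (assumption: gaps and Ns carry no signal).
IGNORED = {"N", "-"}


def hamming(a: str, b: str) -> int:
    """Count mismatched sites between two aligned sequences, skipping gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in IGNORED and y not in IGNORED
    )


def proximity_scores(focal: Dict[str, str], context: Dict[str, str]) -> Dict[str, int]:
    """Map each context strain to its smallest distance to any focal sequence."""
    if not focal:
        raise ValueError("the focal set must contain at least one sequence")
    return {
        name: min(hamming(seq, focal_seq) for focal_seq in focal.values())
        for name, seq in context.items()
    }


def pick_background(focal: Dict[str, str], context: Dict[str, str], n: int) -> List[str]:
    """Return the n context strains closest to the focal set (ties broken by name)."""
    scores = proximity_scores(focal, context)
    return sorted(scores, key=lambda name: (scores[name], name))[:n]


# Toy data, invented for illustration only.
focal = {"focal_1": "ACGTACGT", "focal_2": "ACGTACGA"}
context = {
    "ctx_a": "ACGTACGT",
    "ctx_b": "ACGTTCGA",
    "ctx_c": "TTTTACGT",
    "ctx_d": "ACNTACGA",
}

background = pick_background(focal, context, n=2)
print(background)  # ['ctx_a', 'ctx_d']
```

A pathogen-agnostic version would likely need to read sequences from an alignment file and scale beyond all-against-all comparisons, which this sketch does not attempt.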

huddlej commented 2 years ago

@jameshadfield has implemented our subsampling logic as a script in ncov, but we haven't revisited this with more testing. We originally planned to implement this as augur subsample (and @jameshadfield created a corresponding Augur PR), but it seemed likely at the time that we'd substantially change this subsampling logic and we didn't want to codify the current implementation in augur only to change it again.

Would you all be up for trying out James's script version of subsampling for SARS-CoV-2 or other pathogens? That could help us figure out if it's worth the effort to move into Augur...

sidneymbell commented 2 years ago

Oh awesome, thanks @huddlej and @jameshadfield! We'd be up for trying out the script and potentially helping to move things over to augur if helpful, although we probably won't be working on this until a bit later / early spring.