R2SC3 - Githubissues

Although you presented some data on sampling bias in your Supplementary Text (which may not be sufficient given the problem in the dataset), there is a real problem of sampling bias, and it needs to be addressed in your main text as well. How does the model deal with missing links? Is the prediction robust enough to provide a clear indication of virus spread across such a large geographical range? This could be a serious limitation of your study.

For serotype A, you analysed 131 VP1 sequences, of which 44% are from Argentina (~70% of these from 2000 and 2001) and 21% are from Venezuela (~86% of these are recent samples, collected after 2001). In addition, the majority of your oldest samples are only from Brazil and Argentina.

For serotype O, you have 167 sequences in total, of which ~54% are from Ecuador (all sampled after 2002 and previously analysed, along with 30 other sequences that are also included in this manuscript). Therefore, 90+30=120 sequences have already been analysed in a previous paper. Among the others, the 36 sequences from Colombia (~22% of the total) barely cover the 2000s (you claim 1994 to 2008), since you have 5 sequences from 2000, 1 from 2002 and 2 from 2008, a gap of 6 years. For the type A database, your oldest samples are only from Colombia.

You attempted a random sub-sampling that, as far as I understood, did not take into account the time of sampling, only the quantity of data from each country. Maybe you need to account for time in your sub-sampling.
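One possible way to account for time in the sub-sampling is to stratify by country and by time window together, and to cap the number of sequences drawn from each stratum. The sketch below is only an illustration of that idea, not the procedure used in the manuscript. The function name `stratified_subsample`, the parameters `max_per_stratum` and `bin_width`, and the example records are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(records, max_per_stratum=5, bin_width=5, seed=0):
    """Sub-sample sequences with a cap per (country, time bin) stratum.

    records: iterable of (seq_id, country, year) tuples (hypothetical layout).
    max_per_stratum: maximum number of sequences kept from each stratum.
    bin_width: width of the time bins, in years.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group sequence ids by country and by the start year of their time bin.
    strata = defaultdict(list)
    for seq_id, country, year in records:
        time_bin = (year // bin_width) * bin_width
        strata[(country, time_bin)].append(seq_id)

    kept = []
    for key in sorted(strata):
        ids = strata[key]
        # Down-sample only the over-represented strata; sparse ones are kept whole.
        if len(ids) > max_per_stratum:
            ids = rng.sample(ids, max_per_stratum)
        kept.extend(ids)
    return kept


# Made-up example: one country heavily sampled in 2000-2001,
# another with a few old samples spread over two time bins.
records = [(f"ARG_{i}", "Argentina", 2000 + i % 2) for i in range(20)]
records += [("BRA_1", "Brazil", 1961), ("BRA_2", "Brazil", 1963), ("BRA_3", "Brazil", 1968)]

subset = stratified_subsample(records, max_per_stratum=5, bin_width=5)
print(len(subset))  # 8: five from Argentina 2000-2004, three from Brazil
```

Capping each stratum, rather than each country, keeps a dense burst of sampling in one or two years from dominating a country's contribution, while the few old samples are left untouched. The bin width and the cap would need to be chosen with the actual sampling dates in mind.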

maxbiostat / FMDV_AMERICA

R2SC3 #83