Check the example sequences in the coronavirus dataset and determine the k value for the de Bruijn graph such that: (1) no duplicate k-mer exists; (2) 1% of the total nodes are duplicates; (3) 5% of the total k-mers are duplicates.
[x] Write a program to generate the set of k-mers for an input sequence and count the duplicate nodes.
[x] Determine the k values for different scenarios mentioned above.
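
The two checklist items can be sketched roughly as follows. This is a minimal illustration, not the program used for the task: the function names (`duplicate_fraction`, `smallest_k`) are hypothetical, the sequence is a toy string rather than data from the dataset, and it assumes "duplicates" means the number of k-mer occurrences beyond the first one of each distinct k-mer, divided by the total number of k-mers.

```python
from collections import Counter


def duplicate_fraction(seq, k):
    """Fraction of k-mers in seq that repeat an earlier k-mer.

    Assumption: duplicates = total k-mers - distinct k-mers.
    """
    total = len(seq) - k + 1
    if k <= 0 or total <= 0:
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    # Count every k-mer obtained by sliding a window of width k.
    counts = Counter(seq[i:i + k] for i in range(total))
    return (total - len(counts)) / total


def smallest_k(seq, max_fraction, k_min=2, k_max=None):
    """Smallest k in [k_min, k_max] whose duplicate fraction is <= max_fraction.

    Returns None if no k in the range qualifies.
    """
    if k_max is None:
        k_max = len(seq)
    for k in range(k_min, k_max + 1):
        if duplicate_fraction(seq, k) <= max_fraction:
            return k
    return None


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy input, not a sequence from the dataset
    for threshold in (0.0, 0.01, 0.05):
        print(threshold, smallest_k(toy, threshold))
```

For a real genome, the same scan would be run on the sequence read from the dataset file, with the thresholds 0, 0.01, and 0.05 matching the three scenarios. If the nodes of the graph are taken to be (k-1)-mers rather than k-mers, `duplicate_fraction(seq, k - 1)` would be the corresponding quantity under the same assumption.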