[ ] Method to map a sequence of nucleotides to a sequence of ids
[ ] Method to map one nucleotide to one id, i.e. the mapping A,C,G,T -> 0,1,2,3
[ ] Method to collate offset sequences for k-mer embeddings, e.g. for k = 3: ACGGGTCA -> {{0,1,2,2,2,3,1,0},{1,2,2,2,3,1,0},{2,2,2,3,1,0}} -> {{0,1,2},{1,2,2},{2,2,2},{2,2,3},{2,3,1},{3,1,0}}
Create a tokenizer for base pairs of gene sequences.
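
A minimal sketch of how the three methods in the checklist could fit together, written in Python as an assumption since no language is specified here. The names `NUCLEOTIDE_TO_ID`, `nucleotide_to_id`, `sequence_to_ids`, and `collate_kmers` are hypothetical, and raising `ValueError` on unknown symbols is an illustrative choice rather than a stated requirement.

```python
from typing import List

# Vocabulary taken from the checklist: A,C,G,T -> 0,1,2,3.
# Ambiguity codes such as N are not covered (assumption: reject them).
NUCLEOTIDE_TO_ID = {"A": 0, "C": 1, "G": 2, "T": 3}


def nucleotide_to_id(nucleotide: str) -> int:
    """Map one nucleotide to one id."""
    try:
        return NUCLEOTIDE_TO_ID[nucleotide]
    except KeyError:
        raise ValueError(f"unknown nucleotide: {nucleotide!r}") from None


def sequence_to_ids(sequence: str) -> List[int]:
    """Map a sequence of nucleotides to a sequence of ids."""
    return [nucleotide_to_id(n) for n in sequence]


def collate_kmers(ids: List[int], k: int) -> List[List[int]]:
    """Collate k offset copies of the id sequence into overlapping k-mers."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Offset i drops the first i ids: ids[0:], ids[1:], ..., ids[k-1:].
    offsets = [ids[i:] for i in range(k)]
    # zip stops at the shortest offset, which yields len(ids) - k + 1 k-mers
    # (or none when the sequence is shorter than k).
    return [list(kmer) for kmer in zip(*offsets)]


ids = sequence_to_ids("ACGGGTCA")
kmers = collate_kmers(ids, k=3)
```

For the checklist example, `ids` is the id sequence of ACGGGTCA and `kmers` holds the six overlapping 3-mers. Building the offsets by slicing keeps the sketch close to the two-step description in the checklist; a tensor-based version would be a natural alternative if the embeddings are computed in a deep learning framework.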