zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding

How are the positional encodings derived? #279

Open bnicholl opened 3 years ago

bnicholl commented 3 years ago

After reading the paper, my understanding is that the content stream consists of a token's own word embedding and positional encoding, together with the word embeddings and positional encodings of the tokens preceding it in its permutation order, while the query stream consists of the token's positional encoding and a (randomly initialized) trainable embedding w, together with the word embeddings and positional encodings of those same preceding tokens. My question is: what exactly is the positional encoding? Is it a learnable vector, as in BERT, or the sinusoid function used in other Transformers? I'd like to understand how this encoding is derived. Thanks!
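
For concreteness, the sinusoid option I'm referring to is the Transformer-XL-style encoding over relative distances. A minimal NumPy sketch of what I mean is below; this is illustrative only, not the repo's TensorFlow implementation, and the function name and arguments are my own:

```python
import numpy as np

def sinusoid_relative_encoding(qlen, klen, d_model):
    """Illustrative Transformer-XL-style sinusoid over relative distances.

    Not the repo's code; just the kind of encoding meant by "sinusoid" above.
    """
    # One entry per relative offset, from klen - 1 down to -(qlen - 1).
    pos_seq = np.arange(klen - 1, -qlen, -1.0)
    # Standard inverse-frequency schedule over the even dimensions.
    inv_freq = 1.0 / (10000.0 ** (np.arange(0.0, d_model, 2.0) / d_model))
    sinusoid_inp = np.outer(pos_seq, inv_freq)
    # Concatenate sin and cos halves into a (qlen + klen - 1, d_model) table.
    return np.concatenate([np.sin(sinusoid_inp), np.cos(sinusoid_inp)], axis=-1)

print(sinusoid_relative_encoding(qlen=4, klen=8, d_model=16).shape)  # (11, 16)
```

Is this (or something like it) what XLNet uses, or is the position information learned the way BERT's absolute position embeddings are?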