Inside `tfsenc_read_datum.py`, we perform embedding manipulations, such as shifting embeddings or concatenating them, but only at the end of `read_datum`. This could be a problem when we filter out `non_words` and align with other models, which, for instance, reduces ~90k gpt2-xl tokens to ~60k (aligning with glove) for 625. Shifting embeddings after that filtering might harm encoding performance, since rows that are adjacent in the filtered datum are not necessarily adjacent in the original transcript.

Consider moving all embedding manipulation in front of the filtering/alignment steps?
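For illustration, a minimal sketch of the misalignment (assuming the datum is a pandas DataFrame; the `embeddings` and `is_nonword` column names here are hypothetical and may differ from what `read_datum` actually uses):

```python
import numpy as np
import pandas as pd

def shift_emb(df: pd.DataFrame, shift: int = 1) -> pd.DataFrame:
    """Give each row the embedding of the row `shift` positions earlier,
    then drop rows left without an embedding."""
    df = df.copy()
    df["embeddings"] = df["embeddings"].shift(shift)
    return df.dropna(subset=["embeddings"])

# Hypothetical datum: five tokens, one non-word filler.
datum = pd.DataFrame({
    "word": ["the", "uh", "cat", "sat", "down"],
    "is_nonword": [False, True, False, False, False],
    "embeddings": [np.full(3, float(i)) for i in range(5)],
})

# Current order (filter, then shift): after "uh" is dropped,
# "cat" inherits the embedding of "the", which is two tokens
# away in the original transcript.
filter_then_shift = shift_emb(datum[~datum["is_nonword"]])

# Proposed order (shift, then filter): the shift happens on the
# full token sequence, so every surviving word keeps the embedding
# of its true neighbor in the transcript.
shift_then_filter = shift_emb(datum)
shift_then_filter = shift_then_filter[~shift_then_filter["is_nonword"]]
```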