LazyRegressionDataset, compute_norms.py, and fix to loaders

This PR makes three changes:

Adds a LazyRegressionDataset and model_type=regression_lazy, which is like the existing regression / MTR pretraining mode, except that it takes a .smi file and computes RDKit descriptors on-the-fly as part of the dataset's preprocess() method.
Adds a script compute_norms.py that reads a .smi file and writes a .json file with the mean and std of the descriptors, for use in MTR pretraining.
Fixes a nasty bug affecting RawTextDataset where the model inputs included additional non-SMILES characters. See below.
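
As a rough illustration of the first two items, here is a minimal sketch of how descriptor norms might be computed from SMILES lines and then applied on the fly. It is not the code in this PR: the names `rdkit_descriptors`, `compute_norms`, and `LazyDescriptorItems`, the `"mean"` / `"std"` JSON keys, and the choice to skip molecules with non-finite descriptor values are all assumptions made for the example. The RDKit import is deferred so that the statistics logic can be exercised with any descriptor function.

```python
import json
import math
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

DescriptorFn = Callable[[str], Optional[List[float]]]


def rdkit_descriptors(smiles: str) -> Optional[List[float]]:
    """Compute every descriptor in RDKit's descList; None if parsing fails."""
    # Imported lazily so the rest of this sketch runs without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return [float(fn(mol)) for _, fn in Descriptors.descList]


def compute_norms(
    lines: Iterable[str], descriptor_fn: DescriptorFn = rdkit_descriptors
) -> Dict[str, List[float]]:
    """Stream over .smi-style lines and return per-descriptor mean and std."""
    count = 0
    mean: List[float] = []
    m2: List[float] = []  # running sum of squared deviations (Welford)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # blank line
        values = descriptor_fn(parts[0])  # first column is the SMILES string
        # Assumption: skip unparsable molecules and non-finite descriptors.
        if values is None or not all(math.isfinite(v) for v in values):
            continue
        if count == 0:
            mean = [0.0] * len(values)
            m2 = [0.0] * len(values)
        count += 1
        for i, v in enumerate(values):
            delta = v - mean[i]
            mean[i] += delta / count
            m2[i] += delta * (v - mean[i])
    if count == 0:
        raise ValueError("no valid molecules found")
    std = [math.sqrt(s / count) for s in m2]  # population std
    return {"mean": mean, "std": std}


class LazyDescriptorItems:
    """Hypothetical lazy dataset: descriptors are computed when an item is read."""

    def __init__(
        self,
        smiles: Sequence[str],
        norms: Dict[str, List[float]],
        descriptor_fn: DescriptorFn = rdkit_descriptors,
    ) -> None:
        self.smiles = smiles
        self.norms = norms
        self.descriptor_fn = descriptor_fn

    def __len__(self) -> int:
        return len(self.smiles)

    def __getitem__(self, idx: int) -> Tuple[str, List[float]]:
        smi = self.smiles[idx]
        values = self.descriptor_fn(smi)
        if values is None:
            raise ValueError(f"could not parse SMILES: {smi!r}")
        # Standardize each label; a zero std is treated as 1 to avoid dividing by 0.
        labels = [
            (v - m) / (s if s > 0 else 1.0)
            for v, m, s in zip(values, self.norms["mean"], self.norms["std"])
        ]
        return smi, labels
```

With RDKit installed, something like `json.dump(compute_norms(open("train.smi")), open("norms.json", "w"))` would produce a norms file in this sketch's format (`train.smi` and `norms.json` are placeholder names). A streaming (Welford) update is used so the full descriptor matrix never has to be held in memory, which is in the same spirit as computing descriptors lazily.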
