Adding a number of extended TensorRT execution provider settings
Adding the ability to warm up a model
Adding some extended ops
Despite these changes we're only looking at about a 2x speedup with an FP16 TensorRT engine due to a number of incompatible nodes - working on splitting out BERT as a pre-initialised text encoder shortly
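As a rough illustration of the settings and warmup described above, here is a hedged sketch using ONNX Runtime's TensorRT execution provider. The option names are real TensorRT EP options, but the model path, input shape, and cache directory are placeholders, not values from this change set:

```python
# Sketch: extended TensorRT execution-provider settings plus a warmup pass.
trt_options = {
    "trt_fp16_enable": True,            # build an FP16 engine
    "trt_engine_cache_enable": True,    # reuse compiled engines across runs
    "trt_engine_cache_path": "./trt_cache",
    "trt_max_workspace_size": 2 << 30,  # 2 GiB builder scratch space
}
providers = [
    ("TensorrtExecutionProvider", trt_options),
    "CUDAExecutionProvider",            # fallback for nodes TensorRT rejects
]

# Session creation and warmup (requires onnxruntime-gpu with TensorRT):
# import numpy as np
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
# dummy = {session.get_inputs()[0].name: np.zeros((1, 3, 224, 224), np.float32)}
# for _ in range(3):
#     session.run(None, dummy)  # pay engine-build cost before real traffic
```

Nodes TensorRT can't handle fall through to the CUDA provider, which is part of why the observed speedup stays around 2x.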