[2/x] clean up casting functions: delayed scaling

Stack from ghstack (oldest at bottom):

-> #343
#342
#341

Summary:

Removes delayed scaling from float8_tensor.py. After this PR, the invariant is that everything in float8_tensor.py requires the scale to be calculated elsewhere and passed in. This moves the codebase towards a separation of concerns: calculating the scale (via the various scaling strategies) is decoupled from creating an instance of Float8Tensor.

Note that stateful delayed scaling is the reason we need this separation.
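The split can be pictured with a rough sketch in plain Python, so it runs without PyTorch. Every name in it (ScaledTensor, to_scaled, amax_to_scale, dynamic_scale, DelayedScaler) is hypothetical and is not float8_experimental's API. The only fixed fact it relies on is that the largest finite float8 e4m3fn value is 448. The cast function only accepts a precomputed scale. Each strategy produces a scale. Delayed scaling keeps an amax history across iterations, which is why that state has to live outside the cast. The history length of 16 and the unit-scale fallback are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Largest finite magnitude of float8 e4m3fn, used here as the assumed target format.
E4M3_MAX = 448.0


@dataclass
class ScaledTensor:
    """Hypothetical stand-in for Float8Tensor: scaled data plus the scale used."""
    data: List[float]
    scale: float


def to_scaled(values: List[float], scale: float) -> ScaledTensor:
    """Cast with a precomputed scale; this function never decides the scale.

    Only scaling and saturation are simulated; real float8 rounding is omitted.
    """
    data = [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]
    return ScaledTensor(data, scale)


def amax_to_scale(amax: float, eps: float = 1e-12) -> float:
    """Map an observed absolute maximum to a scale that fills the float8 range."""
    return E4M3_MAX / max(amax, eps)


def dynamic_scale(values: List[float]) -> float:
    """Stateless strategy: scale from the current tensor's amax."""
    return amax_to_scale(max(abs(v) for v in values))


@dataclass
class DelayedScaler:
    """Stateful strategy (hypothetical): scale from amaxes seen in earlier steps."""
    history_len: int = 16  # arbitrary choice for this sketch
    amax_history: List[float] = field(default_factory=list)

    def scale(self) -> float:
        # No history yet: fall back to a unit scale (an assumption of this sketch).
        if not self.amax_history:
            return 1.0
        return amax_to_scale(max(self.amax_history))

    def update(self, values: List[float]) -> None:
        # Record this step's amax so later steps can use it; keep a bounded window.
        self.amax_history.append(max(abs(v) for v in values))
        if len(self.amax_history) > self.history_len:
            self.amax_history.pop(0)


x = [1.0, -2.0, 4.0]

# Dynamic scaling: compute the scale, then cast.
t_dyn = to_scaled(x, dynamic_scale(x))
print(t_dyn.scale, t_dyn.data)  # 112.0 [112.0, -224.0, 448.0]

# Delayed scaling: the scaler owns state across steps; the cast call is unchanged.
scaler = DelayedScaler()
for step in range(2):
    t_del = to_scaled(x, scaler.scale())
    scaler.update(x)
    print(step, t_del.scale)  # step 0 -> 1.0, step 1 -> 112.0
```

Because to_scaled holds no state, adding or changing a scaling strategy in this sketch never touches the casting code.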

Test Plan:

./test/test_everything.sh

pytorch-labs / float8_experimental