Read README.md in setup.py with an explicit UTF-8 encoding, since some systems use a different default text encoding.
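A minimal sketch of the change, assuming setup.py reads the README like this (the function name is illustrative):

```python
from pathlib import Path

def read_long_description(path="README.md"):
    # An explicit encoding avoids UnicodeDecodeError on systems whose
    # default text encoding is not UTF-8 (e.g. cp1252 on Windows).
    return Path(path).read_text(encoding="utf-8")
```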
Added a device-type check so that torch.cuda.Stream() and pin_memory() are only used on CUDA-capable devices. This resolves errors that occurred during inference on CPU-only machines.
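A sketch of the guard, assuming the device type is available as a string (the helper names are illustrative, not the PR's actual functions):

```python
def cuda_stream_or_none(device_type):
    """Return a torch.cuda.Stream() on CUDA devices, None otherwise."""
    if device_type == "cuda":
        import torch  # imported on the CUDA path only
        return torch.cuda.Stream()
    return None

def maybe_pin_memory(tensor, device_type):
    # pin_memory() requires CUDA; skip it on CPU-only devices,
    # where calling it raises an error.
    if device_type == "cuda":
        return tensor.pin_memory()
    return tensor
```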
Added a post-install command that upgrades transformers to the latest version, so the "ValueError: rope_scaling must be a dictionary with two fields" error does not occur.
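One common setuptools pattern for a post-install step looks like the following; this is an assumption about the mechanism, not necessarily the PR's exact implementation:

```python
import subprocess
import sys
from setuptools.command.install import install

class PostInstall(install):
    """Custom install command that upgrades transformers afterwards."""
    def run(self):
        install.run(self)
        # Older transformers releases reject the newer rope_scaling config
        # format; upgrading avoids the ValueError at model-load time.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--upgrade", "transformers"]
        )
```

setup() would then register it via `cmdclass={"install": PostInstall}`.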
Added support for non-sharded models on the Hugging Face Hub, i.e. repos that contain a single model file such as 'model.safetensors'.
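Sharded repos ship a `model.safetensors.index.json` plus shard files, while single-file repos ship only `model.safetensors`; a hedged sketch of how the two cases could be distinguished (function name illustrative):

```python
import os

def locate_weights(repo_dir):
    """Return (kind, path) for a sharded or single-file checkpoint."""
    index = os.path.join(repo_dir, "model.safetensors.index.json")
    single = os.path.join(repo_dir, "model.safetensors")
    if os.path.isfile(index):
        return "sharded", index   # the index lists the individual shards
    if os.path.isfile(single):
        return "single", single   # whole model in one safetensors file
    raise FileNotFoundError(f"no model weights found in {repo_dir}")
```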
Addresses issue #169.
changelog:
- read README.md in setup.py with an explicit UTF-8 encoding
- guard torch.cuda.Stream() and pin_memory() behind a CUDA device-type check
- post-install upgrade of transformers to prevent the rope_scaling ValueError
- support single-file (non-sharded) models such as 'model.safetensors'