Representation return types

cifkao commented 3 years ago

I think representations should not use obscure types like uint8 and uint16. These types are uncommon in Python code and could cause, e.g., overflow bugs for unsuspecting users (say, if they want to offset the IDs to make room for custom tokens). Also, PyTorch doesn't support uint16.
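To illustrate the overflow concern, here is a minimal NumPy sketch. It assumes a representation returned token IDs as uint8, and that a user shifts every ID by a fixed amount to reserve room for special tokens. The array values and `NUM_SPECIAL_TOKENS` are made up for illustration. Integer array arithmetic in NumPy wraps around silently, so the uint8 result is wrong without any error or warning.

```python
import numpy as np

# Hypothetical token IDs, stored as uint8 the way a representation might return them.
ids = np.array([250, 253, 255], dtype=np.uint8)

# Shift every ID to reserve the first few IDs for custom tokens (e.g. padding, BOS, EOS).
NUM_SPECIAL_TOKENS = 10  # hypothetical value chosen for illustration

# The result stays uint8, so values above 255 silently wrap around modulo 256.
shifted_small = ids + NUM_SPECIAL_TOKENS

# Converting to NumPy's default integer type first gives the intended values.
shifted_safe = ids.astype(np.int_) + NUM_SPECIAL_TOKENS

print(shifted_small)  # [4 7 9]
print(shifted_safe)   # [260 263 265]
```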

I think the default NumPy int type, np.int_ (C long), would be a good default. Or at least a more common (signed) type. Or maybe plain Python lists should be used instead of NumPy arrays.

We can offer a dtype parameter (either for each representation, or for functions like to_pytorch_dataset, or both) in case the user wishes to save space.
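A rough sketch of what such a `dtype` parameter could look like follows. The function name `to_pitch_ids` and its behavior are hypothetical and not taken from the muspy codebase. The sketch defaults to `np.int_` as suggested above, lets callers opt into a smaller type to save space, and raises an error instead of silently wrapping when values do not fit the requested type.

```python
from typing import Sequence

import numpy as np


def to_pitch_ids(pitches: Sequence[int], dtype=np.int_) -> np.ndarray:
    """Hypothetical representation function returning pitch IDs as an array.

    `dtype` must be an integer type. It defaults to NumPy's default integer
    type; pass a smaller type such as np.uint8 to save memory.
    """
    # Build the array in the wide default type first so no values are lost.
    values = np.asarray(pitches, dtype=np.int_)
    info = np.iinfo(dtype)
    # Refuse to downcast if any value falls outside the target type's range.
    if values.size and (values.min() < info.min or values.max() > info.max):
        raise ValueError(f"values do not fit in {np.dtype(dtype).name}")
    return values.astype(dtype)


pitches = [60, 64, 67, 72]
default_ids = to_pitch_ids(pitches)                  # np.int_ by default
compact_ids = to_pitch_ids(pitches, dtype=np.uint8)  # 1 byte per element

print(default_ids.dtype, compact_ids.dtype)
print(default_ids.nbytes, compact_ids.nbytes)  # the default uses more bytes per element
```

Checking the range explicitly keeps the space-saving option from reintroducing the silent overflow problem. Recent NumPy versions also reject some out-of-range conversions on their own, but that behavior has changed between releases, so an explicit check is more predictable.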

salu133445 commented 3 years ago

Having a dtype argument sounds great! And yes, we should use the default dtype by default to avoid confusion.

salu133445 commented 3 years ago

This is now supported by 94e9469.

salu133445 / muspy, issue #45