According to Phil's comment, we should ideally use the parameter rewards, although the code still runs fine with reward.
Ah, good point. Yes, this is a typo.
Fortunately, it's a typo that doesn't affect the outcome.
The reward variable (a numpy array) isn't connected to the computation graph, so apparently PyTorch is just as happy working with a numpy array as it is with a tensor.