Now that `AdaptiveLogSoftmaxWithLoss` has been released, is it worth just using PyTorch's implementation instead of `SplitCrossEntropyLoss`? It seems we could get the same split behaviour through its `cutoffs` parameter. Or is there some fundamental difference between the two?
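For comparison, here is a rough pure-Python sketch of how a `cutoffs` list partitions the vocabulary in `torch.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0)`, based on the layout described in the PyTorch docs. The head holds the first `cutoffs[0]` classes plus one logit per tail cluster. Each tail cluster first projects the hidden state down to `in_features // div_value**(i + 1)` dimensions. The helper `adaptive_partition` and the `Cluster` type are hypothetical and exist only for illustration; they do not call PyTorch. The per-cluster down-projection is one concrete thing to check against how `SplitCrossEntropyLoss` handles its splits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cluster:
    start: int     # first class id in this tail cluster (inclusive)
    end: int       # one past the last class id (exclusive)
    proj_dim: int  # hidden size after this cluster's down-projection


def adaptive_partition(
    in_features: int, n_classes: int, cutoffs: List[int], div_value: float = 4.0
) -> Tuple[int, List[Cluster]]:
    """Hypothetical helper: return (head_size, tail clusters) for a cutoffs list,
    mirroring the layout documented for nn.AdaptiveLogSoftmaxWithLoss."""
    # Same validity rules as the PyTorch module: strictly increasing, inside (0, n_classes).
    if not cutoffs or cutoffs != sorted(cutoffs) or len(set(cutoffs)) != len(cutoffs):
        raise ValueError("cutoffs must be a non-empty, strictly increasing list")
    if cutoffs[0] <= 0 or cutoffs[-1] >= n_classes:
        raise ValueError("cutoffs must lie strictly between 0 and n_classes")

    bounds = cutoffs + [n_classes]
    n_clusters = len(bounds) - 1
    # Head = frequent "shortlist" classes + one logit per tail cluster.
    head_size = bounds[0] + n_clusters

    tails = []
    for i in range(n_clusters):
        # Rarer clusters get progressively smaller hidden projections.
        proj_dim = int(in_features // (div_value ** (i + 1)))
        tails.append(Cluster(bounds[i], bounds[i + 1], proj_dim))
    return head_size, tails


if __name__ == "__main__":
    head, tails = adaptive_partition(512, 10000, [2000, 6000])
    print(head)   # 2002
    for c in tails:
        print(c)  # Cluster(start=2000, end=6000, proj_dim=128), then proj_dim=32
```

With `in_features=512` and `cutoffs=[2000, 6000]`, the two tail clusters work in 128 and 32 dimensions instead of the full 512.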