As discovered by @antndlcrx, we get better performance by using sigmoid rather than identity activation functions for continuous inputs, since these are already scaled to [0, 1] during data preprocessing.
@tsrobinson to update the codebase to make this the default behavior.
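For illustration, a minimal sketch of the idea in PyTorch (the class and argument names are hypothetical, not the actual codebase): a sigmoid on the output head bounds predictions to (0, 1), matching the min-max scaling applied to the continuous inputs, whereas an identity activation leaves them unbounded.

```python
import torch
import torch.nn as nn

class ContinuousOutputHead(nn.Module):
    """Hypothetical output head for continuous variables scaled to [0, 1].

    Previously: identity activation (raw linear output, unbounded).
    Proposed default: sigmoid, constraining outputs to (0, 1) to match
    the preprocessing scale.
    """

    def __init__(self, hidden_dim: int, n_continuous: int, use_sigmoid: bool = True):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, n_continuous)
        self.use_sigmoid = use_sigmoid  # proposed default: True

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        out = self.linear(h)
        return torch.sigmoid(out) if self.use_sigmoid else out
```

Predictions would then be inverse-transformed with the same min-max scaler used in preprocessing to recover the original units.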