According to the model description in the arXiv paper (equations (1), (2), and (3)), each stage of PRGC consists of a single linear layer followed by a sigmoid. When I inspected the code, however, each stage appears to have an additional non-linear hidden layer (the MultiNonLinearClassifier class), which greatly increases the model size. Were the published results obtained with the smaller model or the larger one?
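
To make the size difference concrete, here is a rough parameter count for the two kinds of head. This is only a sketch: the dimensions (`HIDDEN`, `INTERMEDIATE`, `NUM_LABELS`) are assumptions I chose for illustration, not values read from the repository, and the second head is a generic two-layer MLP that may not match MultiNonLinearClassifier exactly.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of trainable parameters in one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def single_layer_head(hidden: int, num_labels: int) -> int:
    """Head as described in the paper: one linear layer (sigmoid adds no parameters)."""
    return linear_params(hidden, num_labels)


def hidden_layer_head(hidden: int, intermediate: int, num_labels: int) -> int:
    """Head with an extra hidden layer: linear -> non-linearity -> linear."""
    return linear_params(hidden, intermediate) + linear_params(intermediate, num_labels)


# All three values below are assumptions for illustration only.
HIDDEN = 768        # assumed encoder output size (e.g. a BERT-base-sized encoder)
INTERMEDIATE = 384  # assumed width of the extra hidden layer
NUM_LABELS = 2      # assumed number of outputs per token

small = single_layer_head(HIDDEN, NUM_LABELS)
big = hidden_layer_head(HIDDEN, INTERMEDIATE, NUM_LABELS)

print(f"single linear layer: {small} parameters")
print(f"with hidden layer:   {big} parameters")
print(f"ratio:               {big / small:.1f}x")
```

Under these assumed sizes, the head with the hidden layer has a couple of hundred times more parameters than the single linear layer, and this applies to every stage that uses such a head.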