Iterating first over percepts (rather than first over labels) may be faster due to memory locality in the weight vector. (Inspired by JAMR.)
supersense tagging speed comparison on ~/dev/e/PySupersenseTagger/test-sst/learningtiny (1 iter + dev prediction, 146 labels, 600k weights):
iterating over labels, then percepts (old way):
acc 61.28% (train), 71.03% (dev), 234.4s, 238.0s, 232.0s, avg. 234.8s
iterating over percepts for all o0 features (new way, per-label totals in Python list):
acc 61.28% (train), 71.03% (dev), 208.9s, 227.1s, 222.4s, avg. 219.5s
speedup: (234.8-219.5)/234.8 = 6.5% (using the average times above)
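The loop-order difference can be sketched as follows. This is a minimal illustration, not the tagger's actual code: it assumes the weights are stored in a flat list with each percept's per-label weights contiguous (index = percept * num_labels + label), so the percept-major loop touches contiguous memory while the label-major loop strides through it. All names here are hypothetical.

```python
NUM_LABELS = 3
NUM_PERCEPTS = 5
# flat weight vector: percept p's weights for all labels sit at
# weights[p*NUM_LABELS : (p+1)*NUM_LABELS], i.e. contiguously
weights = [0.1 * i for i in range(NUM_PERCEPTS * NUM_LABELS)]

def score_label_major(active_percepts):
    # old way: outer loop over labels; each inner pass strides
    # through the weight vector, with poor cache locality
    scores = []
    for label in range(NUM_LABELS):
        s = 0.0
        for p in active_percepts:
            s += weights[p * NUM_LABELS + label]
        scores.append(s)
    return scores

def score_percept_major(active_percepts):
    # new way: outer loop over percepts, accumulating per-label
    # totals in a Python list; each percept's label block is
    # read contiguously
    totals = [0.0] * NUM_LABELS
    for p in active_percepts:
        base = p * NUM_LABELS
        for label in range(NUM_LABELS):
            totals[label] += weights[base + label]
    return totals
```

Both orders compute the same per-label scores; only the memory-access pattern differs, which is where the speedup comes from.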