Closed by jkkummerfeld 5 years ago
Actually, after further thought, that last comment is wrong - the idea I had doesn't completely avoid the possibility of a stale weight being used in inference.
The VoCRF code isn't public yet - sorry about the stale link. I'll have it up this week.
I've run into the stale weight problem before. Actually, I ran into it without parallelism! In my case, I was avoiding allocating a sparse gradient vector (for efficiency) by directly updating the lazy weights as I ran the backward pass to compute gradients, but this results in slightly wrong updates.
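To illustrate one way an in-place update during the backward pass can go wrong, here is a toy sketch. The model `f(w) = w[0] * w[1] * x` and the function names are hypothetical, chosen only for illustration, and are not taken from the code discussed in this thread. The point is that a gradient computed later in the pass may read a weight that has already been modified.

```python
def step_accumulate(w, x, lr):
    """Compute every gradient from the pre-update weights, then apply them."""
    w = list(w)
    # Toy model (an assumption): f(w) = w[0] * w[1] * x
    grad = {0: w[1] * x, 1: w[0] * x}
    for i, g in grad.items():
        w[i] -= lr * g
    return w


def step_in_place(w, x, lr):
    """Apply each update as soon as its gradient is known (no gradient vector)."""
    w = list(w)
    w[0] -= lr * (w[1] * x)  # update applied immediately
    w[1] -= lr * (w[0] * x)  # reads the already-updated w[0]
    return w


reference = step_accumulate([1.0, 2.0], x=1.0, lr=0.5)
in_place = step_in_place([1.0, 2.0], x=1.0, lr=0.5)
print(reference, in_place)
```

The two functions agree on `w[0]` but not on `w[1]`, because the in-place version computes the second gradient from a weight that has already moved.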
Thanks for the pointer! I'd gladly accept a patch, BTW :-)
I'm not too concerned with parallelism since people rarely do parallel stuff in Python thanks to the GIL. The stale weight problem that I mentioned above does concern me, however. A trick similar to yours can probably fix that one too.
Will you be at EMNLP?
Ah, the GIL point is a good one, probably not worth the change then.
As for stale weights, in my experience it's not an issue in this case. It can only come up when two threads are doing the lazy update for the same weight at the same time, and one saves the value that is used to avoid double updates, but then the other exits and returns the current weight before the first does the next instruction (saving the weight).
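The interleaving described above can be replayed deterministically. The following is a minimal sketch under stated assumptions, not the actual implementation: the lazy update is assumed to be a multiplicative decay applied once per missed step, the "value used to avoid double updates" is modelled as a per-weight `last` timestamp, and generators stand in for threads so that the switch point can be controlled. All names are hypothetical.

```python
def catch_up(state, i, now, decay):
    """Lazy catch-up for weight i; each `yield` is a point where another thread may run."""
    missed = now - state["last"][i]
    if missed > 0:
        new_w = state["w"][i] * decay ** missed
        state["last"][i] = now  # instruction 1: mark the weight as up to date
        yield                   # a second thread may be scheduled here
        state["w"][i] = new_w   # instruction 2: store the caught-up weight
    return state["w"][i]


def run(gen):
    """Drive a generator to completion and return its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


state = {"w": [1.0], "last": [0]}

thread_a = catch_up(state, 0, now=3, decay=0.5)
next(thread_a)  # A marks the weight as updated, then is paused

value_b = run(catch_up(state, 0, now=3, decay=0.5))  # B sees nothing to do
value_a = run(thread_a)                              # A stores the new weight

print(value_a, value_b)
```

In this replay, the second caller returns the weight as it was before the decay, because the timestamp was written before the weight itself. The window is only one instruction wide, which is consistent with the claim that it comes up very rarely.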
I won't be at EMNLP this year, which is a shame - it looks like a great program!
Hi, I came across this after reading your EMNLP paper and looking for the related code (vocrf seems to be private?). Thought you might be interested in a slight tweak that I added when I wrote about this in my thesis (https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-138.pdf page 87 / 101 counting front matter). Essentially, I modified the update to be able to do multi-threaded lockless processing of batches, at the cost of potentially running part of inference with stale weights (very rarely though, so shouldn't be a big deal). The idea is very similar to Hogwild.
Thinking about this again, even that risk could be avoided by reading the variables in the correct order... hmmm, I can't amend my thesis, but I might write this up and put a note on my website :)
Jonathan