Hey @philip30, I was wondering if it would be possible to separate this into the code that's WIP and the code that's ready to be used, and only send a PR for the latter? At this stage in the project I think it would be best to keep the master branch as free of unfinished code as possible, because unfinished code might be confusing and difficult to maintain for others. I understand that you want to avoid merge conflicts, of course. I have recently followed a strategy of keeping all my working code in separate files and only making PRs for my changes to files that are already in the master branch. That has worked pretty well for me and avoids most merge conflicts.
Hey @msperber, thank you for your suggestion. But this is actually working code, and I believe @armatthews is going to implement some RNNG as well? Since I have already done that, it would be best to establish common ground by merging the working code into master.
Ok, if this is all functional code we should work toward merging it. Based on some quick skimming, a couple of comments:
Your new graph data structure looks very similar to the existing sent.Lattice. It would be better to merge the two, either by renaming lattice to graph if they share the same functionality, or by making lattice a subclass of graph or something. In any case, graph should be a subclass of sent.Sentence and implement that interface accordingly.
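For illustration, a minimal sketch of such a hierarchy, with a simplified stand-in for the Sentence base class (names and methods here are illustrative, not the actual xnmt API):

```python
class Sentence:                     # simplified stand-in for xnmt.sent.Sentence
  def sent_len(self):
    raise NotImplementedError()

class GraphSentence(Sentence):
  """A sentence represented as a graph of nodes and edges."""
  def __init__(self, nodes, edges):
    self.nodes = nodes              # e.g. a list of node labels
    self.edges = edges              # e.g. a list of (from_idx, to_idx) pairs

  def sent_len(self):
    return len(self.nodes)          # graph "length" = number of nodes

class LatticeSentence(GraphSentence):
  """A lattice is a graph restricted to a DAG with unique start/end nodes."""
  pass
```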
I have also already implemented RNNG; see my syntax branch. My implementation is mostly done and is trainable, though we can't decode with it yet.
I'm happy to work towards a merge, but I agree with keeping unfinished stuff in the branches.
@armatthews I can't see your RNNG in the syntax branch. Maybe you haven't pushed it yet? The latest commit is the part where you read in the syntax tree. It is good that we have two different types of readers (yours reads the syntax tree, while mine reads the CoNLL format).
My RNNG code is only on my own branch. This link should take you directly to the RnngDecoder implementation.
It looks like your implementation of RNNGs is for dependency trees, whereas mine uses the phrase-structure tree formulation, so they may be complementary, though we might need to clean up the names a bit.
I like that you generalized things by allowing general (hyper-)graphs. That seems like it could unify several of these different syntax (or other!) structures together.
Aside from the RNNG stuff, I see that you added a last_model_scores member to the scorer, which concerns me a bit. In general, I think we should try to keep objects stateless so that things don't get tricky when operating in parallel or on non-linear structures. Is it possible to move last_model_scores into the SimultaneousTranslator class, rather than making it part of the scorer?
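To illustrate the suggestion, a minimal sketch using simplified stand-ins for the real classes (names and methods here are hypothetical, not the actual xnmt API):

```python
class Scorer:
  """Stateless: scoring returns its result instead of caching it on self."""
  def calc_scores(self, state):
    return [0.0] * len(state)       # placeholder computation

class SimultaneousTranslator:
  """The caller keeps whatever state it needs across steps."""
  def __init__(self, scorer):
    self.scorer = scorer
    self.last_model_scores = None   # the state lives here, not in the scorer

  def step(self, state):
    self.last_model_scores = self.scorer.calc_scores(state)
    return self.last_model_scores
```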
Having the simultaneous translator model is really nice. It seems you also put a lot of work into improving the RL training, which I hope I can put to good use soon :)
@msperber I don't agree that graph should implement the sentence interface. Here, the graph is just an intermediate representation of the CoNLL input (which is a list of lines), and the form I need is just a sequence of {shift, nt, reduce} actions, a much simpler structure than the graph. So I'd implement an RNNGActionSentence instead of a GraphSentence.
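For concreteness, a minimal sketch of what such an RNNGActionSentence could look like; the action inventory and names are assumptions based on this discussion, not the final implementation:

```python
from enum import Enum

class RNNGAction(Enum):
  SHIFT = 0    # consume the next input token
  NT = 1       # open a new nonterminal / head relation
  REDUCE = 2   # close the current constituent

class RNNGActionSentence:
  """A sentence represented by the RNNG action sequence that derives it."""
  def __init__(self, actions):
    self.actions = actions  # e.g. [RNNGAction.SHIFT, RNNGAction.NT, ...]

  def sent_len(self):
    return len(self.actions)
```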
Also, my graph implementation is based on a list of edges, whereas your lattice implementation is a direct DAG implementation with objects pointing to other objects? I think they are not very compatible, unless we want to turn the lattice implementation into a list-of-edges implementation.
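To make the contrast concrete, a toy sketch of the two representations (illustrative only, not the actual xnmt structures):

```python
# (a) edge-list style: the graph is its list of (from, to) index pairs
edge_list = [(0, 1), (0, 2), (1, 2)]

# (b) pointer-based DAG style: each node object points at its successors
class Node:
  def __init__(self, idx):
    self.idx = idx
    self.successors = []  # direct references to other Node objects

n0, n1, n2 = Node(0), Node(1), Node(2)
n0.successors = [n1, n2]
n1.successors = [n2]

# Converting (b) into (a) is straightforward, which is one way the two
# implementations could be unified:
as_edges = [(n.idx, s.idx) for n in (n0, n1, n2) for s in n.successors]
assert sorted(as_edges) == sorted(edge_list)
```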
@armatthews I took a look at your implementation and it is very good. Maybe I can try to merge the work by pulling your branch, working on it in my branch, and pushing it here? We can, of course, discuss the way of merging it.
I understand your point @philip30, and also think that it would be good to implement RNNGActionSentence.
The main reason why I think it would be great to unify the data structures is that there are many ways to use graphs in neural models by now, and it would be cool to be able to reuse data structures, input readers, etc. across different places. For example, the currently implemented LatticeLSTM is actually just a TreeLSTM that is bidirectional and happens to support lattices as inputs. By unifying our data structures, we would immediately have support for tree2seq models in XNMT. The TreeLSTM doesn't support hypergraphs, but that could also be added in the future if needed.
It would be perfectly fine to adjust the currently implemented lattice data structure though. The only requirements are:
I don't know much about the requirements on your part, but do you think it would be possible to unify things?
A positive side effect of merging our code would also be that you could plot trees using the code I already implemented: https://github.com/neulab/xnmt/blob/master/xnmt/sent.py#L545
Alright, I will be working on it :)
@msperber I have merged the Lattice with the graph interface. Can you take a look at the design? Also, I still have an error in the topological sort part, in transducers/lattice.py; I am not sure which part is wrong. I guess the error comes when it tries to topo-sort the reversed lattice?
But I have made sure that the graph operations are valid (as seen in the unit tests). Your help is much appreciated, thank you!
I hacked at it and fixed it :) I think it's important to store the topo_sort results first, before using them.
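For the record, a guess at the kind of fix this refers to: if topo_sort yields a generator, it gets consumed on first use, so materializing the result once avoids re-iteration bugs. This is a generic Kahn's-algorithm sketch, not the actual transducers/lattice.py code:

```python
import collections

def topo_sort(num_nodes, edges):
  """Kahn's algorithm over (from, to) edge pairs; note it yields lazily."""
  succ = collections.defaultdict(list)
  indeg = [0] * num_nodes
  for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1
  queue = collections.deque(i for i in range(num_nodes) if indeg[i] == 0)
  while queue:
    u = queue.popleft()
    yield u
    for v in succ[u]:
      indeg[v] -= 1
      if indeg[v] == 0:
        queue.append(v)

# Buggy pattern: iterating the generator a second time silently yields
# nothing. Fix: store the result once, then reuse the stored list.
order = list(topo_sort(3, [(0, 1), (1, 2)]))
assert order == [0, 1, 2]
assert list(reversed(order)) == [2, 1, 0]  # safe reuse of the stored order
```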
Cool, this looks very nice! So in the end we might have a CoNLLTreeReader that outputs either GraphSentence or RNNGActionSentence objects, and a LatticeReader that outputs LatticeSentence objects?
Actually, it might be possible to get rid of the special LatticeSentence class. I think the main reason for it is to be able to reverse the scores for each node whenever the graph is reversed, but we could simply implement HyperNode.reverse() that returns self by default, or returns a node with reversed probs in the case of LatticeNode. When reversing the graph, it would then simply add a call to reverse() for each node.
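A sketch of that idea (attribute names like fwd_log_prob are assumptions, not necessarily the actual xnmt fields):

```python
class HyperNode:
  def __init__(self, value):
    self.value = value

  def reverse(self):
    return self  # plain nodes are direction-agnostic

class LatticeNode(HyperNode):
  def __init__(self, value, fwd_log_prob, bwd_log_prob):
    super().__init__(value)
    self.fwd_log_prob = fwd_log_prob
    self.bwd_log_prob = bwd_log_prob

  def reverse(self):
    # swap the direction-dependent scores when the graph is reversed
    return LatticeNode(self.value, self.bwd_log_prob, self.fwd_log_prob)
```

The graph's own reverse() would then flip the edges and map reverse() over all nodes, with no lattice-specific subclass needed.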
BTW I also have some more graph-related code which I want to merge at some point (though probably not within the next couple of weeks), so I'll also take a look at whether things can be further simplified after this PR has been merged.
@msperber Thanks! I have a question.
> Actually, it might be possible to get rid of the special LatticeSentence class. I think the main reason for it is to be able to reverse the scores for each node whenever the graph is reversed, but we could simply implement HyperNode.reverse() that returns self by default, or returns a node with reversed probs in the case of LatticeNode. When reversing the graph, it would then simply add a call to reverse() for each node.
Wouldn't this over-simplify things? So you don't care about the direction of the edges? I also think it is a bit strange to have a reverse() operation in a graph...
Hm, no, I didn't mean to get rid of the reverse operation. But it's currently implemented in GraphSentence and then overridden in LatticeSentence, and I was wondering if we could get rid of the latter class altogether.
The above was just a suggestion; if you'd prefer, I think it would also be fine to keep both the GraphSentence and LatticeSentence classes for now, as this should already be enough to support encoding trees (in the form of GraphSentence) using the LatticeLSTM transducer.
@msperber I got rid of the class and revised the code based on your suggestion. I hope it looks fine now? After this, I will pull the code from @armatthews's repo and try to merge the RNNG work.
Yes, this looks great, thanks!
I am merging right now. Currently there is a bunch of undocumented code in SimultaneousTranslator, but it is tested. I will follow up with documentation soon, when things are less busy.
These are the things I am working on now; I want to merge this to avoid further code conflicts and redundancy in the future. Basically, I am working on: