redpony / cdec

Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms
http://cdec-decoder.org/
Apache License 2.0

Make lattice format syntax a bit more flexible #32

Open nschneid opened 10 years ago

nschneid commented 10 years ago

The so-called Python Lattice Format (PLF) syntax supported by cdec is more constrained than I realized: unlike the equivalent Python data structure,

Assuming a token with )) never occurs in the data, the first one is easily solved with a sed script. The second one took some Python hackery:

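```python
class SingleQuotedString(str):
    '''String whose __repr__() is always in single quotes'''
    def __repr__(self):
        # Prepending '"' forces repr() to use single quotes (escaping any ').
        # [2:] then drops that opening ' and the added ", and we re-add the '.
        return "'" + repr('"'+str(self))[2:]
```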

...though it would be nice if cdec didn't choke on these.
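The sketch below compares `repr()` output with and without the `SingleQuotedString` workaround. The token `don't` and the `(label, 0.0, 1)` tuple are illustrative assumptions, chosen to look like a single lattice arc. The sketch only shows how Python quotes the strings. It does not check the output against cdec's parser.

```python
class SingleQuotedString(str):
    '''String whose __repr__() is always in single quotes'''
    def __repr__(self):
        # Prepending '"' forces repr() to use single quotes (escaping any ').
        # [2:] then drops that opening ' and the added ", and we re-add the '.
        return "'" + repr('"'+str(self))[2:]

# Plain str: repr() switches to double quotes when the token has an apostrophe.
print(repr(("don't", 0.0, 1)))
# → ("don't", 0.0, 1)

# Wrapped: repr() stays in single quotes and escapes the apostrophe instead.
print(repr((SingleQuotedString("don't"), 0.0, 1)))
# → ('don\'t', 0.0, 1)
```

Because `SingleQuotedString` subclasses `str`, wrapped tokens still compare equal to ordinary strings. Only the way they are printed changes.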