k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.
https://k2-fsa.github.io/k2
Apache License 2.0

Could k2.ctc_graph() be replaced by k2.ctc_topo() + k2.compose() ? #1258

lawlict closed this issue 11 months ago

lawlict commented 11 months ago

Hi, I am new to k2 and wonder whether the following two code blocks are equivalent. The graphs they draw look the same. However, k2.ctc_graph() is implemented separately in C++ rather than built from k2.ctc_topo() + k2.compose(). Is there a reason for that?

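import k2

# Linear FSA for the token sequence [1, 2, 2]
s = '''
0 1 1 1
1 2 2 2
2 3 2 2
3 4 -1 -1
4
'''
# Standard CTC topology over token IDs 0..1000 (0 is blank)
a_fsa = k2.ctc_topo(max_token=1000, modified=False)
b_fsa = k2.Fsa.from_str(s, acceptor=False)
# Compose the topology with the transcript FSA
c_fsa = k2.compose(a_fsa, b_fsa)
c_fsa.draw('c_fsa_compose.png', title='c_fsa')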

[Image: c_fsa_compose]

d_fsa = k2.ctc_graph([[1, 2, 2]], modified=False)[0]
d_fsa.draw('d_fsa.png', title='d_fsa')

[Image: d_fsa]

csukuangfj commented 11 months ago

If the vocab size is large, e.g., several hundred or several thousand tokens, k2.ctc_graph() is much faster.
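A rough way to see where the cost comes from is to count arcs. The sketch below is plain Python and does not call k2. It assumes the textbook standard CTC topology (token 0 is blank, one state per token plus a final state) and the textbook blank-interleaved expansion of a transcript. The function names `standard_ctc_topo_arcs` and `expanded_ctc_length` are hypothetical helpers for illustration, not k2 APIs, and the counts are not measurements of k2 itself.

```python
from typing import List, Tuple

Arc = Tuple[int, int, int, int]  # (src_state, dst_state, label, aux_label)


def standard_ctc_topo_arcs(max_token: int) -> List[Arc]:
    """Arcs of a textbook standard CTC topology (assumed layout, not k2 code).

    Token 0 is blank. There is one state per token ID and one final state.
    """
    num_tokens = max_token + 1
    final_state = num_tokens
    arcs: List[Arc] = []
    for i in range(num_tokens):
        for j in range(num_tokens):
            # A self-loop repeats the current token and emits nothing (0).
            aux_label = 0 if i == j else j
            arcs.append((i, j, j, aux_label))
        # Every state may end the sequence; -1 marks a final arc.
        arcs.append((i, final_state, -1, -1))
    return arcs


def expanded_ctc_length(tokens: List[int]) -> int:
    """Length of the sequence blank, t1, blank, ..., tL, blank."""
    return 2 * len(tokens) + 1


if __name__ == "__main__":
    for max_token in (10, 100, 1000):
        print(max_token, len(standard_ctc_topo_arcs(max_token)))
    print(expanded_ctc_length([1, 2, 2]))
```

Under these assumptions the topology grows quadratically with the vocabulary (about one million arcs for max_token=1000), while the blank-interleaved transcript grows only linearly with its length. A composition has to work against the large topology, whereas a direct construction only needs to touch the transcript.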

lawlict commented 11 months ago

@csukuangfj It seems so. What if I define a CTCGraph class and initialize ctc_topo() in its \_\_init\_\_() function?

csukuangfj commented 11 months ago

> What if I define a CTCGraph class and initialize ctc_topo() in its \_\_init\_\_() function?

Could you explain that in more detail? Sorry, I don't quite understand it.

lawlict commented 11 months ago

@csukuangfj Like this:

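import k2

class CTCGraph:
  def __init__(self, vocab_size):
    super().__init__()
    # Build the CTC topology once and reuse it for every transcript.
    # Note: ctc_topo() takes the largest token ID, so pass vocab_size - 1
    # here if vocab_size already counts the blank (ID 0).
    self.a_fsa = k2.ctc_topo(vocab_size, modified=False)

  def __call__(self, s):
    # s is the transcript FSA in k2's text format
    b_fsa = k2.Fsa.from_str(s, acceptor=False)
    c_fsa = k2.compose(self.a_fsa, b_fsa)
    return c_fsa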

Or will k2.compose() also be slow?

csukuangfj commented 11 months ago

Yes, that is feasible. Please see the similar usage in icefall:

https://github.com/k2-fsa/icefall/blob/master/icefall/graph_compiler.py#L96

lawlict commented 11 months ago

@csukuangfj I get it. Thanks for your nice response.