nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads
https://nanoporetech.com/

Some details about the CTC-CRF model #232

Open xiongjun19 opened 2 years ago

xiongjun19 commented 2 years ago

Hi! I'm a little confused about the code in the CTC-CRF model and have four questions:

  1. Why does the last linear layer need 1024 * 5 outputs?
  2. Does "CTC-CRF" mean CAT?
  3. Why is self.idx designed like this? tensor([[ 0, 0, 256, 512, 768], [ 1, 0, 256, 512, 768], [ 2, 0, 256, 512, 768], ..., [1021, 255, 511, 767, 1023], [1022, 255, 511, 767, 1023], [1023, 255, 511, 767, 1023]], dtype=torch.int32)
  4. How is the input signal grouped in the 9.4.1 data? For example, suppose the sequence is [x1, x2, x3, x4, x5, x6, ..., xN]. One way is non-overlapping groups: [(x1, x2, x3, x4, x5), (x6, x7, x8, x9, x10), ...]; another is a sliding window: (x1, ..., x5), (x2, ..., x6), ....
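The following is a minimal sketch relating questions 1 and 3. It rests on assumptions that are not confirmed in this thread:

- there are 4 bases;
- a CRF state remembers the last 5 bases, giving 4**5 = 1024 states;
- a state index is the base-4 number of those bases, with the newest base as the least significant digit.

Under that reading, each state has 5 incoming transitions (one "stay" plus one per emitted base), which would account for 1024 * 5 scores. The hypothetical helper build_idx below reproduces the rows printed in question 3. It is an illustration, not bonito's implementation.

```python
# Hypothetical sketch: rebuild the transition-index table shown in question 3.
# Assumptions (not taken from bonito's source): 4 bases, a state that
# remembers the last 5 bases, and a state index equal to the base-4 number
# of those bases with the most recent base as the least significant digit.

N_BASE = 4                      # A, C, G, T
STATE_LEN = 5                   # bases remembered per state
N_STATES = N_BASE ** STATE_LEN  # 4**5 = 1024 states
N_TRANSITIONS = N_BASE + 1      # 1 "stay" + 4 "move" transitions into each state


def build_idx(n_base: int = N_BASE, state_len: int = STATE_LEN) -> list[list[int]]:
    """Row i lists the source states of the transitions into state i.

    Column 0 is state i itself (stay / blank); columns 1..n_base are the
    states that reach i by emitting one new base.
    """
    n_states = n_base ** state_len
    # Weight of the oldest base position: 4**4 = 256 for a 5-base state.
    top = n_base ** (state_len - 1)
    rows = []
    for i in range(n_states):
        # Dropping the newest base (i // n_base) and prepending any base j
        # as the oldest one gives the n_base possible predecessors.
        rows.append([i] + [j * top + i // n_base for j in range(n_base)])
    return rows


idx = build_idx()
print(len(idx), len(idx[0]))      # 1024 5
print(N_STATES * N_TRANSITIONS)   # 5120, i.e. 1024 * 5
print(idx[0], idx[1], idx[1023])
# [0, 0, 256, 512, 768] [1, 0, 256, 512, 768] [1023, 255, 511, 767, 1023]
```

If this reading is right, each row of 5 scores in the 1024 * 5 output lines up with one row of the index table.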

It would be very helpful if you could give me some insights into this! Thank you for open-sourcing this great repository; I have learned a lot from it.

YuhaoTan2 commented 2 years ago

I guess the CTC-CRF model is an extension of the flipflop model, and it may not be related to CAT. The outputs of the last linear layer are scores for six consecutive bases.
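The sketch below illustrates the "six consecutive bases" point. It rests on assumptions that are not confirmed in this thread:

- each state is a 5-mer over ACGT;
- the state index is the base-4 number of that 5-mer, with the newest base as the least significant digit.

The helpers decode and transition_kmer are hypothetical, not from bonito. Under these assumptions, a "move" transition goes from a predecessor 5-mer to a 5-mer that shares four bases with it, so each move score spans six bases. A "stay" transition only involves the single 5-mer.

```python
# Hypothetical sketch: why a "move" score can be read as covering six bases
# when each state is a 5-mer. Assumed encoding (not taken from bonito):
# state index = base-4 number of the last 5 bases, newest base least significant.

BASES = "ACGT"
STATE_LEN = 5


def decode(state: int, state_len: int = STATE_LEN) -> str:
    """Turn a state index into its k-mer, oldest base first."""
    digits = []
    for _ in range(state_len):
        digits.append(BASES[state % 4])
        state //= 4
    return "".join(reversed(digits))


def transition_kmer(src: int, dst: int) -> str:
    """Bases spanned by a move src -> dst: src's 5-mer plus dst's newest base."""
    # For a valid move, the last 4 bases of src equal the first 4 bases of dst.
    if decode(src)[1:] != decode(dst)[:-1]:
        raise ValueError("not a valid move transition")
    return decode(src) + decode(dst)[-1]


# State 1023 is TTTTT; one of its predecessors is 255 = ATTTT.
print(decode(1023), decode(255))    # TTTTT ATTTT
print(transition_kmer(255, 1023))   # ATTTTT (six bases)
```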