haotianteng / Chiron

A basecaller for Oxford Nanopore Technologies' sequencers

Added label files for chiron output #47

Closed nmiculinic closed 6 years ago

nmiculinic commented 6 years ago

This isn't perfect, but it's good enough for most cases. Now you can use chiron itself to bootstrap new models.

Test plan: I've run it locally on a working base commit, since current master is broken (#46). It works like a charm.

nmiculinic commented 6 years ago

The purpose of this diff is to add confidence intervals for where each base pair appears on the raw signal, by calculating the lower bound. I'm using it to bootstrap the new model (EM, in a way).

haotianteng commented 6 years ago

Hi Neven, thanks for your PR, but can you separate the commits into two PRs? One is "bootstrap a new model" and the other is "speed up base calling for small files", so I can check and review them separately.

Thanks Teng

2018-04-30 6:06 GMT+10:00 Neven Miculinic notifications@github.com:

@nmiculinic commented on this pull request.

In chiron/chiron_eval.py https://github.com/haotianteng/Chiron/pull/47#discussion_r184893425:

@@ -258,7 +255,7 @@ def evaluation():
 l_sz, d_sz = sess.run([logits_queue_size, decode_queue_size])
 # Flow control
 # Either we have something beam decoded, or we've pushed all data into the queue
-pbar.set_postfix(logits_q=l_sz, decoded_q=d_sz)
+pbar.set_postfix(logits_q=l_sz, decoded_q=d_sz, refresh=False)

With refresh=True, the queue sizes in the output stay frozen at zero for some reason; with refresh=False it works. I haven't had time to explore in depth why that is.
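
For readers unfamiliar with the call, here is a minimal sketch of the pattern in this change, assuming tqdm's `set_postfix(..., refresh=False)`, which records the postfix and leaves drawing to the bar's next regular update. The helper name `report_queue_sizes`, the `demo` function and the queue-size values are hypothetical and not taken from Chiron; the sketch does not explain why `refresh=True` showed zeros.

```python
import io


def report_queue_sizes(pbar, logits_q, decoded_q):
    """Store queue sizes in the bar's postfix without forcing a redraw."""
    # refresh=False only records the postfix; it is drawn on the bar's next
    # regular update instead of triggering an immediate refresh.
    pbar.set_postfix(logits_q=logits_q, decoded_q=decoded_q, refresh=False)


def demo():
    # tqdm is imported here so the helper above has no hard dependency on it.
    from tqdm import tqdm

    out = io.StringIO()  # capture the bar instead of writing to the terminal
    with tqdm(total=3, file=out) as pbar:
        for step in range(3):
            # Hypothetical queue sizes standing in for the sess.run(...) results.
            report_queue_sizes(pbar, logits_q=3 - step, decoded_q=step)
            pbar.update(1)
    return out.getvalue()


if __name__ == "__main__":
    print(demo())
```

Because the helper only calls `set_postfix`, it works with any object exposing that method, which keeps the display logic easy to check separately from the evaluation loop.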


-- Teng Haotian University of Queensland, Queensland, Australia +61 0426116017

nmiculinic commented 6 years ago

They are separated. #50 is the global one, which is useful for me.