Closed drewstone closed 5 years ago
When you have a labeled dataset, it typically looks like this:
1-15 A
15-20 C
20-27 C
...
The first column gives the signal location of the base in the second column. To transform this into the format Chiron uses, accumulate the signal base by base (here, signal points 1-15, 15-20, 20-27, ...) until adding the next base would make the segment longer than 300, then pad the segment to length 300.
You can find this part of the code in the function read_tfrecord in chiron_input.py. I hope this answers your question.
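The accumulation described above can be sketched roughly as follows. This is an illustration only, not the code from chiron_input.py: the function name segment_labeled_signal, the half-open 0-based (start, end) index convention, the zero pad value, and the choice to skip any single base longer than the segment length are all assumptions. The sketch also assumes that the bases accumulated into a segment form that segment's label.

```python
from typing import List, Sequence, Tuple

Label = Tuple[int, int, str]           # (start, end, base), assumed half-open [start, end)
Segment = Tuple[List[float], int, str]  # (padded signal, true length, base string)


def segment_labeled_signal(
    signal: Sequence[float],
    labels: Sequence[Label],
    seg_len: int = 300,
    pad_value: float = 0.0,  # assumption: pad with zeros
) -> List[Segment]:
    """Group consecutive bases into segments of at most seg_len signal points."""
    segments: List[Segment] = []
    cur_signal: List[float] = []
    cur_bases: List[str] = []

    def flush() -> None:
        # Pad the accumulated signal to seg_len and store it with its bases.
        if cur_signal:
            padding = [pad_value] * (seg_len - len(cur_signal))
            segments.append((cur_signal + padding, len(cur_signal), "".join(cur_bases)))
            cur_signal.clear()
            cur_bases.clear()

    for start, end, base in labels:
        chunk = list(signal[start:end])
        if len(chunk) > seg_len:
            # Assumption: a base that cannot fit in any segment is skipped.
            flush()
            continue
        if len(cur_signal) + len(chunk) > seg_len:
            # The next base would overflow the segment, so close it first.
            flush()
        cur_signal.extend(chunk)
        cur_bases.append(base)

    flush()  # emit the last, possibly short, segment
    return segments
```

With the three rows from the example and a 27-point signal, all three bases fit into a single segment, so the sketch returns one padded segment labeled ACC. Returning the true length alongside the padded signal is a design choice for the sketch; it lets a downstream loss ignore the padded tail.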
Hello, I'm confused about the training procedure described in the paper. You were clear on how you partitioned the input signal data (sliding windows of length 300 with a step size of 30), but it was not clear how you partitioned the labels of these signals for the outputs.
Can you elaborate on how you gave each length-300 signal segment a label for training? Do you k-mer expand the base reading in some fashion? It seems the uppercase K used in the paper is never well documented afterwards. I'm also having a hard time finding it in this repo.