Closed bellstwohearted closed 3 years ago
The reason is that we break the whole read into chunks; the parameter controlling the chunk length is `--sequence_len` (the default is 400 for DNA and 2000 for RNA). We did this because TensorFlow 1 uses static graph compilation, so the RNN cells are created before the actual run; feeding in the whole read at once would run out of GPU memory.
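The chunking described above can be sketched roughly as follows. This is a minimal illustration, not Chiron's actual implementation; the helper name `chunk_signal` is hypothetical, and only the `--sequence_len` defaults (400 for DNA, 2000 for RNA) come from the answer above.

```python
# Hypothetical sketch: split a raw read into fixed-length chunks so each
# chunk fits the statically compiled RNN graph (TF1 builds the cells for a
# fixed sequence length before the run).

def chunk_signal(signal, sequence_len=400):
    """Split a signal into consecutive chunks of at most sequence_len samples."""
    return [signal[i:i + sequence_len] for i in range(0, len(signal), sequence_len)]

read = list(range(1000))            # a toy "read" of 1000 samples
chunks = chunk_signal(read)         # default DNA chunk length of 400
print([len(c) for c in chunks])     # -> [400, 400, 200]
```

The last chunk is shorter than `sequence_len`; in practice such a remainder would be padded (or dropped) so every input to the static graph has the same length.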
The tool is powerful, thanks for developing it! I have several questions:
1. `batch_number` corresponds to the tfrecord file. It seems that this file has ~300,000 lines, do I understand correctly? I am training with the default parameter settings; `batch_size` is set to 300, so one epoch should contain 300,000/300 = ~1,000 steps. My question is: what is this `tfrecord` file exactly, and how does it correspond to the reads and bases?

2. Is the eval dataset used during training with `-v`, or is it used to evaluate the model after the training is done? I can see from the article (Table 4) that the testing data of E. coli contains ~15,000 reads, but the eval dataset of E. coli only has 2,000 reads. If the eval dataset is used with `-v` during training, where can I find the testing set for E. coli with all 15,000 reads?

Thank you very much~