Closed bellstwohearted closed 3 years ago
The reason is that we break the whole read into chunks; the parameter controlling the chunk length is `--sequence_len` (the default is 400 for DNA and 2000 for RNA). We did this because TensorFlow 1 uses static graph compilation, so the RNN cells are created before the actual run; feeding in the whole read at once would run out of GPU memory.
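The chunking described above can be sketched roughly as follows. This is a minimal illustration, not Chiron's actual implementation; the helper name `chunk_signal` is hypothetical, and only the `--sequence_len` defaults (400 for DNA, 2000 for RNA) come from the answer above.

```python
# Hypothetical sketch: split a raw read into fixed-length chunks so each
# chunk fits the statically compiled RNN graph (TF1 builds the cells for a
# fixed sequence length before the run).

def chunk_signal(signal, sequence_len=400):
    """Split a signal into consecutive chunks of at most sequence_len samples."""
    return [signal[i:i + sequence_len] for i in range(0, len(signal), sequence_len)]

read = list(range(1000))            # a toy "read" of 1000 samples
chunks = chunk_signal(read)         # default DNA chunk length of 400
print([len(c) for c in chunks])     # -> [400, 400, 200]
```

The last chunk is shorter than `sequence_len`; in practice such a remainder would be padded (or dropped) so every input to the static graph has the same length.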
The tool is powerful, thanks for developing it! I have several questions:
1. `batch_number` corresponds to the tfrecord file. It seems that this file has ~300,000 lines, do I understand correctly? I am training with the default parameter settings; `batch_size` is set to 300, so one epoch should contain 300,000/300 = ~1,000 steps. My question is: what is this `tfrecord` file exactly, and how does it correspond to the reads and bases?

2. Is the eval dataset used during training with `-v`, or is it used to evaluate the model after the training is done? I can see from the article (Table 4) that the testing data of E. coli contains ~15,000 reads, but the eval dataset of E. coli only has 2,000 reads. If the eval dataset is used with `-v` during training, where can I find the testing set for E. coli with all 15,000 reads?

Thank you very much~