maximilianmordig opened this issue 2 years ago
We mainly use the sequencing summary to infer the timing between reads on each channel. This information is also present in the fast5s, but parsing every fast5 file takes much longer than reading one text file. We also use the template start and duration to trim the adapter sequence and any noisy signal from each end of the reads. The ReadUntil API is able to do this in real time, and the sequencing summary was the best and easiest way I could find to mimic that behavior. So you are correct that it should be possible to simulate without a sequencing summary, but it would take some effort to work around those issues.
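As a rough illustration of the two uses described above, here is a minimal Python sketch that reads a tab-separated sequencing summary, computes the idle gap between consecutive reads on each channel, and converts template_start and template_duration into sample offsets for trimming. It assumes the Guppy-style column names read_id, channel, start_time, duration, template_start and template_duration, that template_start is measured on the same clock as start_time (seconds since run start), and a 4000 Hz sample rate. The function names are hypothetical, and this is not how UNCALLED itself implements it.

```python
import csv
import io
from collections import defaultdict


def load_summary(text):
    """Parse a tab-separated sequencing summary into a list of dicts (one per read)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def channel_gaps(rows):
    """Return {channel: [gap_seconds, ...]} between consecutive reads on each channel.

    Assumes start_time and duration are in seconds.
    """
    by_channel = defaultdict(list)
    for r in rows:
        start = float(r["start_time"])
        by_channel[r["channel"]].append((start, start + float(r["duration"])))
    gaps = {}
    for ch, reads in by_channel.items():
        reads.sort()  # order reads by start time within the channel
        # Gap = next read's start minus previous read's end
        gaps[ch] = [nxt[0] - prev[1] for prev, nxt in zip(reads, reads[1:])]
    return gaps


def trim_bounds(row, sample_rate=4000):
    """Return [begin, end) sample indices of the template region in the raw signal.

    Assumption: template_start uses the same clock as start_time, and the
    signal was sampled at sample_rate Hz (4000 Hz is a common MinION default).
    """
    offset = float(row["template_start"]) - float(row["start_time"])
    begin = round(offset * sample_rate)
    end = begin + round(float(row["template_duration"]) * sample_rate)
    return begin, end
```

Signal before `begin` (adapter and open-pore noise) and after `end` would be discarded before simulating a read.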
Some example sequencing summaries from human and a mock microbial community are available here: https://labshare.cshl.edu/shares/schatzlab/www-data/UNCALLED/simulator_files/
Hi @skovaka, thank you for developing UNCALLED.
I am wondering how to generate the sequencing_summary.txt file that is needed to run the "uncalled sim" command as described in the README:
/path/to/control/fast5s --ctl-seqsum /path/to/control/sequencing_summary.txt
These files don't seem to be provided. I have downloaded some E. coli fast5 raw reads, but unfortunately they don't come with a sequencing_summary.txt. As I understand it, the control fast5 files are only used to supply the raw signal for the simulation, so I am also wondering why the simulator relies on fields such as template_duration, which are basecaller-specific.
Thank you.