Hi @marcpaga, can you take a look at https://github.com/google/deepconsensus/blob/r1.1/docs/generate_examples.md first and see whether it explains how we create training data? Let us know if anything there is unclear.
Hi @marcpaga, I'll close this issue, but feel free to reopen it if you still have questions.
Thanks for this very interesting piece of work.
I was wondering whether a raw sequencing dataset that also contains the true labels is available for training.
From your publication I found:
Basically, I am asking whether there is a file that gives the true complete sequence for each entry in the raw FASTQ. If not, should I take the DeepConsensus predictions, align them against the HG002 genome, and take the reference genome as truth?
Best, Marc
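
For readers following the question above, the fallback idea (align a read to a reference and treat the spanned reference bases as its label) can be sketched in plain Python. This is only an illustration of that idea, not the DeepConsensus training pipeline, which is described in the linked generate_examples document. The function name `reference_label`, the 0-based `ref_start`, and the toy sequences are assumptions made for this example; in practice the start position and CIGAR string would come from an aligner's output.

```python
import re

# Matches one CIGAR element, e.g. "12M" or "3D" (operation letters per the SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# CIGAR operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def reference_label(ref_seq: str, ref_start: int, cigar: str) -> str:
    """Return the reference bases spanned by one alignment.

    ref_seq   -- reference sequence (hypothetical toy string here)
    ref_start -- 0-based start of the alignment on ref_seq (assumed convention)
    cigar     -- CIGAR string of the alignment
    """
    ref_len = sum(
        int(length)
        for length, op in CIGAR_RE.findall(cigar)
        if op in REF_CONSUMING
    )
    return ref_seq[ref_start:ref_start + ref_len]


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    # 3 matches, 1 inserted read base, 2 matches, 1 deleted reference base, 2 matches
    print(reference_label(ref, 2, "3M1I2M1D2M"))
```

Insertions in the read do not contribute to the label, while deletions do, because the label is defined purely by the reference span. Whether the reference is a fair truth for a given sample depends on how closely that sample matches it, which is the concern the question raises.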