Open tanger-code opened 1 month ago
Although `vg sim` can run with long read input, it's really designed for short reads. If you use it to generate long reads, you won't get very realistic errors or a realistic read length distribution. In our own testing and development, we've used `pbsim` to simulate long reads. You would probably want to generate the reads from FASTAs of sample haplotypes, rather than directly from the GBZ file.
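For illustration, simulating from a haplotype FASTA could look roughly like the sketch below. It assumes the PBSIM3 command line (`--strategy wgs`, `--method qshmm`, `--qshmm`, `--depth`, `--genome`, `--prefix`); the function name, file names, and model path are placeholders, and the depth is an arbitrary example, so please check `pbsim --help` for your installed version.

```shell
# Sketch only: wraps one PBSIM3 whole-genome run for a single haplotype FASTA.
# Assumes `pbsim` (PBSIM3) is on PATH; all argument values are placeholders.
simulate_long_reads() {
  # usage: simulate_long_reads HAPLOTYPE_FASTA QSHMM_MODEL DEPTH OUTPUT_PREFIX
  hap_fasta=$1   # FASTA of one sample haplotype
  model=$2       # quality-score HMM model file shipped with PBSIM3
  depth=$3       # sequencing depth for this haplotype
  prefix=$4      # prefix for the output files
  pbsim --strategy wgs --method qshmm --qshmm "$model" \
        --depth "$depth" --genome "$hap_fasta" --prefix "$prefix"
}

# Example call (hypothetical file names):
#   simulate_long_reads SAMPLE.hap1.fa QSHMM-RSII.model 15 SAMPLE_hap1
```

For a diploid sample you would presumably run this once per haplotype FASTA and split the target depth between the two runs.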
Thank you!
Can I also use `vg sim` with the `.gbz` file to generate short reads, for example with `vg sim -x graph.xg -g graph.gbz -m SAMPLE -n 1000 -l 150 -a > SAMPLE.gam`?
I now have the `.gbz` file for the pangenome graph of all chromosomes, and I want to generate short reads only for `chr21`. Do I need to extract chr21 from the `.gbz` file? I can't find a related command.
> Although `vg sim` can run with long read input, it's really designed for short reads. If you use it to generate long reads, you won't get very realistic errors or a realistic read length distribution. In our own testing and development, we've used `pbsim` to simulate long reads. You would probably want to generate the reads from FASTAs of sample haplotypes, rather than directly from the GBZ file.
I'm simulating long reads using `pbsim3`, and the output is a `.maf` file. If I want to run a simulation experiment, such as calling SVs from the simulated reads, can I use the `.maf` file as the truth set, or should I use a public truth set? Do you have any suggestions?
Looking through our script, it seems that we used the `maf2sam` subcommand of `bioconvert`.
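As a rough sketch of that conversion step: the wrapper below assumes bioconvert's usual `bioconvert <converter> <input> <output>` calling convention. The function name and file names are placeholders, not taken from our script.

```shell
# Sketch only: convert a simulator MAF file to SAM with bioconvert.
# Assumes `bioconvert` is installed and provides the maf2sam converter.
maf_to_sam() {
  # usage: maf_to_sam INPUT_MAF OUTPUT_SAM
  bioconvert maf2sam "$1" "$2"
}

# Example call (hypothetical file names):
#   maf_to_sam SAMPLE_hap1_0001.maf SAMPLE_hap1_0001.sam
```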
@tanger-code If you want to simulate from just one named path in the graph, you can use the `-P` option to `vg sim`. But that simulates from just that path; it won't include variants in the graph that leave the embedded path.
I don't think we have a way to simulate from the connected component of the graph that contains a path, other than using `vg chunk --components -p name-of-path` to pull out that subgraph and then simulating from it.
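The two options could be sketched as shell functions like the ones below. The function names, `graph.xg`, the path name, and the output prefix are placeholders; the read count and length are copied from the command in the question. The `-x` and `-b` (output prefix) options to `vg chunk` are my assumption about how you would pass the graph and name the output, so check `vg chunk --help` and the actual chunk file name before running the last step.

```shell
# Sketch only: assumes `vg` is on PATH; all file and path names are placeholders.

# Option 1: simulate short reads from a single embedded path.
# Variants that leave this path are not represented in the reads.
simulate_from_path() {
  # usage: simulate_from_path GRAPH PATH_NAME > reads.gam
  vg sim -x "$1" -P "$2" -n 1000 -l 150 -a
}

# Option 2, step 1: pull out the connected component containing the path.
extract_component() {
  # usage: extract_component GRAPH PATH_NAME OUTPUT_PREFIX
  vg chunk -x "$1" --components -p "$2" -b "$3"
}

# Option 2, step 2: simulate from the extracted subgraph.
simulate_from_graph() {
  # usage: simulate_from_graph SUBGRAPH > reads.gam
  vg sim -x "$1" -n 1000 -l 150 -a
}

# Example calls (hypothetical names; the chunk file name depends on the vg version):
#   simulate_from_path graph.xg chr21 > chr21_path.gam
#   extract_component graph.xg chr21 chr21_chunk
#   simulate_from_graph <chunk file written by vg chunk> > chr21_component.gam
```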
Hi.
I have a `.gbz` graph file, and I want to simulate third-generation long-read data from the pangenome graph. Can vg simulate third-generation long reads, or is there another method to do this? Any advice would be very helpful. Thanks.