pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

Error following tutorial for PGGB VCF evaluation #225

Closed brettChapman closed 1 year ago

brettChapman commented 2 years ago

Hi, I've been following this tutorial to evaluate the VCF from PGGB: https://pggb.readthedocs.io/en/latest/rst/tutorials/small_variants_evaluation.html#

I get an error at the rtg vcfeval step, complaining about indexing. My genome is large and requires CSI indexing, so I added -C to tabix to build a CSI index. The problem is that rtg vcfeval doesn't pick up the .csi file and only looks for a .tbi.
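For context, a TBI index can only address coordinates up to 2^29 bp (about 537 Mbp), which is why longer chromosomes need CSI. A minimal sketch for checking which contigs go past that limit is below; the contig names and lengths are hypothetical placeholders, not values from this issue.

```python
# Largest coordinate a TBI index can address: 2^29 bp (about 537 Mbp).
TBI_MAX_COORD = 2 ** 29


def contigs_needing_csi(contig_lengths):
    """Return the contigs whose length exceeds the TBI coordinate limit."""
    return {
        name: length
        for name, length in contig_lengths.items()
        if length > TBI_MAX_COORD
    }


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    lengths = {"chrA": 516_505_932, "chrB": 665_585_731}
    for name, length in contigs_needing_csi(lengths).items():
        print(f"{name}: {length} bp exceeds the TBI limit of {TBI_MAX_COORD} bp")
```

In practice the lengths could be read from the reference .fai file (second column) rather than typed in by hand.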

If I extract selected regions from the whole chromosome and run on those, I have no problems. I've been testing with 1 Mbp regions around genes of interest. However, I intend to evaluate the PGGB graph across whole chromosomes and need something that works on large chromosomes. Is there a possible workaround for this? The only other solution I can think of is to split each chromosome into regions and produce a precision and recall plot per chunk. It would have been nice to have one plot with chromosomes on the x-axis.

Thanks.

AndreaGuarracino commented 1 year ago

I was going to ask whether you had tried pinging the rtg team to add this kind of support, but I've just seen that there is no 'Issues' tab in their repository (https://github.com/RealTimeGenomics/rtg-tools).

I am not aware of alternatives that are as comfortable as rtg vcfeval. Have you found something in the meantime?

brettChapman commented 1 year ago

Hi @AndreaGuarracino

I figured I'll try slicing the VCF into quadrants, indexing each one individually, and getting precision and recall per segment of the chromosome instead of for the whole chromosome. At the moment I have nucmer running on all iterations of the pangenome, which will take a while to complete. It's probably worth asking the rtg-tools devs whether they can implement CSI index support.
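A minimal sketch of how the segment boundaries could be computed, assuming each segment only needs to stay under the 2^29 bp TBI limit. The function name and the example length are hypothetical. Note that slicing alone does not shrink the coordinates: records past 2^29 bp stay out of TBI range unless each segment's positions (and the matching reference sequence) are rebased to start at 1.

```python
def chunk_intervals(length, max_len=2 ** 29):
    """Return 1-based inclusive (start, end) windows covering 1..length.

    Every window is at most max_len bp long; the last one may be shorter.
    """
    if length < 1 or max_len < 1:
        raise ValueError("length and max_len must be positive")
    intervals = []
    start = 1
    while start <= length:
        end = min(start + max_len - 1, length)
        intervals.append((start, end))
        start = end + 1
    return intervals


if __name__ == "__main__":
    # Hypothetical chromosome length, for illustration only.
    for start, end in chunk_intervals(665_585_731):
        # Subtracting `offset` from a position rebases it into the segment.
        offset = start - 1
        print(f"segment {start}-{end} (subtract {offset} to rebase positions)")
```

To get exactly four segments, pass a `max_len` of roughly a quarter of the chromosome length instead of the default.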

I'm trying to get precision and recall because the reviewers of our recently submitted paper wanted some reassurance that the genome graph is accurate. I came across this tutorial and thought it would be the best way to do just that.

brettChapman commented 1 year ago

I've posted to their users group, since they have no issues tab: https://groups.google.com/a/realtimegenomics.com/g/rtg-users

brettChapman commented 1 year ago

They got back to me. There is no intention to add CSI indexing support. However, they said there may be a way of combining outputs using vcf2rocplot. I'll take a look at how I might be able to implement that.
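Independently of vcf2rocplot, per-segment results can also be pooled by summing the raw counts and recomputing the metrics. The sketch below is plain arithmetic, not the approach the rtg developers suggested. The field names are assumed to mirror the count columns of a vcfeval summary (true positives counted on the baseline side and on the call side, false positives, false negatives), and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class SegmentCounts:
    """Counts for one segment (hypothetical container, not an rtg-tools type)."""
    tp_baseline: int  # true positives, counted in the baseline (truth) VCF
    tp_call: int      # true positives, counted in the call VCF
    fp: int           # false positives
    fn: int           # false negatives


def pooled_metrics(segments):
    """Sum counts over segments, then compute precision, recall and F-measure."""
    tp_b = sum(s.tp_baseline for s in segments)
    tp_c = sum(s.tp_call for s in segments)
    fp = sum(s.fp for s in segments)
    fn = sum(s.fn for s in segments)
    precision = tp_c / (tp_c + fp) if (tp_c + fp) else 0.0
    recall = tp_b / (tp_b + fn) if (tp_b + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up counts for two segments, for illustration only.
    segments = [SegmentCounts(90, 90, 10, 10), SegmentCounts(40, 40, 10, 60)]
    precision, recall, f_measure = pooled_metrics(segments)
    print(f"precision={precision:.4f} recall={recall:.4f} F={f_measure:.4f}")
```

Summing counts before dividing weights each segment by its number of variants, which is usually preferable to averaging per-segment precision and recall.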

AndreaGuarracino commented 1 year ago

I hope to see your updates here (or a link to them).