jonassibbesen / rpvg

Method for inferring path posterior probabilities and abundances from pangenome graph read alignments
MIT License

Error : void NestedPathAbundanceEstimator::inferPathSubsetAbundance #58

Closed by jjuhyunkim 11 months ago

jjuhyunkim commented 1 year ago

Hi developers!

I encountered the error below when running quantification with rpvg on a graph generated by minigraph-cactus. The graph has complex regions, including loops, multiple nodes, and loops spanning some genes, as described in the vg tools pull request.

```
Running rpvg (commit: 301f553412a7f3b3c3dccad74e845868da4f0468)
Random number generator seed: 1701051741
Fragment length distribution parameters found in alignment (mean: 258.484, standard deviation: 61.0281)
Loaded graph, GBWT and r-index (6.69198 seconds, 2.5275 GB)
Fragment length distribution parameters re-estimated from alignment paths (location: 223.174, scale: 70.9671, shape: 1.38375)
Found alignment paths (654.778 seconds, 2.5275 GB)
Clustered alignment paths (0.519952 seconds, 2.5275 GB)
rpvg: /home/rpvg/src/path_abundance_estimator.cpp:715: void NestedPathAbundanceEstimator::inferPathSubsetAbundance(PathClusterEstimates*, const std::vector<ReadPathProbabilities>&, std::mt19937*, const spp::sparse_hash_map<std::vector<unsigned int>, double>&) const: Assertion `path_group.second.size() <= group_size' failed.
```

However, as far as I know, the haplotype-transcript information table will only be updated if I use a future version of the vg tools that includes the pull request.

So I removed the redundant haplotypes in the 4th column of the haplotype-transcript table myself and reran rpvg with the fixed table. However, I still got the error above.
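For reference, the manual clean-up described above could be sketched roughly as follows. This is only an illustration under several assumptions: that the table is tab-separated, that the 4th column holds a comma-separated list of haplotype names, and that "redundant" means the same name listed more than once. The function name `dedupe_haplotype_column` is hypothetical and is not part of vg or rpvg.

```python
def dedupe_haplotype_column(lines, column=3, sep=","):
    """Yield table lines with repeated names removed from one column.

    Assumptions (not confirmed by vg or rpvg): the table is tab-separated
    and the chosen column (0-based index 3, i.e. the 4th column) holds a
    list of haplotype names joined by `sep`.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            # dict.fromkeys keeps the first occurrence of each name, in order.
            names = fields[column].split(sep)
            fields[column] = sep.join(dict.fromkeys(names))
        yield "\t".join(fields)


# Hypothetical example rows; real tables may use different names and values.
example = [
    "Name\tLength\tTranscripts\tHaplotypes",
    "tx1_H1\t1200\ttx1\thapA,hapB,hapA",
]
cleaned = list(dedupe_haplotype_column(example))
```

A header line passes through unchanged because its 4th field contains no separator. Lines with fewer than four fields are also left as they are.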

Could you please advise whether this error would be fixed by using a haplotype-transcript table generated with a future version of the vg tools, or whether it might be caused by another issue?

Thank you!

jjuhyunkim commented 11 months ago

I encountered the same error when using the latest version of the vg tools (v1.53.0 "Valmontone"), which includes an update that accounts for transcripts projected twice onto cyclic haplotypes.

Could you please address this error?


```
Random number generator seed: 1703019005
Loaded graph, GBWT and r-index (6.00261 seconds, 2.44436 GB)
Found alignment paths (1262.62 seconds, 2.44436 GB)
Clustered alignment paths (0.101424 seconds, 2.44436 GB)
rpvg: /home/rpvg/src/path_abundance_estimator.cpp:715: void NestedPathAbundanceEstimator::inferPathSubsetAbundance(PathClusterEstimates*, const std::vector<ReadPathProbabilities>&, std::mt19937*, const spp::sparse_hash_map<std::vector<unsigned int>, double>&) const: Assertion `path_group.second.size() <= group_size' failed.
```

jjuhyunkim commented 11 months ago

I realized that the error was caused by my using the wrong alignment type (paired short reads vs. single long reads). I apologize for any confusion.

I am going to close this issue.