Open ASLeonard opened 2 years ago
Hi!
Thanks for asking! Yes, the index creation step currently supports only biallelic VCFs (since we want to represent each variant using two nodes in the graph), so it would be a good idea to convert the VCF to biallelic before doing anything else. This should have been mentioned in the description.
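To illustrate what converting to biallelic means at the record level, here is a minimal Python sketch. It is not part of kage or obgraph, and in practice a dedicated tool such as `bcftools norm -m -any` is the safer route, since it also handles the header and per-allele FORMAT fields. The function names `split_multiallelic` and `recode_gt` are hypothetical. The sketch assumes that any INFO value with exactly one comma-separated entry per ALT allele (such as AF) is per-allele, and it recodes every other ALT allele in GT to 0, which is a simplification.

```python
import re


def recode_gt(gt, allele_index):
    """Recode a GT string so that `allele_index` becomes 1 and any other
    allele becomes 0 (simplification). Separators and '.' are kept."""
    recoded = []
    for token in re.split(r"([/|])", gt):
        if token in ("/", "|", "."):
            recoded.append(token)
        else:
            recoded.append("1" if int(token) == allele_index else "0")
    return "".join(recoded)


def split_multiallelic(line):
    """Split one tab-separated VCF data line into biallelic lines."""
    line = line.rstrip("\n")
    fields = line.split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return [line]  # already biallelic

    records = []
    for i, alt in enumerate(alts, start=1):
        rec = list(fields)
        rec[4] = alt

        # Assumption: INFO values with one entry per ALT allele are per-allele.
        info_items = []
        for item in fields[7].split(";"):
            key, sep, value = item.partition("=")
            values = value.split(",")
            if sep and len(values) == len(alts):
                item = f"{key}={values[i - 1]}"
            info_items.append(item)
        rec[7] = ";".join(info_items)

        # Recode GT only; other FORMAT fields (AD, PL, ...) are left untouched.
        if len(fields) > 9:
            format_keys = fields[8].split(":")
            if "GT" in format_keys:
                gt_pos = format_keys.index("GT")
                for s in range(9, len(fields)):
                    parts = fields[s].split(":")
                    parts[gt_pos] = recode_gt(parts[gt_pos], i)
                    rec[s] = ":".join(parts)

        records.append("\t".join(rec))
    return records
```

Calling `split_multiallelic` on a line with `ALT` equal to `C,G` returns two lines, one per alternative allele, each with its own AF value.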
The pipeline for creating indexes is unfortunately not very well tested or documented yet (I guess you are the first to use it), so I won't be surprised if things aren't straightforward. However, I'm very happy to help you create indexes. Feel free to reach out here or by email (ivargry@ifi.uio.no) if you want some assistance or run into other problems.
If you want to share the VCF and reference genome you want to create indexes for, I'll be happy to try to create the indexes myself (it would be useful for debugging and for trying out the pipeline on data other than what we have used so far).
Normalising the VCF helped, but I'm now running into a new error about a recursion limit. Before it, there were many log messages about deletion paths not being correct.
dummy_node_adder 2021-12-08 16:26:15,292 INFO: Ignoring deletion path [914] because ref pos at end is not correct
dummy_node_adder 2021-12-08 16:26:15,292 INFO: Ignoring deletion path [361259, 361261, 916] because ref pos at end is not correct
dummy_node_adder 2021-12-08 16:26:15,292 INFO: Ignoring deletion path [914, 915] because ref pos at end is not correct
Traceback (most recent call last):
...
.../obgraph/mutable_graph.py", line 106, in find_nodes_from_node_that_matches_sequence
result = MutableGraph.find_nodes_from_node_that_matches_sequence(self, possible_next, new_sequence, variant_type, new_nodes_found, all_paths_found)
[Previous line repeated 986 more times]
.../obgraph-0.0.7-py3.8.egg/obgraph/mutable_graph.py", line 86, in find_nodes_from_node_that_matches_sequence
if sequence == "":
RecursionError: maximum recursion depth exceeded in comparison
I'll give it another look, but I will also prepare the VCF to share in case you are better able to debug the index creation.
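For reference, the traceback above stops at Python's default recursion limit (1000 frames unless changed). One generic workaround, which does not address why the search recurses so deeply, is to run the failing call with a higher limit inside a thread that has a larger stack. The sketch below uses only the standard library; `run_with_deep_recursion`, its default values, and the toy `depth` function are hypothetical and not part of obgraph or kage.

```python
import sys
import threading


def run_with_deep_recursion(func, *args, recursion_limit=100_000, stack_mb=256):
    """Run func(*args) in a worker thread with a raised recursion limit
    and a larger thread stack, then restore the previous settings."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    old_limit = sys.getrecursionlimit()
    # threading.stack_size returns the previous size (0 means platform default).
    old_stack = threading.stack_size(stack_mb * 1024 * 1024)
    sys.setrecursionlimit(recursion_limit)
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        sys.setrecursionlimit(old_limit)
        threading.stack_size(old_stack)

    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def depth(n):
    """Toy recursive function standing in for the real graph search."""
    return 0 if n == 0 else 1 + depth(n - 1)


if __name__ == "__main__":
    # Deeper than the default limit, so a plain call would raise RecursionError.
    print(run_with_deep_recursion(depth, 5000))
```

If the recursion is deep only because a variant sequence is very long, this may be enough to get past the error; if the search is cycling or exploring far more paths than expected, raising the limit will only delay the failure.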
Hi, I've been trying to build my own index bundle to use with kage, but I keep encountering errors along the way. I believe the most recent one is due to the VCF input being multi-allelic. The VCF I was using as input was made through the pipeline suggested by PanGenie, with AF tags added afterwards.
The error I'm getting is
So is the best way forward to normalise the VCF to be biallelic, or is there a way to handle multi-allelic variants in kage?
thanks, Alex