Generating kmers from a region of the linear PRG involves generating and breaking up genome paths. These paths are generated as Cartesian products of ordered lists of alleles.
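If a region of the PRG contains many variant sites (a high density of variant sites), the number of unique genome paths can become extremely large: it is the product of the number of alleles at each site, so it grows exponentially with the number of sites in the region.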
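To make the growth concrete, here is a minimal sketch of path generation as a Cartesian product. It is not the script's actual code: the region representation (an ordered list of allele lists, with a non-variant stretch stored as a one-element list) and the names `genome_paths`, `kmers_from_path` and `count_paths` are assumptions made for illustration.

```python
from itertools import product
from math import prod


def genome_paths(region):
    """Yield every genome path through a region.

    `region` is a hypothetical representation: an ordered list of allele
    lists, where a non-variant stretch is a list with a single sequence.
    """
    for combination in product(*region):
        yield "".join(combination)


def kmers_from_path(path, k):
    """Break one genome path into its overlapping kmers."""
    return [path[i:i + k] for i in range(len(path) - k + 1)]


def count_paths(region):
    """Number of genome paths, computed without enumerating them."""
    return prod(len(alleles) for alleles in region)


# Two variant sites (2 and 3 alleles) separated by non-variant sequence.
region = [["AC"], ["G", "T"], ["CA"], ["A", "C", "GG"], ["T"]]
paths = list(genome_paths(region))
path_count = count_paths(region)

# Thirty biallelic sites in one region already give over a billion paths.
dense_count = count_paths([["A", "C"]] * 30)
```

Here `path_count` is 2 * 3 = 6, while `dense_count` is 2^30, which is why a dense cluster of sites can dominate the running time even when the total number of sites is small.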
The script should be modified so that the regions considered when generating genome paths are as small as possible.
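One possible way to shrink the regions, sketched under assumptions rather than taken from the script: for each variant site, extend the window left and right only until every path is guaranteed to supply the k - 1 flanking bases that a kmer overlapping the site can need. The function name `minimal_window` and the list-of-allele-lists representation are hypothetical.

```python
from math import prod


def minimal_window(region, site_index, k):
    """Return (start, end) indices of the smallest run of segments around
    `site_index` that guarantees k - 1 flanking bases on each side.

    The shortest allele of each neighbouring segment is used, so the
    guarantee holds for every path (a conservative assumption).
    """
    needed = k - 1

    start, covered = site_index, 0
    while start > 0 and covered < needed:
        start -= 1
        covered += min(len(allele) for allele in region[start])

    end, covered = site_index, 0
    while end < len(region) - 1 and covered < needed:
        end += 1
        covered += min(len(allele) for allele in region[end])

    return start, end


region = [
    ["ACGTACGT"], ["G", "T"], ["CA"], ["A", "C", "GG"],
    ["TTTTTTTT"], ["A", "G"], ["CCCC"],
]

start, end = minimal_window(region, site_index=3, k=3)
window_paths = prod(len(alleles) for alleles in region[start:end + 1])
all_paths = prod(len(alleles) for alleles in region)
```

In this toy example the window for the middle site covers segments 2 to 4 and contains 3 paths, compared with 12 for the whole region. The sketch does not trim long non-variant flanks down to exactly k - 1 bases; that would be a further refinement.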
Note: the script performs adequately on the WG dataset but takes much longer on the "prg_24Mb_human_chr21_with_44439_vars_based_on_restricted_ref" (human) dataset, even though the human dataset contains fewer variant sites. This suggests that its variant sites are more densely clustered.