Open wheaton5 opened 6 years ago
@glennhickey Any thoughts on this? I can provide the data for an example chunk that fails, which should be pretty small.
Update: chunking by region instead of by nodes fixed this problem:
subprocess.check_call(['vg','chunk','-p',key,'-t',str(args.__threads),'-A','-c','5','-s','1000000','-o',str(args.overlap),'-x',args.xg,'-a',gam_index,'-g','-E',args.gam[0:-4]+".bed"])
where key is the chromosome.
Still, if chunking by nodes does not work here, the reason should be identified and documented.
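For readability, here is a rough sketch of the same call with the argument list built in a small helper. The helper name `build_chunk_command` and its parameter names are made up for illustration. The flags and their order are copied unchanged from the command above, and nothing here runs vg by itself.

```python
def build_chunk_command(key, threads, overlap, xg, gam_index, gam_path):
    """Return the argv list for the region-based vg chunk call (hypothetical helper)."""
    # Mirrors args.gam[0:-4] + ".bed" from the original command:
    # drop the last four characters (the ".gam" suffix) and append ".bed".
    bed_path = gam_path[0:-4] + ".bed"
    return [
        "vg", "chunk",
        "-p", key,              # key is the chromosome
        "-t", str(threads),
        "-A",
        "-c", "5",
        "-s", "1000000",
        "-o", str(overlap),
        "-x", xg,
        "-a", gam_index,
        "-g",
        "-E", bed_path,
    ]
```

The returned list can be passed straight to `subprocess.check_call(...)`, once per chromosome, exactly as in the one-liner above.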
So my vg genotype command is like
where args.vg and args.gam_index are the vg and gam.index for a given chunk, as output by vg chunk and vg index -t.
The genotyping step is failing in 5 out of 250 chunks. The other chunks are producing legit VCFs. I am testing this on a heavily downsampled data set with <1x average coverage, so that might be an issue, but I did confirm that some of the VCFs created are not empty.
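In case it helps anyone reproduce the "not empty" check, this is a minimal sketch of one way to do it, assuming uncompressed VCF text files. It relies only on the fact that VCF header lines start with '#'. The function names and the list of per-chunk paths are hypothetical, and it says nothing about why a chunk fails.

```python
def count_vcf_records(lines):
    """Count data records in an iterable of VCF lines (header lines start with '#')."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def find_empty_vcfs(vcf_paths):
    """Return the paths whose VCF has a header but no variant records."""
    empty = []
    for path in vcf_paths:
        # Assumes plain-text VCF; a gzipped file would need gzip.open instead.
        with open(path) as handle:
            if count_vcf_records(handle) == 0:
                empty.append(path)
    return empty
```

Running `find_empty_vcfs` over the per-chunk outputs separates chunks that ran but called nothing from chunks that produced records. The chunks where vg genotype itself errors out still have to be identified from the exit status.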