Open fengxiuming opened 1 year ago
the code is: < ~/QUILT/project/QUILT/QUILT_prepare_reference.R \ --outputdir=quilt_output \ --chr=chr6 \ --nGen=100 \ --reference_haplotype_file=oneKG.hap.gz \ --reference_legend_file=oneKG.legend.gz \ --genetic_map_file=~/QUILT/database/lifeover/CEU/CEU-chr6-final.b38.txt.gz \ --regionStart=25000000 \ --regionEnd=34000000 \ --buffer=10000>
Sorry, I do not why the code was crossed
Hi @rwdavies, I have encountered a similar error:
The reference legend file test.legend.gz column 2 needs to be sorted on position with increasing positions between rows but row number 231 has position 176121 and row number 232 has position 176121
In my case, the error is related to the reference legend file (test.legend.gz
), specifically mentioning an issue with a duplicate position for a multi-allelic SNP.
The legend file contains entries like:
11:176121_A_G 176121 A G
11:176121_AC_A 176121 AC A
It seems that the tri-allelic SNPs have been recoded as bi-allelic SNPs, causing a duplicate position in the legend file. What would you suggest is the best approach for these cases? I was thinking of simply removing all duplicated coordinates in the VCF, but perhaps there is a better solution.
Thank you!
Yes apologies, at this moment there is no better way to handle it than either remote the site entirely, or choose only a single variant to consider. It would be possible to re-write the code to handle this case, but I sadly don't have the bandwidth to make that change
No worries! Thank you for your prompt reply!
Hello, I have problem about the sorted and duplicated position when conduct QUILT_prepare_reference.R My code is: (according to UILT_hla_reference_panel_construction.Md) oneKG_vcf_name=CCDG_14151_B01_GRM_WGS_2020-08-05_chr6.filtered.shapeit2-duohmm-phased.vcf.gz
bcftools view --output-file oneKG234.temp.vcf.gz --output-type z --min-alleles 2 --max-alleles 2 --types snps ${oneKG_vcf_name} chr6:25000000-34000000
bcftools convert --haplegendsample oneKG oneKG234.temp.vcf.gz
~/QUILT/project/QUILT/QUILT_prepare_reference.R \ --outputdir=quilt_output \ --chr=chr6 \ --nGen=100 \ --reference_haplotype_file=oneKG.hap.gz \ --reference_legend_file=oneKG.legend.gz \ --genetic_map_file=~/QUILT/database/lifeover/CEU/CEU-chr6-final.b38.txt.gz \ --regionStart=25000000 \ --regionEnd=34000000 \ --buffer=10000
Error in validate_pos(decoy_legend, chr = chr, stop_file_name = paste0("The reference legend file ", : The reference legend file oneKG.legend.gz column 2 needs to be sorted on position with increasing positions between rows but row number 75 has position 25003153 and row number 76 has position 25003153
And I also tried to sort and remove duplicated position according to bcftool.html
bcftools sort oneKG.temp.vcf.gz --output sorted.vcf.gz bcftools norm sorted.vcf.gz -d none --output filter.vcf.gz
And the QUILT_prepare_reference.R still failed to prepare reference because: column 2 needs to be sorted on position with increasing positions between rows but row number 75 has position 25003153 and row number 76 has position 25003153.
What should I do? I would be very grateful if you could help me.
Of course, I have downloaded the reference data from https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html which have available data about the reference data about hap, legend, sample file. But the error still exist.