RvV1979 opened this issue.
Good morning,
What kind of input are you using? If you are starting from a VCF, the --allow-extra-chr option should be added to the Plink commands automatically. If you are starting from a Plink fileset, that might not be the case yet, since that feature is still relatively new.
If you're starting from a Plink fileset, a workaround might be to run your file through Plink with the --allow-extra-chr option before providing it to admixpipe.
-Steve
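A minimal Python sketch of the workaround suggested above, assuming PLINK 1.9 is installed and callable as plink. The helper name remake_bed_command and the file prefixes mydata/mydata_fixed are hypothetical; only the Plink flags (--bfile, --allow-extra-chr, --make-bed, --out) are real options.

```python
import shlex
from typing import List


def remake_bed_command(prefix: str, out_prefix: str, plink: str = "plink") -> List[str]:
    """Build (but do not run) a Plink command that rewrites a binary
    fileset while tolerating nonstandard chromosome codes.

    remake_bed_command is a hypothetical helper, not part of admixpipe.
    """
    return [
        plink,
        "--bfile", prefix,      # read prefix.bed / prefix.bim / prefix.fam
        "--allow-extra-chr",    # accept chromosome names such as NC_XXXX
        "--make-bed",           # write a new binary fileset
        "--out", out_prefix,    # use a different prefix to avoid overwriting the input
    ]


if __name__ == "__main__":
    cmd = remake_bed_command("mydata", "mydata_fixed")
    print(shlex.join(cmd))  # plink --bfile mydata --allow-extra-chr --make-bed --out mydata_fixed
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string keeps it safe to pass straight to subprocess.run without shell quoting issues.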
Ah yes, sorry, that was not clear. I was using a Plink bed file as input. That file was already generated by Plink with the --allow-extra-chr option, so this does not solve the issue. In my experience, the option needs to be used every time Plink is run on a file with nonstandard chromosome IDs.
To try the VCF route, I exported the bed file to vcf-iid format with Plink and used that as input. vcf-query then outputs a list of individuals, and vcftools outputs a table of individual missingness (this is very low, because I filtered for it in earlier steps). However, for some reason, the next vcftools step removes all individuals through many --remove-indv calls. In addition, for each chromosome I get "Unrecognized values used for CHROM: NC_XXXX - Replacing with 0". So it seems that nonstandard chromosome IDs are not handled appropriately even before Plink is called.
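To see which chromosome names in a VCF are likely to trigger that warning, a quick scan of the CHROM column can help. This is a minimal sketch; the rule for what counts as "standard" (purely numeric names plus X, Y, XY and MT) is an assumption, and the exact set vcftools or Plink recognize may differ. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Set

# ASSUMPTION: treat purely numeric names plus X, Y, XY and MT as "standard".
# The exact set that vcftools or Plink accept may differ from this.
STANDARD_NONNUMERIC = {"X", "Y", "XY", "MT"}


def nonstandard_chroms(lines: Iterable[str]) -> Set[str]:
    """Return distinct CHROM values from VCF data lines that are neither
    purely numeric nor in STANDARD_NONNUMERIC."""
    found: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first column
        if not chrom.isdigit() and chrom not in STANDARD_NONNUMERIC:
            found.add(chrom)
    return found


def nonstandard_chroms_in_file(path: str) -> Set[str]:
    """Scan a plain or gzip/bgzip-compressed VCF file on disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return nonstandard_chroms(handle)


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_XXXX\t100\t.\tA\tG\t.\tPASS\t.",
    "2\t200\t.\tC\tT\t.\tPASS\t.",
]
print(nonstandard_chroms(example))  # {'NC_XXXX'}
```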
If you have some more ideas I would be much obliged.
Thanks
Thanks for the additional details. It is my understanding that Admixture does not use chromosome information (both the Alexander et al. 2009 paper and the Admixture manual state that linkage equilibrium is assumed, so datasets should be filtered for LD before running Admixture), and my recollection is that the program itself is very restrictive in what it accepts as chromosome names. Comments in the code on this website seem to match my memory (https://speciationgenomics.github.io/ADMIXTURE/). Consequently, I do not retain chromosome information in the Plink conversion, because it is counterproductive to running Admixture. If you are receiving those warnings but Admixture itself runs, then the pipeline is working as intended.
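To illustrate what dropping chromosome information amounts to, here is a minimal Python sketch that sets the chromosome column of a Plink .bim file to 0, the same replacement the vcftools warning reports. It is not the pipeline's actual code; it assumes the standard six .bim columns (chromosome, variant ID, genetic position, base-pair position, allele 1, allele 2), and the function name is hypothetical. Because this discards the information that chromosome-aware steps rely on, any LD filtering would presumably need to happen before such a rewrite.

```python
from typing import Iterable, Iterator


def zero_bim_chromosomes(lines: Iterable[str]) -> Iterator[str]:
    """Yield .bim lines with the chromosome column (column 1) set to "0".

    Assumes six whitespace-separated columns per line; output is
    tab-separated. zero_bim_chromosomes is a hypothetical helper.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        fields[0] = "0"  # drop the nonstandard chromosome name
        yield "\t".join(fields)


bim = ["NC_XXXX\tsnp1\t0\t1500\tA\tG"]
print(list(zero_bim_chromosomes(bim)))  # ['0\tsnp1\t0\t1500\tA\tG']
```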
The --remove-indv calls in vcftools might happen if your VCF file contains individuals that are not present in your population map file.
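One way to check for that mismatch is to compare the sample names in the VCF header against the IDs in the population map. This is a minimal sketch; it assumes the popmap has one whitespace-separated "individual population" pair per line (check the admixpipe documentation for the exact format), and all function names are hypothetical.

```python
from typing import Iterable, List, Set


def vcf_sample_ids(lines: Iterable[str]) -> List[str]:
    """Return sample IDs from the #CHROM header line of a VCF.

    Sample columns follow the nine fixed columns
    (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT).
    """
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def popmap_ids(lines: Iterable[str]) -> Set[str]:
    """Return individual IDs from a popmap ASSUMED to contain
    whitespace-separated 'individual population' pairs."""
    return {line.split()[0] for line in lines if line.strip()}


def samples_missing_from_popmap(vcf_lines: Iterable[str], popmap_lines: Iterable[str]) -> List[str]:
    """Samples present in the VCF header but absent from the popmap."""
    known = popmap_ids(popmap_lines)
    return [s for s in vcf_sample_ids(vcf_lines) if s not in known]


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3",
]
popmap = ["ind1\tpopA", "ind3\tpopB"]
print(samples_missing_from_popmap(vcf, popmap))  # ['ind2']
```

With real files, pass open file handles instead of the in-memory lists.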
Hi Steve,
I am analyzing a dataset with nonstandard chromosome IDs (specifically, GenBank accession codes) and therefore get the following error:
Error: Invalid chromosome code 'NC_XXXX' on line 1 of .bim file
In my own pipelines, I add --allow-extra-chr to my plink commands to solve this. Is there a way to make admixturePipeline work with such nonstandard chromosome IDs?
Thanks