stevemussmann / admixturePipeline

A pipeline that accepts a VCF file to run through Admixture
GNU General Public License v3.0

popmap and plink input files #21

Open RvV1979 opened 1 month ago

RvV1979 commented 1 month ago

Hi Steve,

This is a very minor thing, but I wanted to bring a possible inconsistency to your attention. When using a Plink .bed file as input, my popmap file was based on sample IDs, as suggested in the documentation (sample --> population). However, that does not work, because the pipeline uses column 1 (the family ID) of the .fam file and also passes the names in the popmap file to the plink --keep-fam option. When the .fam file has identical sample and family IDs (as in your example file), this is not a problem. However, when they differ, the pipeline needs the popmap file to list family --> population.

I found the easiest solution is to ensure that family IDs correspond to sample IDs in the input .fam file in a preprocessing step.
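
For anyone hitting the same thing, here is a minimal sketch of such a preprocessing step. It is not part of admixturePipeline; the function name `sync_family_ids` and the file names are hypothetical. It assumes a standard six-column, whitespace-delimited PLINK 1.9 .fam file and unique sample IDs, and it writes a new .fam in which column 1 is a copy of column 2, preserving row order so the .bed file still lines up.

```python
from pathlib import Path


def sync_family_ids(fam_in, fam_out):
    """Copy a PLINK .fam file, replacing each family ID with the sample ID.

    Hypothetical helper, not part of admixturePipeline. Returns the number
    of samples written.
    """
    seen = set()
    out_lines = []
    for line in Path(fam_in).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns in .fam line: {line!r}")
        sample_id = fields[1]
        # Sample IDs must be unique, otherwise FID+IID pairs would collide.
        if sample_id in seen:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        seen.add(sample_id)
        fields[0] = sample_id  # family ID := sample ID
        out_lines.append(" ".join(fields))
    Path(fam_out).write_text("\n".join(out_lines) + "\n")
    return len(out_lines)
```

Write the result to a new file first (for example `sync_family_ids("data.fam", "data.synced.fam")`), check it, and only then swap it in for the original .fam next to the .bed and .bim files.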

EDIT: I now see that this issue was also reported earlier; see #15.

stevemussmann commented 1 month ago

Thanks, I didn't look into that earlier issue because the person who opened it also closed it before I had a chance to respond.

I will have to check into this. I recall there being some issue I was working around when I wrote the relevant code (i.e., the try/catch block that yields the error in issue #15).

RvV1979 commented 1 month ago

I believe the issue is that your checkFam function matches the first column of the popmap file (which contains sample IDs) against the first column of the .fam file (which lists the family ID; see https://www.cog-genomics.org/plink/1.9/formats#fam) and uses the plink --keep-fam option (which works with family IDs) to filter the individuals listed in the popmap file. Perhaps the most elegant solution would be to match the first column of the popmap against the second column of the .fam file (which is defined as the within-family ID, i.e. the sample ID) and use the plink --keep option (which works with sample IDs) to filter the individuals listed in the popmap file. HTH
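
To make the suggestion concrete, here is a rough sketch of the matching step. It is not the actual checkFam code; the function names `build_keep_lines` and `write_keep_file` are made up for illustration. It assumes a whitespace-delimited popmap with sample IDs in column 1. Since the file read by PLINK 1.9 `--keep` has family IDs in the first column and within-family IDs in the second, the sketch looks up each popmap sample in column 2 of the .fam and writes the matching FID/IID pair.

```python
def build_keep_lines(popmap_path, fam_path):
    """Return 'FID IID' lines for the .fam samples listed in the popmap.

    Hypothetical helper: matches popmap column 1 against .fam column 2.
    """
    with open(popmap_path) as fh:
        wanted = {line.split()[0] for line in fh if line.strip()}

    keep_lines = []
    found = set()
    with open(fam_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            fid, iid = fields[0], fields[1]
            if iid in wanted:
                keep_lines.append(f"{fid} {iid}")
                found.add(iid)

    # Fail loudly if the popmap names samples that the .fam does not contain.
    missing = wanted - found
    if missing:
        raise ValueError(f"popmap samples not in .fam: {sorted(missing)}")
    return keep_lines


def write_keep_file(popmap_path, fam_path, keep_path):
    """Write the FID/IID pairs to a file suitable for plink --keep."""
    lines = build_keep_lines(popmap_path, fam_path)
    with open(keep_path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)
```

The resulting file could then be passed to something like `plink --bfile <prefix> --keep keep.txt --make-bed --out <filtered_prefix>`, which would let the popmap stay sample --> population regardless of what the family ID column contains.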