speciationgenomics / scripts

Scripts for analysis used during the course

chrom ID #1

Closed by maernster 3 years ago

maernster commented 5 years ago

Hi there!

I would like to use convertvcf2eigenstrat.sh to test for hybridization using ADMIXTOOLS. The problem is that I am working with data from non-model organisms, so my chromosome IDs don't refer to actual chromosomes but to scaffolds.

I have managed to overcome the issues with plink by adding scaff_ to the beginning of the chrom IDs and changing the original command to: plink --vcf $file.vcf --recode --allow-extra-chr --set-missing-var-ids @:# --out $file

However, I still run into problems with convertf, which apparently also arise from my "weird" chrom IDs. I managed to make it work, but only by changing all chromosome names to 1, re-sorting and removing all duplicates. I would rather not do that, as I am not sure whether changing the order messes with my downstream analyses, so I wanted to ask if you know of any workaround?

Thanks in advance! maernster

markravinet commented 5 years ago

Hi,

There should be no downstream issues provided all the SNP positions are unique. There are only problems if, for example, you have position 2200 on both chr1 and chr25, because those two sites become indistinguishable once the chromosome names are merged. So I’m guessing that is your issue.
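A minimal sketch of how one might check for this before renaming anything. It assumes the sites have already been read into (chromosome, position) pairs, for example from the first two columns of the VCF; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def cross_chrom_duplicates(sites):
    """Return {position: sorted chromosome names} for every position
    that occurs on more than one chromosome."""
    chroms_by_pos = defaultdict(set)
    for chrom, pos in sites:
        chroms_by_pos[pos].add(chrom)
    return {
        pos: sorted(chroms)
        for pos, chroms in chroms_by_pos.items()
        if len(chroms) > 1
    }

# Hypothetical example data: position 2200 is present on two chromosomes.
sites = [("chr1", 2200), ("chr25", 2200), ("chr1", 3100), ("chr2", 4500)]
print(cross_chrom_duplicates(sites))  # {2200: ['chr1', 'chr25']}
```

If this returns an empty dictionary, collapsing chromosome names should not create any duplicate positions.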

The simplest solution is to rename each chromosome - i.e. chr1 = 1, chr2 = 2 and so on. Since convertf is expecting human data, you will run into issues with non-human chromosome conventions - i.e. in birds chrZ won’t work. It’s best to rename them to numerics.
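As a rough illustration of that renaming, the sketch below numbers chromosome names in order of first appearance. The function name and the example names are hypothetical, and how the mapping is applied to the VCF or plink files is left open.

```python
def numeric_chrom_map(chrom_names):
    """Map each distinct chromosome name to "1", "2", ... in order of
    first appearance."""
    mapping = {}
    for name in chrom_names:
        if name not in mapping:
            mapping[name] = str(len(mapping) + 1)
    return mapping

# Hypothetical chromosome column, including a sex chromosome.
names = ["chr1", "chr1", "chr2", "chrZ"]
mapping = numeric_chrom_map(names)
print(mapping)  # {'chr1': '1', 'chr2': '2', 'chrZ': '3'}
```

Saving the mapping alongside the converted files makes it possible to translate results back to the original names later.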

Hopefully doing that should solve the issue.

Cheers

Mark

maernster commented 5 years ago

I have thousands of scaffolds. As a result, if I change their IDs to numbers I get an issue at chromosome 100. I guess that's the upper limit set by convertf. Thus, I decided to change them all to 1, but then, as you say, I have duplicates which I have to delete :/ That sucks

Thanks for your answer!

markravinet commented 5 years ago

The probability of lots of duplicates is fairly low though. Maybe try clustering your scaffolds into 100 groups. That would make it even less likely that you have duplicates.
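A minimal sketch of that grouping idea, under some assumptions: scaffolds are assigned round-robin to `n_groups` numeric groups in order of first appearance, and any (group, position) pair that occurs more than once is reported. The function names and sample data are hypothetical, and the exact number of groups convertf accepts is not confirmed in this thread, so `n_groups` is left as a parameter.

```python
from collections import Counter

def assign_groups(scaffolds, n_groups):
    """Assign each distinct scaffold to a group 1..n_groups, round-robin
    in order of first appearance."""
    groups = {}
    for name in scaffolds:
        if name not in groups:
            groups[name] = len(groups) % n_groups + 1
    return groups

def colliding_sites(sites, groups):
    """Return the sorted (group, position) pairs that occur more than
    once after scaffold names are replaced by their group number."""
    counts = Counter((groups[chrom], pos) for chrom, pos in sites)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical example data: three scaffolds share position 100.
sites = [("scaff_1", 100), ("scaff_2", 100), ("scaff_3", 100), ("scaff_3", 250)]

groups = assign_groups((chrom for chrom, _ in sites), n_groups=2)
print(groups)                          # {'scaff_1': 1, 'scaff_2': 2, 'scaff_3': 1}
print(colliding_sites(sites, groups))  # [(1, 100)]
```

With more groups, fewer scaffolds share a group number, so fewer positions collide. Sites within a merged group would presumably still need to be re-sorted by position before conversion.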
