grenaud / glactools

command-line tools for the management of genotype likelihoods and allele counts
http://grenaud.github.io/glactools/
GNU General Public License v3.0
28 stars 2 forks source link

epo file format #25

Open xinkaitong opened 2 years ago

xinkaitong commented 2 years ago

hello, grenaud! now, I just have an outgroup vcf file (other mammals not human), could you tell me how can I transform vcf file to epo file?

grenaud commented 2 years ago

That is very tough! What is much easier is to transform that VCF to ACF, and use the specific population as the root/anc. However be wary that you don't really have an ancestor, you have an outgroup :-)

On Fri, Nov 26, 2021 at 4:49 AM xinkaitong @.***> wrote:

hello, grenaud! now, I just have an outgroup vcf file (other mammals not human), could you tell me how can I transform vcf file to epo file?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/grenaud/glactools/issues/25, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNIZXIMTP352MCJZF7ITUN37VBANCNFSM5IZYG5KQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

xixifa commented 11 months ago

Hello, If I want to calculate β2 using an outgroup to call substitutions and ancestral alleles, can I just get epo files from outgroup vcf file, without common ancestor? Thank you very much

grenaud commented 11 months ago

Yes! You need to convert VCF to ACF and run replaceancestor I believe. Let me know if that works!

xixifa commented 11 months ago

Thank you. I will try it ! If I use vcfm2acf to convert VCF to ACF , then I use usepopsrootanc to tell the root(outgroup), finally I use the replaceanc, the resulting file is the epo file that I need in betascan2 software? Does this epo file include all the individuals I calculate later?

xixifa commented 11 months ago

Hello, I have another question. Is the file2.acf.gz in https://github.com/grenaud/glactools the result using the glactools with vcfm2acf --onlyGT --fai parameter? I am not sure if using these parameters is correct?

Thanks

xixifa commented 11 months ago

Hello, the file1.acf.gz only have a outgroup individual: image when I use the "usepopsrootanc" parameter, I only tell the outgroup, then the error is as follows image I think it lack ancestral information, it is right? so the file1.acf.gz needs the root and ancestor information?

xixifa commented 11 months ago

Hello, What does "ChimpHumanAncestor " refer to specifically?

Thank you very much

grenaud commented 11 months ago

Thank you. I will try it ! If I use vcfm2acf to convert VCF to ACF , then I use usepopsrootanc to tell the root(outgroup), finally I use the replaceanc, the resulting file is the epo file that I need in betascan2 software? Does this epo file include all the individuals I calculate later?

Would you describe seems like it will work. The EPO is only human+chimp+other great apes and the inferred ancestral states, there is a single human "individual" and that is the reference

grenaud commented 11 months ago

Hello, I have another question. Is the file2.acf.gz in https://github.com/grenaud/glactools the result using the glactools with vcfm2acf --onlyGT --fai parameter? I am not sure if using these parameters is correct?

Thanks

I am not sure how this was generated but acf files can be generated in very different ways.

grenaud commented 11 months ago

There is root and ancestor. In the case of humans, we can think of the chimps as the "root" as all humans are equitistant to the chimp. However there is a chimp-human ancestor, we do not have access to the sequence of its genome as it died out 6 million years ago. However we can make a probabilistic inference of what this genome had in terms of alleles. This is precisely what EPO does.

xixifa commented 11 months ago

Hello, Thank you for your reply!

  1. May I ask if I have only root and ancestor, but no groups like other apes, is it OK to generate the follow file.acf.gz as my epo file for Betascan2? image

  2. After I get the epo file, I get an error adding the index file? How to create this index file? image

  3. The first line of the epo file seems to be different from the example you showed. Is it correct for my epo file? image Thanks

grenaud commented 11 months ago

How did you generate this file? Could you paste the command lines?

xixifa commented 11 months ago

I think I misunderstood your meaning. Is file4.acf.gz in your example (glactools replaceanc file2.acf.gz file3.acf.gz > file4.acf.gz) equal to the 1kg_chr6.AFRsubset.acf.gz in following code ( glactools vcfm2acf --onlyGT --epo all.epo.gz --fai human_g1k_v37.fasta.fai chr6.AFRsubset.recode.vcf > 1kg_chr6.AFRsubset.acf.gz)? These two files are actually equivalent, right?

grenaud commented 11 months ago

file4.acf.gz is just a placeholder name, it could be anything.

xixifa commented 11 months ago

sorry,

  1. glactools replaceanc file2.acf.gz <(glactools usepopsrootanc -u file1.acf.gz chimp ChimpHumanAncestor ) > file4.acf.gz
  2. glactools vcfm2acf --onlyGT --epo all.epo.gz --fai human_g1k_v37.fasta.fai chr6.AFRsubset.recode.vcf > 1kg_chr6.AFRsubset.acf.gz

What I understand is that if there is no epo file, file4 generated according to step 1 and 1kg_chr6.AFRsubset.acf.gz generated using epo in 2 are both acf files containing outgroup information. steps 1 and 2 can generate the same file, and the steps are equivalent, is it right?

grenaud commented 11 months ago

While I encourage people to use file descriptors, they are not great for debugging because you don't know which one fails.

Could you try to break down these commands into three separate ones, produce the output for each and inspect that using glactools view?

xixifa commented 11 months ago

Yes, I used 3 commands to generate the file successfully.

Since my species is not human, I want to know if the output file of 1 is the equivalent of the output file of 2 ?

Thank you very much

grenaud commented 11 months ago

Could you please paste the commands? I am not sure what is file 1 and file 2 here.

xixifa commented 11 months ago

image I mean these two ways above. Since my object is not human, there is no epo file, so I can only use the method of 1. I am not sure whether the file I generated is the input file of betascan2? the following is my output image

grenaud commented 11 months ago

Again, please no file descriptors, just single commands.

xixifa commented 11 months ago

The following is code of my test file. IN1 is outgroup, ZA1 is ancestor. test.chr24.vcf is my pop1 file.

glactools vcfm2acf --onlyGT --fai $ref_genome outgroup_root_test.chr24.vcf > outgroup_root_test.chr24.acf.gz glactools vcfm2acf --onlyGT --fai $ref_genome test.chr24.vcf > test.chr24.acf.gz glactools usepopsrootanc outgroup_root_test.chr24.acf.gz IN1 ZA1 > outgroup_root_test_identify.chr24.acf.gz glactools replaceanc test.chr24.acf.gz outgroup_root_test_identify.chr24.acf.gz > outgroup_root_test_merge.chr24.acf.gz

grenaud commented 11 months ago

ok thank you! which command fails?

xixifa commented 11 months ago

No error reported. I'm just not sure if the outgroup_root_test_merge.chr24.acf.gz is the betascan2 input file?

grenaud commented 11 months ago

ah no it is not! you need to use glactools acf2betascan

xixifa commented 11 months ago

glactools acf2betascan --useroot outgroup_root_test_merge.chr24.acf.gz | gzip > outgroup_root_test.chr24.beta.txt.gz

Is it right to do ?

grenaud commented 11 months ago

should be!

xixifa commented 11 months ago

Ok! Thank you very much !