mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

How to run pyseer with vcf files #204

Closed sekhwal closed 2 years ago

sekhwal commented 2 years ago

Hi, I am trying to run pyseer on a dataset of 500 Salmonella isolates. I generated VCF files with snippy and merged them with bcftools. Please let me know whether I can use pyseer for GWAS analysis with this input. I am following the tutorial, but I am not sure whether a structure.tsv file is necessary to run the pipeline, or how to generate it. I also have a phenotype file (.txt) that lists sample names and sample metadata such as the collection site (blood, brain, joint). Please let me know whether I need to convert this information into binary format.

I am using the following command. Please let me know if it is correct:

pyseer --phenotypes phenotypes.txt --vcf salm.vcf.gz --min-af 0.01 --max-af 0.99 --cpu 15 >pyseer_results.txt

mgalardini commented 2 years ago

It is recommended to provide a file for population structure correction. Your example uses the fixed effects model, so you could provide a distance matrix obtained with mash (see here).

sekhwal commented 2 years ago

Thank you for your answer. In addition, I am preparing a phenotype file that includes multiple columns with binary values. Please let me know whether the pipeline works with multiple columns in the phenotype file. Here is an example:

Phenotype.txt (Example)

samples  blood  brain  joint  intestine
salm11   1      0      1      0
salm12   0      1      0      1
salm37   1      1      1      0
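For readers starting from site labels rather than 0/1 values, a layout like the one above could be produced along the following lines. This is only a sketch: the (sample, site) input pairs, the function names `to_binary_table` and `write_tsv`, and the tab-delimited output are assumptions for illustration, not something taken from the pyseer documentation.

```python
import csv
import io


def to_binary_table(records, sites):
    """Turn (sample, site) pairs into rows with one 0/1 column per site.

    `records` and `sites` are hypothetical inputs for this sketch.
    """
    seen = {}
    for sample, site in records:
        seen.setdefault(sample, set()).add(site)
    rows = [["samples"] + list(sites)]
    for sample in seen:  # dicts keep insertion order, so samples stay in input order
        rows.append([sample] + [1 if s in seen[sample] else 0 for s in sites])
    return rows


def write_tsv(rows):
    """Serialize rows as tab-delimited text (assumed phenotype file layout)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Example data mirroring the three samples shown above.
records = [
    ("salm11", "blood"), ("salm11", "joint"),
    ("salm12", "brain"), ("salm12", "intestine"),
    ("salm37", "blood"), ("salm37", "brain"), ("salm37", "joint"),
]
sites = ["blood", "brain", "joint", "intestine"]
table = write_tsv(to_binary_table(records, sites))
print(table, end="")
```

Listing the sites explicitly keeps the column order fixed and makes a misspelled site label easy to spot, since it would simply never set a 1.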

mgalardini commented 2 years ago

Yes, you can run pyseer multiple times, changing the --phenotype-column argument each time (by default, the last column is used).
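As a rough illustration of that approach, the sketch below builds one pyseer command per phenotype column, reusing only the options that appear in this thread. The helper name `build_pyseer_commands`, the output file naming, and the assumption that the phenotype file is tab-delimited with the sample name in the first column are all hypothetical. It also assumes `--phenotype-column` takes the column name. No population structure option is included here, so one would still need to be added as recommended above.

```python
import shlex


def build_pyseer_commands(header_line, phenotypes="phenotypes.txt",
                          vcf="salm.vcf.gz", cpu=15):
    """Return (command, output_file) pairs, one per phenotype column.

    Hypothetical helper: it only assembles command strings and runs nothing.
    """
    # Skip the first column, assumed to hold the sample names.
    columns = header_line.rstrip("\n").split("\t")[1:]
    commands = []
    for col in columns:
        args = [
            "pyseer",
            "--phenotypes", phenotypes,
            "--phenotype-column", col,
            "--vcf", vcf,
            "--min-af", "0.01",
            "--max-af", "0.99",
            "--cpu", str(cpu),
        ]
        commands.append((shlex.join(args), f"pyseer_results_{col}.txt"))
    return commands


# Header taken from the example phenotype file in this thread.
header = "samples\tblood\tbrain\tjoint\tintestine"
commands = build_pyseer_commands(header)
for cmd, out in commands:
    print(f"{cmd} > {out}")
```

Printing the commands instead of executing them makes it easy to check them first, then paste them into a shell or a job script.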

sekhwal commented 2 years ago

I am wondering whether I can provide all the columns at the same time in one file. Please let me know if the pipeline can analyze multiple columns in a single run. I previously used TASSEL, where I was able to use multiple columns, so I would like to do the same with pyseer.

mgalardini commented 2 years ago

I am afraid you can only run associations one phenotype at a time with pyseer.

sekhwal commented 2 years ago

Could you suggest a pipeline for microbial GWAS that can take multiple phenotype columns at once, or is it reasonable to analyze the phenotypes one at a time? I am new to microbial GWAS, so any suggestion would be appreciated.

Thank you,