starskyzheng / panpop

Application of pan-genome for population
MIT License

PART stand alone clarification #15

Closed Riccardo1274 closed 8 months ago

Riccardo1274 commented 8 months ago

Dear Zeyu Zheng, thank you for developing such a good merging tool. I am trying to use PART stand alone mode, first to merge calls from different variant callers for the same sample, and then to merge the resulting VCFs into a multi-sample VCF. From your paper: 'The PART method is applied twice to merge SVs from different callers, either for individual analysis or for multiple individuals within large populations'. I cannot find a way to pass more than one VCF to stand alone PART. I looked at your example on Ocean, and it seems to me that the VCF file provided there has already been merged (with bcftools merge). So I guess I should merge different samples with bcftools merge, but how do I merge calls from different variant callers for the same sample? How do you suggest I proceed? Thanks in advance. Kind regards, Riccardo Rossi

starskyzheng commented 8 months ago

Hi, Rossi. The input to "PART stand alone mode" must be a single VCF file containing multiple samples. If multiple VCFs need to be processed, such as those from "different variant callers but same sample", you first need to merge them all into one VCF file, in which each sample represents the result of one variant caller. This can be done with bcftools merge -m none -o OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz .... Note that the -m none parameter is necessary to avoid extra processing by bcftools. If you need to merge multiple samples called with multiple variant callers, first merge the variant callers for each sample separately. Then you can merge the SVs of the multiple samples.
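
For anyone following along, the two-stage workflow described in this reply could be sketched roughly as below. This is only an illustration under assumptions, not the maintainer's script: the file names (sampleA.caller1.vcf.gz and so on), the merge_vcfs helper and the DRY_RUN switch are all hypothetical, and -O z is added here to force bgzipped output. The PART commands themselves are left out, since their options are not given in this thread. Keep in mind that bcftools merge expects bgzipped, indexed inputs, and that it stops on duplicate sample names unless the samples are renamed first (for example with bcftools reheader) or --force-samples is passed, which matters when every caller writes the same sample name.

```shell
# Sketch of the two-stage merge. All file names are hypothetical.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Merge several VCFs into one multi-sample VCF.
# -m none stops bcftools from combining records into multiallelic sites.
# Usage: merge_vcfs OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz ...
merge_vcfs() {
    out="$1"
    shift
    run bcftools merge -m none -O z -o "$out" "$@"
}

# Stage 1: for each sample, merge the VCFs from the different callers.
merge_vcfs sampleA.callers.vcf.gz sampleA.caller1.vcf.gz sampleA.caller2.vcf.gz
merge_vcfs sampleB.callers.vcf.gz sampleB.caller1.vcf.gz sampleB.caller2.vcf.gz
# ... run PART on each *.callers.vcf.gz to get one VCF per sample ...

# Stage 2: merge the per-sample PART outputs (assumed names), then run PART again.
merge_vcfs all_samples.vcf.gz sampleA.part.vcf.gz sampleB.part.vcf.gz
```

Setting DRY_RUN=0 makes the helper run bcftools for real; leaving it at 1 is a cheap way to check the generated commands first.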

Riccardo1274 commented 8 months ago

Thank you very much