multi vcf: how to use common flag and/or, provide list of VCFs with --common

mquinodo / AutoMap

Tool to find regions of homozygosity (ROHs) from sequencing data.


multi vcf: how to use common flag and/or, provide list of VCFs with --common #5

Closed complexgenome closed 3 years ago

complexgenome commented 3 years ago

hi there,

Thanks for this tool. I am interested in using it on WES data. I have two cohorts of 4K and 11K samples.

I have one VCF file per chromosome, each comprising all these individuals. I see that the --common option cannot be used with the --multivcf flag.

I use

bash $AUTOMAP_HOME/AutoMap_v1.0.sh --vcf $VCF_file  --multivcf --out ROH_output/CHR22  --genome hg19

It generates one VCF per sample. With 22 chromosomes, I will have 22 times 4K VCFs.

Next, I would like to get common ROHs from these.

--vcf VCF1,VCF2,VCF3

Is there a way to provide the list of VCFs in a file? I do not think it is fun to provide a list of 4K/11K VCFs in a bash string.

Let me know if you need any help with code/structuring or testing this. best,

mquinodo commented 3 years ago

Hi Sariya,

Thank you for your interest in our tool. The --common option extracts regions that are common to all samples. It is therefore highly unlikely that any region will be common to all of your 4K samples. Maybe you could try looping over all possible pairs of samples, using two nested for loops in bash.
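As a rough illustration of the pairwise loop suggested above, here is a minimal dry-run sketch. It only prints one command per unordered pair of per-sample VCFs and does not run AutoMap. The helper name `print_pair_commands`, the sample file names, and the per-pair output directory layout are assumptions made for illustration; the flags (--vcf with a comma-separated list, --common, --out, --genome) are the ones mentioned in this thread and should be checked against your AutoMap version.

```shell
#!/usr/bin/env bash
# Dry run: print one AutoMap command per unordered pair of VCFs.
# Nothing is executed; pipe the output to a scheduler or to bash if it looks right.

# Hypothetical helper: first argument is an output directory,
# remaining arguments are per-sample VCF paths.
print_pair_commands() {
  local outdir=$1
  shift
  local vcfs=("$@")
  local n=${#vcfs[@]}
  local i j
  for ((i = 0; i < n; i++)); do
    # Start j at i + 1 so each pair appears once and no sample is paired with itself.
    for ((j = i + 1; j < n; j++)); do
      # Per-pair output directory naming is an assumption, not an AutoMap convention.
      echo "bash \$AUTOMAP_HOME/AutoMap_v1.0.sh --vcf ${vcfs[i]},${vcfs[j]} --common --out ${outdir}/pair_${i}_${j} --genome hg19"
    done
  done
}

# Example with three hypothetical per-sample VCFs (prints three commands).
print_pair_commands ROH_pairs s1.vcf s2.vcf s3.vcf
```

Note that n samples give n(n-1)/2 pairs, so 4K samples would mean roughly 8 million AutoMap runs; in practice the loop is only realistic on a subset of samples.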

I am not sure I understand what results you want to obtain, but I would be glad to help if you give me more details.

Best, Mathieu

complexgenome commented 3 years ago

Dear mat,

I am interested in obtaining overlapping ROH regions across individuals (similar to the PLINK --consensus flag).

mquinodo commented 3 years ago

Dear Sanjeev,

I added the --vcflist option, which reads multiple VCFs from a text file listing them. I could not find a PLINK --consensus flag. Did you mean --consensus-match? Could you tell me the desired output?
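A list file for --vcflist could be generated rather than typed by hand. The sketch below assumes the file holds one VCF path per line, which this thread does not confirm, so check the AutoMap README for the expected format. The helper name `make_vcf_list` and all paths are hypothetical, and the final AutoMap command is only printed, since combining --vcflist with --common is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write every *.vcf or *.vcf.gz file under a directory
# to a list file, one path per line, in sorted order.
make_vcf_list() {
  local dir=$1 list=$2
  find "$dir" -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort > "$list"
}

# Demo on a throwaway directory with placeholder files.
demo=$(mktemp -d)
touch "$demo/s1.vcf" "$demo/s2.vcf" "$demo/notes.txt"
make_vcf_list "$demo" "$demo/vcfs.txt"
cat "$demo/vcfs.txt"   # lists s1.vcf and s2.vcf; notes.txt is skipped

# Assumed invocation, printed rather than executed.
echo "bash \$AUTOMAP_HOME/AutoMap_v1.0.sh --vcflist $demo/vcfs.txt --common --out ROH_common --genome hg19"
```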

Best, Mathieu



complexgenome commented 3 years ago

Hi Mat,

Did you mean --consensus-match?

I am looking for this, or for an option in AutoMap akin to the one PLINK provides.

I use --homozyg group --pool-size 10 to get homozygous regions found in multiple samples. These parameters look for consensus homozygous regions, where at least 10 individuals share the homozygous stretch.

Please see the attached sample output file, sample_output.txt. In the attached file, the first three columns are pool (group), family IDs, and individual IDs. Within each group there are CON and UNION rows, that is, consensus and union. These are calculated from the SNP1, SNP2, BP1, and BP2 values. The attached output is for homozygous regions of length 5 KB or more (see the KB column).
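To make the CON and UNION rows concrete, here is a small sketch of how they could be derived from BP1 and BP2 within a pool. It reads consensus as the intersection of the pooled segments (largest BP1 to smallest BP2) and union as their overall span (smallest BP1 to largest BP2). This is an assumption about the semantics for illustration, not PLINK's actual code. The four-column input layout (pool, individual, BP1, BP2) and the helper name `pool_con_union` are simplified and hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read whitespace-separated "POOL IID BP1 BP2" lines on stdin
# and print one CON and one UNION row per pool, in order of first appearance.
pool_con_union() {
  awk '
    {
      b1 = $3 + 0; b2 = $4 + 0          # force numeric comparison
      if (!($1 in seen)) {              # first segment of a new pool
        seen[$1] = 1; order[++n] = $1
        lo_max[$1] = b1; lo_min[$1] = b1
        hi_min[$1] = b2; hi_max[$1] = b2
        next
      }
      if (b1 > lo_max[$1]) lo_max[$1] = b1
      if (b1 < lo_min[$1]) lo_min[$1] = b1
      if (b2 < hi_min[$1]) hi_min[$1] = b2
      if (b2 > hi_max[$1]) hi_max[$1] = b2
    }
    END {
      for (k = 1; k <= n; k++) {
        p = order[k]
        print p, "CON", lo_max[p], hi_min[p]     # intersection of all segments
        print p, "UNION", lo_min[p], hi_max[p]   # span covering all segments
      }
    }'
}

# Example with made-up coordinates.
pool_con_union <<'EOF'
S1 ind1 1000 5000
S1 ind2 2000 6000
S1 ind3 1500 4500
S2 ind4 100 900
S2 ind5 300 800
EOF
```

For pool S1 in this made-up input, the consensus is 2000 to 4500 and the union is 1000 to 6000.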

thank you,

mquinodo commented 3 years ago

Hi Sanjeev,

For the --homozyg-group command, PLINK looks at the genotypes to establish common haplotypes. This is possible for SNP-array data, in which multiple SNPs are covered in each ROH. With exome data, however, the output would not be reliable for small and medium-sized ROHs because of the low number of variants present in each ROH. Furthermore, VCF files only provide information about non-reference variants and do not allow one to infer whether a variant absent from the VCF file is wild-type (WT) or simply not covered by the sequencing.

Best, Mathieu