Closed complexgenome closed 3 years ago
Hi Sariya,
Thank you for your interest in our tool.
The --common option extracts the regions that are common to all samples. It is therefore highly unlikely that any single region will be common to all of your 4K samples.
Maybe you could try to loop over all possible pairs of samples using two for loops in bash.
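The two nested loops suggested here can be sketched as follows. This is a dry run that only prints one command line per pair. The script name `AutoMap_v1.0.sh`, the comma-separated `--vcf` value, and the `--genome`/`--out` values are assumptions for illustration and should be checked against the AutoMap README; only `--common` is taken from this thread.

```bash
#!/usr/bin/env bash

# Print every unordered pair of the given items, one tab-separated pair per line.
each_pair() {
    local items=("$@")
    local i j
    for ((i = 0; i < ${#items[@]}; i++)); do
        for ((j = i + 1; j < ${#items[@]}; j++)); do
            printf '%s\t%s\n' "${items[i]}" "${items[j]}"
        done
    done
}

# Dry run: print a command line for each pair instead of executing anything.
# ASSUMPTION: script name and all options except --common are hypothetical here.
print_pair_commands() {
    local a b
    each_pair "$@" | while IFS=$'\t' read -r a b; do
        echo "bash AutoMap_v1.0.sh --vcf ${a},${b} --genome hg19 --out pair_out --common"
    done
}

# Hypothetical sample file names.
print_pair_commands sampleA.vcf sampleB.vcf sampleC.vcf
```

Note that n samples give n(n-1)/2 pairs, so 4,000 samples would produce 7,998,000 runs; a pairwise loop is probably only practical on a subset of samples.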
I am not sure I understand what results you want to obtain, but I would be glad to help if you can give me more details.
Best, Mathieu
Dear mat,
I am interested in obtaining overlapping ROH regions across individuals (similar to the PLINK --consensus flag).
Dear Sanjeev,
I added the --vcflist option, which reads multiple VCFs from a text file listing them. I could not find a PLINK --consensus flag. Did you mean --consensus-match? Could you describe the desired output?
Best, Mathieu
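A list file for --vcflist could be generated rather than typed by hand. The sketch below assumes the file holds one VCF path per line and that both `.vcf` and `.vcf.gz` inputs are wanted; neither is confirmed in this thread, and the function and file names are hypothetical.

```bash
#!/usr/bin/env bash

# Write the sorted paths of all *.vcf and *.vcf.gz files under a directory
# to a list file, one path per line (ASSUMPTION: this is the expected format).
make_vcf_list() {
    local vcf_dir=$1 list_file=$2
    find "$vcf_dir" -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort > "$list_file"
}

# Small self-contained demonstration in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/s1.vcf" "$demo_dir/s2.vcf.gz" "$demo_dir/notes.txt"
make_vcf_list "$demo_dir" "$demo_dir/vcf_list.txt"
cat "$demo_dir/vcf_list.txt"   # lists s1.vcf and s2.vcf.gz, but not notes.txt
```

The resulting file would then be passed to --vcflist in place of a long comma-separated string on the command line.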
Hi Mat,
Did you mean --consensus-match?
I am looking for this, or a similar option, in AutoMap, like the one the PLINK software provides.
I use --homozyg group --pool-size 10 to get homozygous regions found in multiple samples. These parameters look for consensus homozygous regions in which at least 10 individuals share the homozygous stretch.
Please see the attached sample output file: sample_output.txt
In the attached file, the first three columns are the pool (group), family IDs, and individual IDs. Within each group there are CON and UNION rows, that is, consensus and union. These are calculated from the SNP1, SNP2, BP1, and BP2 values.
The attached output is for homozygous regions of length 5 KB or more (see the KB column).
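To make the CON and UNION rows concrete, here is a minimal sketch that derives both from the BP1 and BP2 values of the segments in one pool. It assumes CON is the interval shared by every segment (largest BP1 to smallest BP2) and UNION is the full span (smallest BP1 to largest BP2). It uses base-pair coordinates only and does not reproduce PLINK's SNP-level or allelic matching; the function name and the numbers are made up.

```bash
#!/usr/bin/env bash

# Read "BP1 BP2" pairs for one pool on stdin; print the CON and UNION intervals.
pool_con_union() {
    awk '
        NR == 1 {
            con_start = $1 + 0; con_end = $2 + 0
            uni_start = $1 + 0; uni_end = $2 + 0
            next
        }
        {
            if ($1 + 0 > con_start) con_start = $1 + 0   # consensus: latest start
            if ($2 + 0 < con_end)   con_end   = $2 + 0   # consensus: earliest end
            if ($1 + 0 < uni_start) uni_start = $1 + 0   # union: earliest start
            if ($2 + 0 > uni_end)   uni_end   = $2 + 0   # union: latest end
        }
        END {
            if (NR == 0) exit
            if (con_start <= con_end) printf "CON\t%d\t%d\n", con_start, con_end
            else                      print "CON\tNA\tNA"   # segments do not all overlap
            printf "UNION\t%d\t%d\n", uni_start, uni_end
        }
    '
}

# Three made-up segments from one pool.
printf '%s\n' '1000 9000' '2500 12000' '1800 8000' | pool_con_union
# CON    2500  8000
# UNION  1000  12000
```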
thank you,
Hi Sanjeev,
For the --homozyg-group command, PLINK looks at the genotypes to establish common haplotypes. This is possible for SNP-array data, in which multiple SNPs are covered in each ROH. With exome data, however, the output would not be reliable for small and medium-sized ROHs because of the low number of variants present in each ROH. Furthermore, VCF files only provide information about non-reference variants, so they do not allow one to infer whether a variant that is absent from the VCF file is wild type (WT) or simply not covered by the sequencing.
Best, Mathieu
hi there,
Thanks for this tool. I am interested in using it on WES data. I have two cohorts, of 4K and 11K samples.
I have one VCF file per chromosome comprising all these individuals. I see that the --common option cannot be used with the --multivcf flag. I use
It generates a VCF per sample. With 22 chromosomes I will have 22 times 4K VCFs.
Next, I would like to get common ROHs from these.
Is there a way to provide the list of VCFs in a file? I do not think it is fun to provide a list of 4K/11K VCFs in a bash string.
Let me know if you need any help with code/structuring or testing this. best,