kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License

Merging genotype files #47

Closed arunvv90 closed 1 year ago

arunvv90 commented 1 year ago

Hi, I merged multiple samples and used them to genotype each sample. Then I merged the genotyped samples to get a population level VCF. It has variants with a mapping quality of zero and genotype quality zero. These variants are not present in the original sample genotype file.

Chr: NC_001493.2
Position: 63036-63075
ID: 35

Genotype Information
Sample: ATCC.dysgu_reads
Genotype: T
Quality: 0
Type: HOM_REF
Is Filtered Out: No

Genotype Attributes
MAPQP: 0 SU: 0 PS: 0 BCC: 0 MS: 0 FCC: 0 Genotype Quality: 0 COV: 0 SC: 0 RED: 0 PROB: 0 PE: 0 ICN: 0 NEIGH10: 0 BND: 0 RMS: 0 WR: 0 OCN: 0 SR: 0

How to filter them? Thank you

kcleal commented 1 year ago

Hi @arunvv90, Would you mind sending the whole vcf record as it appears in the vcf file, and also the sequence of commands you used? This would be very helpful, thanks

arunvv90 commented 1 year ago

Dysgu call for each sample:
dysgu call --mode nanopore -p50 -v2 "${ref_genome}" "$tempdir/${filename}${temp_ext}" "$input_dir/${filename}${ext}" "$out_dir/${filename}${out_ext1}"

Merging samples:
dysgu merge ./.vcf > dy_poplnmerg.vcf

Genotyping with dysgu merged vcf:
dysgu run --sites "${mer_file}" "${ref_genome}" "$tempdir/${filename}${temp_ext}" "$input_dir/${filename}${ext}" "$out_dir/${filename}${out_ext1}"
dysgu merge ./.vcf > dy_genotypepoplnmerg.vcf

I have attached the VCF files. Please let me know if you cannot access them.

Arun Venugopalan Ph.D Scholar Infectious Diseases, Basic Science Dept. College of Veterinary Medicine Mississippi State University, USA


kcleal commented 1 year ago

Thanks, that is very helpful. The VCFs didn't appear, but please email them to me if you wish: clealk@cardiff.ac.uk

arunvv90 commented 1 year ago

Please check the email. I hope you got the files this time.


kcleal commented 1 year ago

I reviewed the VCF. The variant you mention looks like it is present in only one of the input VCF files:

NC_001493.2 63036 20 T <DEL> . PASS SVMETHOD=DYSGUv1.3.11;SVTYPE=DEL;END=63075;CHR2=NC_001493.2;GRP=1;NGRP=35;CT=3to5;CIPOS95=10;CIEND95=26;SVLEN=39;GC=64.29;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=0;OL=0;SU=44;WR=22;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=nanopore;MeanPROB=0.913;MaxPROB=0.913 GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 1/1:35.0:60.0:44:22:0:0:0:0:0.0:9:11:11:0:0:0:-1.0:-1.0:-1.0:0.913 ...

The first sample column (after FORMAT) is all 0, and the sample BCAHV looks like it contains the variant, not sample ATCC?
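One way to check this kind of record programmatically is sketched below. It is not part of dysgu; it only splits the record quoted above into per-sample FORMAT values and reports which samples have anything other than an all-zero column. The sample labels `ATCC` and `BCAHV` and the helper names are assumptions made for illustration, and the trailing `...` (further sample columns) is left out.

```python
# Record quoted in the thread, rebuilt with tab separators as the VCF format requires.
RECORD = "\t".join([
    "NC_001493.2", "63036", "20", "T", "<DEL>", ".", "PASS",
    "SVMETHOD=DYSGUv1.3.11;SVTYPE=DEL;END=63075;CHR2=NC_001493.2;GRP=1;NGRP=35;"
    "CT=3to5;CIPOS95=10;CIEND95=26;SVLEN=39;GC=64.29;NEXP=0;STRIDE=0;EXPSEQ=;"
    "RPOLY=0;OL=0;SU=44;WR=22;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=nanopore;"
    "MeanPROB=0.913;MaxPROB=0.913",
    "GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB",
    "0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0",
    "1/1:35.0:60.0:44:22:0:0:0:0:0.0:9:11:11:0:0:0:-1.0:-1.0:-1.0:0.913",
])

# Assumed labels for the two sample columns shown (hypothetical names).
SAMPLES = ["ATCC", "BCAHV"]


def is_zero(value):
    """Return True if the string parses as the number zero."""
    try:
        return float(value) == 0.0
    except ValueError:
        return False  # e.g. a genotype such as "1/1"


def sample_fields(record, sample_names):
    """Map each sample name to a dict of FORMAT key -> value."""
    columns = record.split("\t")
    keys = columns[8].split(":")  # column 9 of a VCF record is FORMAT
    return {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, columns[9:])
    }


def is_placeholder(fields):
    """True when every FORMAT value for the sample is zero."""
    return all(is_zero(v) for v in fields.values())


calls = sample_fields(RECORD, SAMPLES)
carriers = [name for name, f in calls.items() if not is_placeholder(f)]
print(carriers)
```

With the record above, only the second sample column carries evidence for the deletion; the first is the all-zero column discussed in the comment.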

kcleal commented 1 year ago

Hi @arunvv90, could you please take a look at my comment? I am unsure what you would like to know from your example. Thanks

arunvv90 commented 1 year ago

Here is the new picture. Maybe I got confused. In the figure, the top panel (ATCC_dygenotype.vcf) has a deletion (963 bp). That is the only variant reported in the sample's genotyped file. The bottom panel is the merged VCF. In the first row, for the ATCC dysgu reads, small variants are reported. For the ATCC sample, only the 963 bp deletion should be present. These small variants have a mapping quality and genotype quality of zero. They are present in the other samples, not in ATCC.
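For viewing one sample at a time, a possible workaround is to drop the merged records in which that sample's column is entirely zero. The sketch below is a plain-text filter, not a dysgu feature; it assumes that an all-zero sample column means the variant was not called in that sample, and the function names and sample labels are hypothetical.

```python
def all_zero(sample_column):
    """True when every colon-separated value in the sample column is numerically zero."""
    for value in sample_column.split(":"):
        try:
            if float(value) != 0.0:
                return False
        except ValueError:
            return False  # non-numeric value such as the genotype "1/1"
    return True


def drop_placeholder_records(lines, sample):
    """Yield header lines plus the records where `sample` has a non-zero column."""
    sample_idx = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            # The header names the sample columns; raises ValueError if absent.
            sample_idx = line.split("\t").index(sample)
            yield line
        elif line:
            if sample_idx is None:
                raise ValueError("record found before the #CHROM header line")
            if not all_zero(line.split("\t")[sample_idx]):
                yield line


# Tiny in-memory example with two hypothetical samples.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tATCC\tBCAHV",
    "chr1\t100\t1\tT\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:GQ:SU\t0:0:0\t1/1:35.0:44",
    "chr1\t900\t2\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:GQ:SU\t1/1:40.0:30\t0:0:0",
]
kept_for_atcc = list(drop_placeholder_records(example, "ATCC"))
```

To apply it to a file, the same generator can be fed an open file handle, for example `drop_placeholder_records(open("dy_genotypepoplnmerg.vcf"), "ATCC.dysgu_reads")`, and the yielded lines written back out. The sample name must match the `#CHROM` header of the merged file exactly.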
