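Issue #100: standardized vcf files from Manta and Lumpy can not be indexed with tabix -p vcf *.vcf.gz (opened by liud2 4 years ago)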
Hi,
This repo is no longer being actively maintained. SVTK is now being maintained as part of the following repo: https://github.com/broadinstitute/gatk-sv
I’d recommend posting your issue there. Sorry for the inconvenience!
On Sep 23, 2020, at 9:34 AM, liud2 notifications@github.com wrote:
I was able to index the standardized VCF files from Delly using tabix -p vcf, but not the standardized VCF files from Manta and Lumpy. The tabix error points to a coordinate in column 5 of the VCF file, which suggests that tabix -p vcf works with the *.vcf.gz files from Delly but not with those from Manta or Lumpy. Any suggestions on this issue?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/talkowski-lab/svtk/issues/100, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB4MDRBKOKW4GPK6UG6MU4DSHH2PTANCNFSM4RXBP3CA.
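The thread does not resolve the tabix error. Two common reasons for tabix -p vcf to reject a .vcf.gz file are plain-gzip (rather than bgzip) compression and records that are not sorted by chromosome and position. The standard-library Python sketch below assumes one of those two causes (it is not a diagnosis of these particular Manta or Lumpy files) and checks for both; is_bgzf and first_unsorted_record are hypothetical helper names.

```python
"""Pre-flight checks before indexing a VCF with `tabix -p vcf` (illustrative sketch).

Assumption: the failure comes from plain-gzip compression or unsorted records;
the thread does not confirm either cause.
"""
import gzip
import sys
from typing import Iterable, Optional, Tuple


def is_bgzf(path: str) -> bool:
    """Return True if the file begins with a BGZF block header, which tabix requires."""
    with open(path, "rb") as handle:
        header = handle.read(18)
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag (04),
    # then the BGZF 'BC' extra subfield starting at byte offset 12.
    return len(header) == 18 and header[:4] == b"\x1f\x8b\x08\x04" and header[12:14] == b"BC"


def first_unsorted_record(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line_number, record) for the first record out of order, or None.

    Records must be grouped by chromosome (no chromosome may reappear after
    another has started) and non-decreasing in POS within each chromosome.
    """
    seen_chroms = set()
    current_chrom: Optional[str] = None
    last_pos = -1
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # meta-information and header lines
        chrom, pos_text = line.split("\t", 2)[:2]
        pos = int(pos_text)
        if chrom != current_chrom:
            if chrom in seen_chroms:
                return number, line.rstrip("\n")
            seen_chroms.add(chrom)
            current_chrom, last_pos = chrom, pos
        elif pos < last_pos:
            return number, line.rstrip("\n")
        else:
            last_pos = pos
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    vcf_path = sys.argv[1]
    print("BGZF-compressed:", is_bgzf(vcf_path))
    with gzip.open(vcf_path, "rt") as handle:
        print("First out-of-order record:", first_unsorted_record(handle))
```

If either check fails, sorting the records (for example with bcftools sort) and recompressing with bgzip before rerunning tabix -p vcf is the usual remedy.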
Hi Ryan,
Thanks for letting me know about the new repo. Do you know whether GATK-SV can be run on a computing cluster? I have just completed the vcfcluster step with SVTK for the VCF files from Manta, Delly, and Lumpy.
I took a quick look at the GATK-SV data flow chart. It seems that the sr-test, pe-test, rd-test, and baf-test proposed in your Nature paper are excluded from the new GATK-SV chart. Is that correct?
Delong Liu
(Replying to Ryan L. Collins, Friday, September 25, 2020 9:49 AM; Subject: Re: [talkowski-lab/svtk] standardized vcf files from Manta and Lumpy can not be indexed with tabix -p vcf *.vcf.gz (#100))
Hi Delong,
Thanks for your interest in running GATK-SV!
GATK-SV was designed as a cloud-based pipeline, so we typically haven’t encouraged deploying GATK-SV on local computing clusters.
That said, it’s not impossible to run on a local cluster, but you will need to have a running instance of Cromwell and the appropriate permissions to use Docker (many institutional clusters don’t allow this).
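As a rough illustration of the two prerequisites above, the Python sketch below only checks whether a Java runtime (Cromwell runs on the JVM) and the docker command are on PATH, and whether docker info succeeds for the current user. It assumes the executables are named java and docker, and it does not verify that Cromwell itself is installed or configured for a particular cluster scheduler; check_local_prerequisites is a hypothetical name.

```python
"""Rough pre-flight check for running a Cromwell + Docker pipeline locally (sketch).

Assumption: the executables are named `java` and `docker`; Cromwell's own
installation and backend configuration are NOT checked here.
"""
import shutil
import subprocess
from typing import Dict


def check_local_prerequisites(timeout_s: int = 30) -> Dict[str, bool]:
    status = {
        # Cromwell runs on the JVM, so a Java runtime must be available.
        "java_on_path": shutil.which("java") is not None,
        "docker_on_path": shutil.which("docker") is not None,
        "docker_usable": False,
    }
    if status["docker_on_path"]:
        try:
            # `docker info` exits non-zero when the daemon is unreachable
            # or the current user lacks permission to talk to it.
            result = subprocess.run(
                ["docker", "info"], capture_output=True, text=True, timeout=timeout_s
            )
            status["docker_usable"] = result.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            status["docker_usable"] = False
    return status


if __name__ == "__main__":
    for check, ok in check_local_prerequisites().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

On clusters that forbid Docker, docker_usable will typically come back False even when the binary exists, which is the situation described above.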
Regarding sr-test, pe-test, rd-test, and baf-test: no, all four are still included in GATK-SV as part of Module 02, so they still need to be run.
Hope this helps,
Ryan
(Replying to liud2, Sep 25, 2020, 3:02 PM)
Hi Ryan,
I will ask our IT scientists whether they could install GATK-SV on the NIH Biowulf cluster, which is free for us to use. Including SVTK in the GATK-SV package is a great move.
Thank you and your team for publishing the Nature paper in May 2020 and making the gnomAD SV database available to the public!
Delong
(Replying to Ryan L. Collins, Saturday, September 26, 2020 8:33 AM)
Hi Ryan,
I have to admit that I know little about shell and Python programming. I use R on a daily basis and am still picking up Python and shell programming.
In the SVTK 0.1 installed on our clusters, I do not see the BAF-test, rd-test, or variant filtering steps. I do see the Python scripts in https://github.com/talkowski-lab/SV-Adjudicator/tree/master and https://github.com/talkowski-lab/svtk/tree/master/svtk; the SV-Adjudicator directory contains more detail than the svtk directory.
After weeks of struggle, I have completed pe-test and sr-test for the VCF files generated by Delly, Lumpy, and Manta, and for the final VCF merged from the SVA, LINE1, and ALU calls. I am stuck on extracting BAF information from VCF files. I have the final calibrated GATK indel/SNP VCF (GATK 3.8.1) for each chromosome, covering ~500 human genomes, and I can extract the samples' GTs from it using bcftools query. However, I do not know the format of the BAF file required for the BAF test.
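The format question is not answered directly in the thread, which later points to the BAF workflows in the GATK-SV repository. As a minimal sketch only, assuming BAF is the alternate-allele read fraction AD[alt] / (AD[ref] + AD[alt]) at heterozygous biallelic SNPs, and that the output is a tab-separated table of chromosome, position, BAF, and sample, the standard-library Python below derives such rows from a multi-sample VCF carrying GT and AD fields. Both the definition and the column layout are assumptions to verify against those workflows; baf_records is a hypothetical helper.

```python
"""Per-sample B-allele frequency (BAF) rows from a multi-sample VCF (illustrative sketch).

Assumptions, not confirmed in this thread: BAF = alt reads / (ref + alt reads)
from FORMAT/AD, computed only at heterozygous biallelic SNPs, and emitted as
tab-separated chrom, pos, baf, sample rows.
"""
import gzip
import sys
from typing import Iterable, Iterator, List, Tuple

BASES = {"A", "C", "G", "T"}
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}


def baf_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, float, str]]:
    samples: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        chrom, pos, _vid, ref, alt = fields[:5]
        # Keep biallelic SNPs only: one-base REF and exactly one one-base ALT.
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue
        keys = fields[8].split(":")
        if "GT" not in keys or "AD" not in keys:
            continue
        gt_i, ad_i = keys.index("GT"), keys.index("AD")
        for sample, entry in zip(samples, fields[9:]):
            values = entry.split(":")
            if len(values) <= max(gt_i, ad_i) or values[gt_i] not in HET_GENOTYPES:
                continue
            depths = values[ad_i].split(",")
            if len(depths) != 2 or "." in depths:
                continue
            ref_depth, alt_depth = int(depths[0]), int(depths[1])
            if ref_depth + alt_depth == 0:
                continue  # no informative reads at this site
            yield chrom, int(pos), alt_depth / (ref_depth + alt_depth), sample


if __name__ == "__main__" and len(sys.argv) > 1:
    opener = gzip.open if sys.argv[1].endswith(".gz") else open
    with opener(sys.argv[1], "rt") as handle:
        for chrom, pos, baf, sample in baf_records(handle):
            print(f"{chrom}\t{pos}\t{baf:.3f}\t{sample}")
```

Run as a script with a VCF path (plain or gzip/bgzip-compressed), it prints one tab-separated row per heterozygous sample at each qualifying site.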
Could you please point me to the Python files in the SV-Adjudicator or svtk directories that complete these 3 steps:
- BAF calculation and BAF-test
- Rd-test
- Variant filtering using random forest
Since I may not have access to GATK-SV on our clusters soon, I have to rely on our clusters and SVTK 0.1, or the Python files in SV-Adjudicator, to complete the SV analysis of our data by following the SVTK data processing chart in your recent Nature paper.
Having walked through our SV data processing, I now fully appreciate the time and effort your team spent reliably extracting structural variants from ~15,000 human genomes in your Nature paper. We are going to use the gnomAD SV database as our reference later.
I appreciate your help.
Delong Liu
Translational Vascular Medicine Branch
NHLBI/NIH
Building 10, 8N110
10 Center Drive
Bethesda, MD 20887
301-451-3410 (w)
Liud2@mail.nih.gov
Hi Delong,
I’d recommend referring to the workflow files (.wdl) in the official GATK-SV repo for examples of these commands: https://github.com/broadinstitute/gatk-sv/tree/master/wdl
For example, the BAF calculation is documented here: https://github.com/broadinstitute/gatk-sv/blob/master/wdl/BAFFromGVCFs.wdl and https://github.com/broadinstitute/gatk-sv/blob/master/wdl/BAFFromShardedVCF.wdl
And the RdTest commands are documented here: https://github.com/broadinstitute/gatk-sv/blob/master/wdl/RDTest.wdl
Or, more generally, all of the variant evidence collection steps (including RdTest, PETest, SRTest, and BAFTest) are wrapped into the Module02 workflow here: https://github.com/broadinstitute/gatk-sv/blob/master/wdl/Module02.wdl
And the variant filtering is all executed here: https://github.com/broadinstitute/gatk-sv/blob/master/wdl/Module03.wdl
The command blocks within each of these WDLs will give you example commands to execute each of the steps required for these analyses.
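For readers who want to read those commands without running Cromwell, the Python sketch below lists the command sections of a locally downloaded WDL file. It assumes the heredoc form command <<< ... >>> (tasks written with the brace form command { ... } are not detected) and attributes each block to the nearest preceding task declaration; command_blocks is a hypothetical helper, not part of GATK-SV.

```python
"""List the command sections of a WDL file (illustrative sketch).

Assumption: tasks use the heredoc form `command <<< ... >>>`; the older
brace form `command { ... }` is not matched.
"""
import re
import sys
import textwrap
from typing import List, Tuple

TASK_RE = re.compile(r"^[ \t]*task\s+(\w+)", re.MULTILINE)
COMMAND_RE = re.compile(r"command\s*<<<(.*?)>>>", re.DOTALL)


def command_blocks(wdl_text: str) -> List[Tuple[str, str]]:
    """Return (task_name, command_text) pairs in the order they appear."""
    task_starts = [(m.start(), m.group(1)) for m in TASK_RE.finditer(wdl_text)]
    blocks = []
    for match in COMMAND_RE.finditer(wdl_text):
        # Attribute the block to the last task declared before it ("?" if none).
        owner = "?"
        for start, name in task_starts:
            if start < match.start():
                owner = name
        blocks.append((owner, textwrap.dedent(match.group(1)).strip()))
    return blocks


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for task, command in command_blocks(handle.read()):
            print(f"== {task} ==\n{command}\n")
```

Run with a local copy of one of the WDLs linked above as its argument, it prints each task name followed by its command text; placeholders such as ~{...} are left unexpanded.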
I’d also recommend posting issues under the official GATK-SV repo (https://github.com/broadinstitute/gatk-sv) instead of here, since this repo is no longer actively maintained.
Thanks, Ryan
On Oct 12, 2020, at 4:45 PM, liud2 wrote:
Could you please point me to the python files in the directory of SV-Adjudicator or svtk to complete the 3 steps:
- BAF calculation and BAF-test
- Rd-test
- Variant filtering using random forest
Hi Ryan,
Thanks for the links. Do you know how much it would cost to allocate 1 GB of RAM per minute on FireCloud?
Thanks,
Delong Liu
(Replying to Ryan L. Collins, Monday, October 19, 2020 12:58 PM)