skandlab / SMuRF

MIT License

error at "extracting meta data from VRanges" #35

Closed gianfilippo closed 4 years ago

gianfilippo commented 5 years ago

Hi,

I am testing SMuRF on a set of files I generated by running the individual callers. I am getting an "Error in normalizeDoubleBracketSubscript". It seems that the expected data type is not there. Are there specific requirements for the input VCFs?

Thanks Gianfilippo

Below is my command line and the output:

myresults = smurf(directory = "Variants_hg38_BWA_ensemble/Sample_G1700T_012",mode="combined",nthreads=20,output.dir="Variants_hg38_BWA_ensemble/Sample_G1700T_012",build="hg38",check.packages=T)
[1] "SMuRFv1.6 (3rd Oct 2019)"
[1] "Saving output files to: Variants_hg38_BWA_ensemble/Sample_G1700T_012"
Connection successful!

R is connected to the H2O cluster:
H2O cluster version: 3.26.0.2
H2O cluster version age: 2 months and 25 days
H2O cluster total nodes: 1
H2O cluster total memory: 26.63 GB
H2O cluster total cores: 20
H2O cluster allowed cores: 1
H2O cluster healthy: TRUE
H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.5.0 (2018-04-23)

Accessing files:
Variants_hg38_BWA_ensemble/Sample_G1700T_012/mutect2.vcf.gz
Variants_hg38_BWA_ensemble/Sample_G1700T_012/freebayes.vcf.gz
Variants_hg38_BWA_ensemble/Sample_G1700T_012/varscan.vcf.gz
Variants_hg38_BWA_ensemble/Sample_G1700T_012/vardict.vcf.gz
[1] "Parsing step"
[1] "reading vcfs"
[1] "reading mutect2"
[1] "reading freebayes"
[1] "reading varscan"
[1] "reading vardict"
Time difference of 16.48991 secs
[1] "extracting calls passed by at least 1 caller"
Time difference of 0.82076 secs
[1] "extracting meta data from VRanges"
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, : invalid [[ subscript type: NULL

tyler5huang commented 5 years ago

Hi Gianfilippo,

I recently encountered this error because the tumour and normal sample names could not be identified correctly.

I resolved it by identifying the T and N sample names from the VCF files. Do you happen to know your sample names? They should be your BAM file names.

Regards, Weitai
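One way to see which sample names each VCF actually carries is to read the #CHROM header line directly. The following is a minimal sketch in base R, not SMuRF code: vcf_sample_names is a hypothetical helper, and it assumes a standard VCF layout in which the nine fixed columns (CHROM through FORMAT) are followed by one column per sample.

```r
# Hypothetical helper (not part of SMuRF): return the sample column names
# from the #CHROM header line of a plain or gzipped VCF.
vcf_sample_names <- function(path, max_header_lines = 10000L) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed text
  on.exit(close(con))
  for (i in seq_len(max_header_lines)) {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) break            # reached end of file
    if (startsWith(line, "#CHROM")) {
      fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
      # Columns 1-9 are fixed (CHROM ... FORMAT); the rest are samples.
      if (length(fields) <= 9L) return(character(0))
      return(fields[-(1:9)])
    }
    if (!startsWith(line, "#")) break        # past the header block
  }
  stop("no #CHROM header line found in ", path)
}
```

Calling it on each of the four caller outputs, for example vcf_sample_names("Variants_hg38_BWA_ensemble/Sample_G1700T_012/mutect2.vcf.gz"), lets you compare the names side by side.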


gianfilippo commented 5 years ago

Hi,

thanks.

I can see the sample names (tumor and normal) in each of the 4 files (I have Mutect2, VarScan2, VarDict, freebayes). They are all from the same sample, but each file uses a different name, so I guess I have to fix that. Also, VarScan used the whole file path as the sample name, and my freebayes VCF has an extra column that is not in your sample freebayes VCF. How do I get rid of it?

Thanks
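For the path-style sample names that VarScan wrote, a small clean-up step can reduce them to the bare BAM name before the header is edited. This is only a sketch: clean_sample_name is a hypothetical helper, and the path in the example is made up for illustration, since the thread does not show the real one.

```r
# Hypothetical helper: reduce a path-like sample name to the BAM base name
# without its ".bam" extension. Names that are already clean pass through.
clean_sample_name <- function(x) {
  sub("\\.bam$", "", basename(x), ignore.case = TRUE)
}

# Made-up path, used only to illustrate the transformation.
clean_sample_name("/data/bams/Sample_G1700T_012.bam")
clean_sample_name("Sample_G1700N_006")
```

The cleaned names would still have to be written back into the #CHROM line of the VCF by whatever header-editing tool you prefer.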

gianfilippo commented 5 years ago

Hi,

I just edited the freebayes VCF and made sure the sample names are consistent across all the VCFs (see below). I am still getting the exact same error.

Do you have any other thoughts?

Thanks

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700T_012 Sample_G1700N_006

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700N_006 Sample_G1700T_012

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700T_012 Sample_G1700N_006

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700N_006 Sample_G1700T_012
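The four header lines above use the same two names, but not in the same column order. A quick check like the following sketch makes that distinction explicit. The vectors are copied from the headers as posted. The thread does not say which header belongs to which caller, so the list is left unnamed, and whether column order matters to SMuRF is not stated here either.

```r
# Sample columns as posted above, one vector per VCF.
sample_cols <- list(
  c("Sample_G1700T_012", "Sample_G1700N_006"),
  c("Sample_G1700N_006", "Sample_G1700T_012"),
  c("Sample_G1700T_012", "Sample_G1700N_006"),
  c("Sample_G1700N_006", "Sample_G1700T_012")
)

first <- sample_cols[[1]]

# TRUE when every file holds the same set of names, ignoring column order.
same_set <- all(vapply(sample_cols, function(s) setequal(s, first), logical(1)))

# TRUE only when the column order also matches in every file.
same_order <- all(vapply(sample_cols, function(s) identical(s, first), logical(1)))

print(c(same_set = same_set, same_order = same_order))
```

Here same_set is TRUE and same_order is FALSE: the names agree, but two of the files list the normal sample first.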

tyler5huang commented 5 years ago

Resolving error: Error in normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, : invalid [[ subscript type: NULL

Cause: the tumour and normal sample names in the VCF files were not detected automatically.

Solution: Manually state your tumor file tag with t.label. Examples:
t.label='-T'
t.label='_tumor'
t.label='T001'
t.label='T' #also works for you

Error message: 't.label for tumor sample is not unique, duplicated or missing'

myresults = smurf(directory = "Variants_hg38_BWA_ensemble/Sample_G1700T_012",
                  mode = "combined",
                  t.label = 'T_012',  # tag identifying the tumour sample name
                  nthreads = 20,
                  output.dir = "Variants_hg38_BWA_ensemble/Sample_G1700T_012",
                  build = "hg38",
                  check.packages = TRUE)

Please download the latest patch SMuRF-v1.6.2. Thanks!
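Before rerunning, it may help to confirm that a candidate t.label picks out exactly one sample. The sketch below assumes the label is matched as a literal substring of the VCF sample names. The examples in this thread suggest that, but the thread does not confirm how SMuRF matches internally. count_label_matches is a hypothetical helper, not a SMuRF function.

```r
# Hypothetical pre-flight check (not SMuRF code): count how many sample
# names contain the label as a literal substring.
count_label_matches <- function(label, sample_names) {
  sum(grepl(label, sample_names, fixed = TRUE))
}

samples <- c("Sample_G1700T_012", "Sample_G1700N_006")

# One count per candidate label; exactly 1 means the label is unambiguous.
candidates <- c("T_012", "T", "Sample", "_tumor")
matches <- vapply(candidates, count_label_matches, integer(1),
                  sample_names = samples)
print(matches)
```

With these names, 'T_012' and 'T' each match one sample, 'Sample' matches both, and '_tumor' matches none. Under the substring assumption, the last two are the kind of label that the "not unique, duplicated or missing" message describes.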

gianfilippo commented 5 years ago

thanks!! I will try this