rcastelo / VariantFiltering

Filter genetic variants using different criteria such as inheritance model, conservation, etc.
4 stars 7 forks source link

Issue with Different Builds? #25

Open danphillips28 opened 11 months ago

danphillips28 commented 11 months ago

Hi! I am exploring variant annotation for the first time and was hoping to use this tool, however I've run into problems.

I have a trio-VCF file of parents & offspring with a genetic condition. The VCF tells me the build used was GRCh38. Note the VCF file was given to me and has already been heavily filtered down to about ~500 variants.

I installed VariantFiltering and tried to follow the tutorial. I wasn't that surprised to get an error.

vfpar <- VariantFilteringParam("~/Documents/Imperial.Job.Test/clinical_genetics_RA_test/PHNED_joint_hg38.vcf")
allvars <- unrelatedIndividuals(vfpar)

Chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y have different lengths between the input VCF and the BSgenome package. These chromosomes will be discarded from further analysis Assuming the genome build of the input variants is hs37d5. Switching to the UCSC chromosome-name style from the transcript-centric annotation package. Chromosome chrM has different lengths between the input VCF and the input TxDb pakage. This chromosome will be discarded from further analysis Error in .matchSeqinfo(variants, txdb, bsgenome) : None of the chromosomes in the input VCF file has the same length as the chromosomes in the input TxDb package. The genome reference sequence employed to generate the VCF file was probably different from the one in the input TxDb package.

I tried again specifying hg38 as the Txdb file but still got an error, which has basically stopped me and I'll have to switch tools because I'm working towards a deadline. However I'd really prefer to get this tool off-the-ground for my future analyses. Any pointers?

vfpar <- VariantFilteringParam("~/Documents/Imperial.Job.Test/clinical_genetics_RA_test/PHNED_joint_hg38.vcf",
                               txdb = "TxDb.Hsapiens.UCSC.hg38.knownGene")
allvars <- unrelatedIndividuals(vfpar)

Chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y have different lengths between the input VCF and the BSgenome package. These chromosomes will be discarded from further analysis Assuming the genome build of the input variants is hs37d5. Switching to the UCSC chromosome-name style from the transcript-centric annotation package. Assumming hs37d5 and hg38 represent the same genome build. Discarding scaffold sequences. 458 variants processed Error in validObject(.Object) : invalid class "VariantFilteringResults" object: 1: invalid object for slot "cutoffs" in class "VariantFilteringResults": got class "list", should be or extend class "CutoffsList" invalid class "VariantFilteringResults" object: 2: invalid object for slot "sortings" in class "VariantFilteringResults": got class "NULL", should be or extend class "CutoffsList" In addition: Warning message: In .local(param, ...) : The input VCF file has no variants.

Thanks!

rcastelo commented 11 months ago

Hi, if the build of the VCF is GRCh38, you should first install the following annotation packages for that genome build:

options(timeout=1200) ## some of the annotation packages are > 1Gb and may give timeouts > 300s
BiocManager::install(c("BSgenome.Hsapiens.UCSC.GRCh38", "TxDb.Hsapiens.UCSC.hg38.knownGene",
                       "TxDb.Hsapiens.UCSC.hg38.knownGene", "SNPlocs.Hsapiens.dbSNP155.GRCh38",
                       "MafH5.gnomAD.v3.1.2.GRCh38", "phastCons100way.UCSC.hg38"))

and then try building the parameter object in the following way:

vfpar <- VariantFilteringParam("~/Documents/Imperial.Job.Test/clinical_genetics_RA_test/PHNED_joint_hg38.vcf",
                               bsgenome="BSgenome.Hsapiens.UCSC.GRCh38",
                               txdb="TxDb.Hsapiens.UCSC.hg38.knownGene",
                               snpdb="SNPlocs.Hsapiens.dbSNP155.GRCh38",
                               otherAnnotations=c("MafH5.gnomAD.v3.1.2.GRCh38",
                                                  "phastCons100way.UCSC.hg38"))

If your genetic data is from a trio, you want probably to provide the parent-offspring and phenotype information through a PED file, this can be done with the parameter pedFilename (see VariantFilteringParam help page for full details on this). If you still get an error, please provide the output of the function traceback() called right after the error and the output of the function sessionInfo().

Thanks for your patience,

robert.