Closed drmaly closed 6 years ago
Hi @drmaly,
The function .Call
is R's interface to compiled code. Here it's calling the vcfR function that is written in C++ that is called gt_to_popsum
. The error reports a problem with stoi, which is a part of C++ and I would guess to be robust. But it is possible my code is not using it correctly. Is it possible that there is something out of place with your data? Could you share a portion of your data that would allow me to reproduce the issue?
Thank you for reporting this! Brian
Thank you Brian @knausb for the reply. Please find attached sample Vcf file that reproduced the error in my system.
Waleed
Hi Waleed @drmaly, thank you for sharing your file! I now see the following.
> vcf <- read.vcfR('sampleVcf.vcf.gz')
Scanning file to determine attributes.
File attributes:
meta lines: 326
header_line: 327
variant count: 4674
column count: 16
Meta line 326 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 4674
Character matrix gt cols: 16
skip: 0
nrows: 4674
row_num: 0
Processed variant: 4674
All variants processed
> vcf
***** Object of Class vcfR *****
7 samples
1 CHROMs
4,674 variants
Object size: 9.2 Mb
0 percent missing data
***** ***** *****
> chrom <- create.chromR(vcf)
Names in vcf:
1
Initializing var.info slot.
var.info slot initialized.
> chrom <- proc.chromR(vcf)
Error: class(x) == "chromR" is not TRUE
> chrom <- proc.chromR(chrom)
Error in .gt_to_popsum(var_info = var.info, gt = gt) : stoi
In addition: Warning messages:
1: In proc.chromR(chrom) : seq slot is NULL.
2: In proc.chromR(chrom) : annotation slot has no rows.
3: In proc.chromR(chrom) :
seq slot is NULL, chromosome representation not made (seq2rects).
4: In proc.chromR(chrom) :
seq slot is NULL, chromosome representation not made (seq2rects, chars=n).
Timing stopped at: 0.024 0 0.028
This appears to reproduce the behaviour you reported and is the critical first step in troubleshooting. Thank you for your help here!
You had asked whether this was a bug or whether there was an issue with your file. I'm going to guess that your file is fine and the issue is the code. The VCF specification is flexible meaning that different software creates VCF files that include different sorts of data. And every once in a while someone provides me with an example of something I had never anticipated. I suspect that is what has happened here. I'll need some time to work on it. But we'll figure it out, thanks!
This data contains genotypes where one allele is called but the other is missing.
> grep("1/.", vcf@gt[,2], fixed = TRUE)
[1] 47 324 355 621 760 1107 1172 1276 1297 1776 1791 1933 1947 2162 2815 3030 3397 3408 3676 3800 4242 4299
[23] 4336 4367 4400 4450 4489
> vcf@gt[47,2]
QMD_10_01
"1/.:0,17,37:54:99:1542,1082,1031"
> chrom <- create.chromR(vcf[1:46,1:2], verbose = FALSE)
> chrom <- proc.chromR(chrom, verbose = FALSE)
# Works
> chrom <- create.chromR(vcf[1:47,1:2], verbose = FALSE)
> chrom <- proc.chromR(chrom, verbose = FALSE)
# Generates stoi error
This sort of genotype is something I've suspected could be handled in the VCF specification, but is not something I've ever observed. I've only observed genotypes where all alleles are either called or all missing. So I'm not handling this correctly and when I try to call stoi (convert string to integer) on a dot we get an error.
Thank you very much Brian for your help It is my data that is causing the issue. To be honest, I also did not know there are such genotypes in my data. In my judgement, such genotypes are not useful for further analysis so we have to remove it. Best regards,
Waleed
Hi @drmaly, I am going to interpret this differently from you. I think these are valid genotypes. I'm not sure how you might analyse them. But my policy is that if the data appears to adhere to the VCF specification then vcfR should be able to handle it. And if GATK is producing them then they must have something in place to work with them. And it was a reasonably easy fix. The function .gt_to_popsum
now ignores missing data instead of trying to convert them to an integer. The change is on GitHub now and can be installed as follows.
devtools::install_github(repo="knausb/vcfR")
Note that you will need a compiler to install the GitHub version. This will be in vcfR v1.9.0 when it is released to CRAN. However, we recently released a new version and the CRAN people ask us to not submit too frequently so that we do not overwhelm their volunteer staff. So it will be a couple months before it hits CRAN.
Thanks! Brian
Thank you Brian. I think your interpretation is better than mine and it is advisable to keep those genotype than removing them. I have tested the new version and it working now. Best, Waleed
@drmaly, you're the only one I know to have reported using this sort of genotype with vcfR. So please let me know if you encounter issues elsewhere. Thank you! Brian
Sure! I will.
Hi, I have got the following error while running proc.chromR function
I investigated this issue and It seems the cause of this error is when the following function is called, within gt.to.popsum(x):
I am not sure if this is something related to my data or it is a bug?!