zhengxwen / SeqArray

Data management of large-scale whole-genome sequence variant calls (Development version only)
http://www.bioconductor.org/packages/SeqArray

seqVCF2GDS error in seqVCF_Header #35

Closed ShawnCone closed 5 years ago

ShawnCone commented 6 years ago

Hi, this is the first time I have encountered this issue, after using SeqArray for a few weeks without errors. Here is the log:

> library(SeqArray)
> library(SNPRelate)
> library(GenomicRanges)
> sample = "/Users/daveistanto/Documents/work-related/RStuff/Sample_Sources/truptiWithHeader/SRX1149932TruptifilteredWithHeader.vcf"
> seqVCF2GDS(sample, "/Users/daveistanto/Documents/work-related/RStuff/pca_dumps/trupti.gds")
Thu Sep  6 17:53:19 2018
Error in seqVCF_Header(vcf.fn) : 
  FORMAT=<ID=PL,Number=3,Type=Integers,Description="Genotype likelihood">

It appears that the FORMAT line in the VCF is causing this problem. I have never encountered it before and am wondering if there is a solution.

This was run in the following environment:

> R.version
               _                           
platform       x86_64-apple-darwin15.6.0   
arch           x86_64                      
os             darwin15.6.0                
system         x86_64, darwin15.6.0        
status                                     
major          3                           
minor          5.1                         
year           2018                        
month          07                          
day            02                          
svn rev        74947                       
language       R                           
version.string R version 3.5.1 (2018-07-02)
nickname       Feather Spray       

Thank you very much

zhengxwen commented 6 years ago

Could you please try R_3.4.3? It might be the same issue as #32.

ShawnCone commented 6 years ago

Hi, I just tried running it with R_3.4.3, and it gives a similar error:

> seqVCF2GDS(sample, "/Users/daveistanto/Documents/work-related/RStuff/pca_dumps/trupti.gds")
Fri Sep  7 11:32:40 2018
Error in seqVCF_Header(vcf.fn) : 
  FORMAT=<ID=PL,Number=3,Type=Integers,Description="Genotype likelihood">
> R.Version()
$platform
[1] "x86_64-apple-darwin15.6.0"

$arch
[1] "x86_64"

$os
[1] "darwin15.6.0"

$system
[1] "x86_64, darwin15.6.0"

$status
[1] ""

$major
[1] "3"

$minor
[1] "4.3"

$year
[1] "2017"

$month
[1] "11"

$day
[1] "30"

$`svn rev`
[1] "73796"

$language
[1] "R"

$version.string
[1] "R version 3.4.3 (2017-11-30)"

$nickname
[1] "Kite-Eating Tree"

Thank you

zhengxwen commented 6 years ago

FORMAT=<ID=PL,Number=3,Type=Integers,Description="Genotype likelihood">

It seems that Type=Integers should be Type=Integer. The VCF specification allows only Integer, Float, Character, and String as FORMAT types.
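One way to try the corrected type without touching the original file is to write a patched copy. The following is a minimal base-R sketch, not part of SeqArray: the helper name fix_vcf_header_type and the demo file contents are hypothetical, and it assumes an uncompressed VCF small enough to read into memory.

```r
# Hypothetical helper: copy a VCF, replacing "Type=Integers" with
# "Type=Integer" in the meta-information lines only.
fix_vcf_header_type <- function(vcf_in, vcf_out) {
  lines <- readLines(vcf_in)
  # Meta-information lines start with "##"; data records are left untouched.
  is_meta <- startsWith(lines, "##")
  lines[is_meta] <- sub("Type=Integers", "Type=Integer",
                        lines[is_meta], fixed = TRUE)
  writeLines(lines, vcf_out)
  invisible(vcf_out)
}

# Demo on a tiny made-up VCF written to a temporary file.
demo_in  <- tempfile(fileext = ".vcf")
demo_out <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">",
  "##FORMAT=<ID=PL,Number=3,Type=Integers,Description=\"Genotype likelihood\">",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
  "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PL\t0/1:10,0,20"
), demo_in)

fix_vcf_header_type(demo_in, demo_out)
fixed <- readLines(demo_out)
```

The patched copy (demo_out here) could then be passed to seqVCF2GDS() in place of the original. For a whole-genome VCF, reading every line into memory may be impractical, so a streaming rewrite would be a better fit.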

ShawnCone commented 6 years ago

I changed it and it still did not work. As a workaround, I deleted everything other than GT, since the other fields are redundant for my purposes at this point. It would still be better if this were resolved, though. Thank you very much.

zhengxwen commented 5 years ago

You can use h <- seqVCF_Header(), modify h, and then call seqVCF2GDS(, header=h), so you don't need to modify the original VCF files.
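The suggestion above might look roughly like the sketch below. It rests on several assumptions: that seqVCF_Header() can parse the file in your SeqArray version (in this thread the error was raised by seqVCF_Header itself, so that may not hold), and that the returned list has a `format` data frame with a `Type` column. The helper names fix_format_type and convert_with_fixed_header are hypothetical.

```r
# Hypothetical helper: normalise the Type column of the parsed FORMAT table.
# Assumption: h$format is a data frame with a "Type" column.
fix_format_type <- function(h) {
  h$format$Type <- sub("^Integers$", "Integer", as.character(h$format$Type))
  h
}

# Hypothetical wrapper: parse the header, patch it in memory, and pass it to
# seqVCF2GDS() through its header argument. The VCF file itself is not modified.
convert_with_fixed_header <- function(vcf_fn, gds_fn) {
  h <- SeqArray::seqVCF_Header(vcf_fn)
  h <- fix_format_type(h)
  SeqArray::seqVCF2GDS(vcf_fn, gds_fn, header = h)
}
```

A call would take the form convert_with_fixed_header(sample, "trupti.gds"), where the output file name is only an example. If you only need genotypes, editing h$format to keep just the GT row is another possible variant of the same idea.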