thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

The GDS node "$ref" does not exist. #173

Closed jcaccavo closed 1 year ago

jcaccavo commented 1 year ago

Hi Thierry and others with this issue,

I'm re-posting my comment from #168 because that thread is closed, and it seems better to open a new thread to address the issue.

I have the same issue (the error 'The GDS node "$ref" does not exist.') when running radiator::filter_rad. In my case, the input is a .vcf file produced by Stacks.

You can download the .vcf file from my dropbox, as well as the strata file.

I have radiator version 1.2.5.

I also tried only reading the VCF file (radiator::read_vcf) and got the same error.

Below are the commands I used and the radiator output (including this error).

Thanks in advance for your help! :)

data <- radiator::filter_rad(data = "3_subarea_p3_p1r0.6_populations.snps.vcf", strata = "strata_subarea.tsv", output = "tidy", interactive.filter = TRUE, verbose = TRUE, parallel.core = parallel::detectCores() - 1)

################################################################################
############################# radiator::filter_rad #############################
################################################################################
# Execution date@time: 20230127@1156
# Folder created: filter_rad_20230127@1156
# Function call and arguments stored in: radiator_filter_rad_args_20230127@1156.tsv
# File written: random.seed (452882)                                  
# Filters parameters file generated: filters_parameters_20230127@1156.tsv
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :                  
#   There are too many lines in the header (>= 10000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :
#   There are too many lines in the header (>= 20000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :
#   There are too many lines in the header (>= 30000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :
#   There are too many lines in the header (>= 40000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# ✔ Reading VCF [6m 32.2s]
# Error in SeqArray::seqGetData(gdsfile = data, var.name = "$ref") : 
#   The GDS node "$ref" does not exist.
# 
# Computation time, overall: 392 sec
############################# completed filter_rad #############################

test1 <- radiator::read_vcf("3_subarea_p3_p1r0.6_populations.snps.vcf")

################################################################################
############################## radiator::read_vcf ##############################
################################################################################
# Execution date@time: 20230127@1902
# Folder created: read_vcf_20230127@1902
# Function call and arguments stored in: radiator_read_vcf_args_20230127@1902.tsv
# File written: random.seed (679284)                                  
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :
#   There are too many lines in the header (>= 10000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :
#   There are too many lines in the header (>= 20000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :
#   There are too many lines in the header (>= 30000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# Warning in SeqArray::seqVCF_Header(vcf.fn = vcf) :
#   There are too many lines in the header (>= 40000). In order not to slow down the conversion, please consider deleting unnecessary annotations (like contig).
# ✔ Reading VCF [6m 50.1s]
# Analyzing VCF
# VCF source: Stacks v2.61
# Error in SeqArray::seqGetData(gdsfile = data, var.name = "$ref") : 
#   The GDS node "$ref" does not exist.
# 
# Computation time, overall: 410 sec
# ############################## completed read_vcf ##############################

thierrygosselin commented 1 year ago

If you remove the lines in the VCF that contain ##contig, it should take seconds instead of minutes to read the VCF. On my computer: 11 sec, not 6 min...
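
For anyone who wants to try the ##contig suggestion above, here is a minimal sketch in base R. It assumes the VCF is plain text (not gzipped) and small enough to read into memory; `strip_contig_lines` is a hypothetical helper name, not a radiator or SeqArray function, and the demo VCF is made up.

```r
# Hypothetical helper (not part of radiator or SeqArray):
# copy a plain-text VCF while dropping every "##contig" header line.
strip_contig_lines <- function(vcf_in, vcf_out) {
  lines <- readLines(vcf_in)                 # reads the whole file into memory
  is_contig <- startsWith(lines, "##contig") # TRUE for contig header lines only
  writeLines(lines[!is_contig], vcf_out)
  invisible(sum(is_contig))                  # number of header lines removed
}

# Tiny self-contained demo on a made-up VCF written to a temporary file.
demo_in  <- tempfile(fileext = ".vcf")
demo_out <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=1>",
  "##contig=<ID=2>",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t10\t1:10\tA\tG\t.\tPASS\t."
), demo_in)

n_removed <- strip_contig_lines(demo_in, demo_out)
cleaned <- readLines(demo_out)
```

The cleaned copy can then be passed to radiator::read_vcf through its data argument in place of the original file. Writing to a new file rather than overwriting keeps the original Stacks output untouched.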

thierrygosselin commented 1 year ago

So this should work, with or without using the strata argument.

test1 <- radiator::read_vcf(data = "3_subarea_p3_p1r0.6_populations.snps.vcf", strata = "strata_subarea.tsv")

Confirm it works with the new radiator version (1.2.6).

thierrygosselin commented 1 year ago

Using 48.1, 48.2, 48.4 to name your pops / strata might cause a lot of problems in my package, and in others as well...
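
One possible way to act on the naming advice above is to relabel the strata before running radiator. The thread does not say why such names cause trouble; a plausible guess is that labels like 48.1 can be parsed as numbers. The sketch below is only an illustration: `relabel_strata` and the `area_` prefix are hypothetical, the table is made up, and the INDIVIDUALS / STRATA column names are an assumption about the strata file layout.

```r
# Hypothetical helper (not part of radiator): turn numeric-looking strata
# labels such as "48.1" into plain text labels such as "area_48_1".
relabel_strata <- function(x, prefix = "area_") {
  x <- as.character(x)
  paste0(prefix, gsub(".", "_", x, fixed = TRUE))
}

# Made-up strata table; the INDIVIDUALS / STRATA column names are an assumption.
strata <- data.frame(
  INDIVIDUALS = c("ind_01", "ind_02", "ind_03"),
  STRATA = c("48.1", "48.2", "48.4"),
  stringsAsFactors = FALSE
)
strata$STRATA <- relabel_strata(strata$STRATA)

# Write a tab-separated copy, then re-read it with every column kept as text.
strata_file <- tempfile(fileext = ".tsv")
write.table(strata, strata_file, sep = "\t", quote = FALSE, row.names = FALSE)
reread <- read.delim(strata_file, colClasses = "character")
```

Reading with colClasses = "character" keeps every label as text on the way back in, so nothing is silently converted to a number.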

thierrygosselin commented 1 year ago

Re-open the issue if you're still having problems.