Closed Imogen-D closed 1 year ago
Update:
Running on a smaller subset of data in an interactive session gives a clearer error that appears to be with the stacks input. Help would be much appreciated.
`################################################################################
######################### radiator::genomic_converter ##########################
################################################################################
Execution date@time: 20220623@1130
Folder created: 12_radiator_genomic_converter_20220623@1130
Function call and arguments stored in: radiator_genomic_converter_args_20220623@1130.tsv
Filters parameters file generated: filters_parameters_20220623@1130.tsv
Importing data: vcf.file
Reading VCF...
Data summary:
number of samples: 17
number of markers: 4
Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 1 / 1
Strata with low sample size detected: fig <- FALSE
Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 2 / 2
Generating individual stats...
Generating markers stats...
[==================================================] 100%, completed, 0s
[==================================================] 100%, completed, 0s
Number of chromosome/contig/scaffold: 1
Number of locus: 1
Number of markers: 1
Number of strata: 3
Number of individuals: 17
Number of ind/strata:
Y = 15
S = 1
D = 1
Number of duplicate id: 0
radiator Genomic Data Structure (GDS) file: radiator_20220623@1130.gds
Genotypes formats generated with 1 SNPs:
GT_BIN (the dosage of ALT allele: 0, 1, 2 NA): TRUE
GT_VCF (the genotype coding VCFs: 0/0, 0/1, 1/1, ./.): TRUE
GT_VCF_NUC (the genotype coding in VCFs, but with nucleotides: A/C, ./.): TRUE
GT (the genotype coding 'a la genepop': 001002, 001001, 000000): TRUE
LOCUS field empty... adding unique id instead
HQ values are all missing: removing column
Tidy data file written: radiator_20220623@1130.rad
Computation time, overall: 6 sec
Writing tidy data set:
practising.rad
Preparing data for output
Data is bi-allelic
Generating rubias output
Error in data.table::melt.data.table(data = ., id.vars = cols, measure.vars = measure_vars, :
One or more values in 'id.vars' is invalid.
Calls: genomic_converter ... %<>% -> <Anonymous> -> %>% -> <Anonymous> -> <Anonymous>
In addition: Warning messages:
1: Unknown or uninitialised column: `LOCUS`.
2: Unknown or uninitialised column: `VARIANT_ID`.
Computation time, overall: 8 sec
######################### completed genomic_converter ##########################
Execution halted`
Sorry for the long delay Do you mind sharing the vsf so that I can fix the bug ? I'll reopen the issue when I have something to work with.
I'm getting the same error trying to convert to rubias; my vcf (generated in Stacks) and strata files are attached! vcf_autosomal_genstr_filtered_maxMeanDP49_rmRelated_snps.vcf.zip sampleIDs_strata_rmRelated_maxMeanDP49.txt
I've tried the following recommendations in the question: https://github.com/thierrygosselin/radiator/issues/177 (substituting rubias format). The errors I get depending on the conversion code are:
Error in data.table::melt.data.table(data = ., id.vars = cols, measure.vars = measure_vars, : One or more values in 'id.vars' is invalid.
Error in if (length(markers) != ncol(data)) { : argument is of length zero
I realized my strata file wasn't formatted correctly for rubias, so the revised file is attached. This doesn't change the error and I tried the conversion without a strata file (to no avail), but thought the correct file would be helpful if the conversion gets up and running. Thanks! rubias_strata_file.txt
Hi,
Apologies if I miss anything out of this issue you need - I am still a relative newbee to coding!
I am trying to convert a vcf file to a rubias file. I have produced two vcfs from stacks, using the same population map, which have then been concatenated (with vcf tools).
The read vcf appears to run correctly, however write rubias throws an error. At the moment I am giving it a strata file for read vcf which has INDIVIDUALS and STRATA (stratain) as well as a different strata file for write rubias which has SAMPLE_TYPE, REPUNIT, COLLECTION, INDIVIDUALS (stratarub).
My log output file, script and strata files are attached. However, the vcf is very large so I am unsure how to attach it and takes a long time to subset, let me know what I can do about this.
Main Commands (with parallel core required as per some previous issues, which fixed an initial issue) - the rest in script. ` gdsframe <- read_vcf(vcf, strata=stratain, filename="rubias90filt", parallel.core=1L)
write_rubias(gdsframe, strata=stratarub, filename="rubias90filt", parallel.core=1L)`
Error below:
devtools::sessioninfo()
Thank you so much!