thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

Error: Can't recycle `ID_SEQ` (size 102368) to match `length` (size 224). #147

Closed mgavery closed 2 years ago

mgavery commented 2 years ago

I'm getting an error when trying to read in a VCF file (ultimately I want to convert it to colony format), and I'm looking for help troubleshooting it. The original files and session_info are attached.

tidy.data <- radiator::tidy_vcf(
  data = "no_miss_high_maf.vcf",
  strata = "Pinto_strata.txt")

Reading VCF...

Data summary:
  number of samples: 457
  number of markers: 224
0s

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 0 / 0

Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 0 / 0

Generating individual stats...
Error: Can't recycle `ID_SEQ` (size 102368) to match `length` (size 224).
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Outer names are only allowed for unnamed scalar atomic inputs

Computation time, overall: 2 sec

Attachment: radiator_error.zip
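A note on the sizes in the error message: 102368 is exactly 457 × 224, the number of samples times the number of markers reported in the data summary. That suggests a vector with one entry per genotype is being fitted to a length with one entry per marker. The base-R sketch below only checks that arithmetic and illustrates this kind of size mismatch. `check_recyclable()` is a hypothetical helper written for this illustration, following the usual tidyverse rule that only size-1 or same-size inputs can be recycled; it is not radiator's code and makes no claim about where the bug sits.

```r
# Sizes taken from the data summary in the log above.
n_samples <- 457L
n_markers <- 224L
n_genotypes <- n_samples * n_markers  # one entry per sample x marker

# Hypothetical helper (not part of radiator): an input can be recycled
# to `size` only if it has length 1 or already has length `size`.
check_recyclable <- function(x, size) {
  n <- length(x)
  if (n != 1L && n != size) {
    stop(sprintf("Can't recycle input (size %d) to size %d.", n, size),
         call. = FALSE)
  }
  invisible(TRUE)
}

# A per-genotype vector cannot be recycled to a per-marker length.
id_seq <- seq_len(n_genotypes)
result <- tryCatch(
  check_recyclable(id_seq, n_markers),
  error = function(e) conditionMessage(e)
)
print(n_genotypes)
print(result)
```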

thierrygosselin commented 2 years ago

Dear Mackenzie, sorry about the issue, probably something wrong with my latest push today. Will check ASAP

thierrygosselin commented 2 years ago

Works on my end. Try restarting R or RStudio, depending on which one you are using.

Try these:

test1 <- radiator::read_vcf(data = "no_miss_high_maf.vcf", strata = "Pinto_strata.txt")

If it works, try this:

test2 <- radiator::tidy_vcf(data = "no_miss_high_maf.vcf", strata = "Pinto_strata.txt")

To get a glimpse of what you should expect with colony you could look at the figure generated with this:

dup <- radiator::detect_duplicate_genomes(data = test1)

There is no need to filter individuals first.

You will see in manhattan.plot.distance.png that you have a duplicated sample (or technical replicate) and lots of close kin!

thierrygosselin commented 2 years ago

If it doesn't work, it could be an issue with older versions of R; I will have to dig further.
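A quick way to see which R version a session is running, sketched with base R only. The 4.0.0 threshold is an assumption taken from the "below version 4" remark later in this thread, not a documented radiator requirement, and the radiator version is printed only if the package is installed.

```r
# Report the running R version and compare it with 4.0.0
# (threshold assumed from this thread, not from radiator's documentation).
r_version <- getRversion()
needs_upgrade <- r_version < "4.0.0"
cat("Running:", R.version.string, "\n")

if (needs_upgrade) {
  message("R is older than 4.0.0; upgrading may be worth trying.")
}

# Print the installed radiator version, or NA if it is not installed.
radiator_version <- if (requireNamespace("radiator", quietly = TRUE)) {
  as.character(utils::packageVersion("radiator"))
} else {
  NA_character_
}
cat("radiator version:", radiator_version, "\n")
```

Pasting this output into the issue alongside session_info makes it easier to tell a version problem from a package bug.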

mgavery commented 2 years ago

I'm still getting the same error after restarting RStudio when running the command below. Perhaps I need to update R on my end? I can try that. And thank you for your help!!

test1 <- radiator::read_vcf(data = "no_miss_high_maf.vcf", strata = "Pinto_strata.txt")

Output:

Execution date@time: 20211222@1340
Folder created: read_vcf_20211222@1340
Function call and arguments stored in: radiator_read_vcf_args_20211222@1340.tsv
File written: random.seed (685960)

Reading VCF...

Data summary:
  number of samples: 457
  number of markers: 224

Read time: 0 sec

GDS file written: radiator_20211222@1340.gds

Analyzing the vcf...
VCF source: Stacks v2.55
Data is bi-allelic
Cleaning VCF's sample names
File written: cleaned.vcf.id.info_20211222@1340.tsv
Synchronizing data and strata... 0s
Number of strata: 8
Number of individuals: 457
Reads assembly: reference-assisted
Filters parameters file generated: filters_parameters_20211222@1340.tsv

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 0 / 0

Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 0 / 0

Preparing output files...
File written: whitelist.markers.tsv
File written: strata.filtered.tsv

Generating individual stats...
Error: Can't recycle `ID_SEQ` (size 102368) to match `length` (size 224).
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Outer names are only allowed for unnamed scalar atomic inputs

Computation time, overall: 5 sec

thierrygosselin commented 2 years ago

Try updating R; it's worth doing anyway, since your version is below 4. Let me know if that resolves the problem.

thierrygosselin commented 2 years ago

Re-open the issue if you still have a problem after upgrading to R v4.

mgavery commented 2 years ago

Upgrading did the trick. Thanks!