Error in `dplyr::mutate()` - No Variants Selected

silknets commented 1 year ago

Hello All,

I'm a new Radiator user, and I imagine my error has a simple resolution attributable to the user. However, I've been unable to resolve this by reviewing closed issues or digging into this GitHub repo. I'm adding this blank issue in the hopes that I can have it resolved, or to better understand why the error was thrown!

For some context, I'm running filter_rad() on my strictest VCF output from a Stacks:populations run (only 199 loci for 188 samples). I also selected pretty generic filters for the filter_rad(), just to start making sense of the outputs. However, when the run has completed, I noted the error below. As a new user, it's unclear if this error has prevented the generation of additional plots / figures that would help me to assess the sequence data + underlying biology going on. I'm including four files for additional info if needed per contributing guidelines (devtools::session_info, full error text, text of full session for info on filters selected, and a zipped folder with the VCF input).

Thanks from a grateful user - Sam S.

###################### radiator::detect_duplicate_genomes ######################
################################################################################
Execution date@time: 20230315@1148
Function call and arguments stored in a file
File written: radiator_detect_duplicate_genomes_args_20230315@1148.tsv
File written: random.seed (314710)

Error in dplyr::mutate():
ℹ In argument: MISSING_PROP = round(...).
Caused by error in seqParallel():
! No variants selected.
Run rlang::last_error() to see where the error occurred.
Warning messages:
1: There was 1 warning in dplyr::mutate().
ℹ In argument: WHITELISTED_MARKERS = purrr::map_int(...).
Caused by warning:
! Using one column matrices in filter() was deprecated in dplyr 1.1.0.
ℹ Please use one dimensional logical vectors instead.
ℹ The deprecated feature was likely used in the dplyr package.
Please report the issue at https://github.com/tidyverse/dplyr/issues.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.
2: Removed 144 rows containing missing values (geom_point()).
3: Removed 130 rows containing missing values (geom_point()).

session_info.txt full_error.txt session.txt populations.snps.zip

thierrygosselin commented 1 year ago

You simply ran out of SNPs before this step could run. Check the radiator::filter_snp_position_read section of your session.txt file.

thierrygosselin commented 1 year ago

If you have already filtered your data elsewhere (Stacks?), why run it through filter_rad?

Here are the checks I usually run when I receive a dataset:

data <- radiator::read_vcf(data = "populations.snps.vcf")

VCF summary
Missing data: 
    markers: 0.17
    individuals: 0.17

Coverage info:
    individuals mean total coverage: 361750
    individuals mean genotype coverage: 95
    markers mean coverage: 100

VCF info:
Number of chromosome/contig/scaffold: 1
Number of locus: 199
Number of markers: 4445
Number of strata: 1
Number of individuals: 188

Number of ind/strata:
1pop = 188

I didn't have your strata file, but it isn't really needed for these checks.

**I see several problems:**

- Most of your SNPs are between 100 and 300 bp (the position on the read).
- Far too many SNPs per locus. What's your species?
- The range of missing genotypes across your samples is very wide. Check the filters you used earlier in Stacks: it's not worth filtering with bad samples in the dataset, because they drag everything down in terms of marker discovery.
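To put a number on the SNPs-per-locus point: the VCF summary above reports 4445 markers over 199 loci, i.e. roughly 22 SNPs per locus on average. Below is a minimal base R sketch of how one could tabulate this per locus. It uses a made-up toy VCF rather than the attached file, and it assumes the ID column follows a `locus:column:strand` pattern, which should be checked against the actual Stacks output before relying on it.

```r
# Toy VCF lines (hypothetical data, not the attached populations.snps.vcf).
# Assumption: the ID column (3rd field) looks like "locus:column:strand".
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "un\t101\t1:100:+\tA\tG\t.\tPASS\t.",
  "un\t150\t1:149:+\tC\tT\t.\tPASS\t.",
  "un\t420\t2:12:+\tG\tA\t.\tPASS\t."
)

# Keep only the variant records (drop meta and header lines)
records <- vcf_lines[!startsWith(vcf_lines, "#")]

# Extract the ID field, then the locus part before the first ":"
fields <- strsplit(records, "\t", fixed = TRUE)
ids <- vapply(fields, function(x) x[3], character(1))
locus <- sub(":.*$", "", ids)

# Number of SNPs per locus, and the mean across loci
snps_per_locus <- table(locus)
mean_snps <- mean(as.numeric(snps_per_locus))
print(snps_per_locus)
print(mean_snps)

# The same average computed from the summary figures in this thread
summary_mean <- 4445 / 199
print(round(summary_mean, 1))
```

For a real file, `vcf_lines <- readLines("populations.snps.vcf")` would replace the toy vector, under the same assumption about the ID format.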

thierrygosselin commented 1 year ago

The duplicate check is the part that failed in filter_rad, because you had no markers left. Here is the same analysis run independently on your VCF:

dup <- radiator::detect_duplicate_genomes(data = data)

I'd really like to know what you're working on. When I see a graph like this one, it's usually from very closely related samples (close kin, families, etc.) or from technical and/or lab duplicates.

thierrygosselin commented 1 year ago

This last check looks for wet-lab trouble, mixed samples, etc.

mix <- radiator::detect_mixed_genomes(data = data)

These samples (sb0302, sb0405, sb0417) are not like the rest; they are definitely outliers. When I see this, it's usually another species, or something went wrong in the wet lab.

thierrygosselin commented 1 year ago

Hope this helps. Re-open an issue if you have another problem.

thierrygosselin / radiator

Error in `dplyr::mutate()` - No Variants Selected #175