thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0
58 stars 23 forks source link

error Sexy_markers operator is invalid for atomic vectors #183

Closed kasmiyassin closed 6 months ago

kasmiyassin commented 6 months ago

I have a VCF file generate from Stacks2, and I prepared my STRAT fille as described but I recieve all time this error:

Error in SeqArray::seqGetData(data, "annotation/format/DP")$data : $ operator is invalid for atomic vectors

################################### RESULTS ####################################

Filter short ld threshold: mac Number of individuals / strata / chrom / locus / SNP: Before: 25 / 2 / 339 / 23310 / 51363 Blacklisted: 0 / 0 / 0 / 0 / 28053 After: 25 / 2 / 339 / 23310 / 23310

Do you want to continue filtering using long distance ld ? (y/n): n

Computation time, overall: 345 sec ############################# completed filter_ld ##############################

Sex-ratio (F/M): 0.56 Error in SeqArray::seqGetData(data, "annotation/format/DP")$data : $ operator is invalid for atomic vectors

Computation time, overall: 497 sec ############################ completed sexy_markers ############################

when I use the commands

data <- read_vcf(data = "/populations.snps.vcf", strata = "/info_sex.txt") strata <- read.delim("/info_sex.txt") radiator::summary_strata(strata)

Then the command

sex.markers <- radiator::sexy_markers( data = data, strata = strata)

I tested also the command

data <- read_vcf(data = "/populations.snps.vcf") strata <- read.delim("/info_sex.txt") radiator::summary_strata(strata)

Then the command

sex.markers <- radiator::sexy_markers( data = data, strata = strata)

but I got the same error.

My strata looks like

INDIVIDUALS STRATA IND-1 POP_1 IND-2 POP_2 IND-3 POP_1 IND-4 POP_2

Can anyone help me to solve this issue please?

thierrygosselin commented 6 months ago

Could you send by email the necessary files to reproduce the error please.

thierrygosselin commented 6 months ago

Ok thanks So I usually run a couple of analysis before using other functions in radiator...

test1 <- radiator::read_vcf(data = "populations.snps.vcf", strata = "info_sex.txt") By default...

Here are a few things that I wanted to bring to your attention.

  1. duplicated SNPs on different strands (+/-)
Detected 1536 duplicate SNPs on different strands (+/-)
    By default radiator prune those SNPs
    To change this behavior, use argument: filter.strands
Filter duplicated markers on different strands: blacklist
Number of individuals / strata / chrom / locus / SNP:
    Before: 27 / 2 / 342 / 24642 / 58873
    Blacklisted: 0 / 0 / 0 / 373 / 1536
    After: 27 / 2 / 342 / 24269 / 57337
  1. monomorphic markers

For info on the internal function do this in RStudio or R console: ?radiator::filter_monomorphic

Now the numbers are not small...

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 2 / 839 / 5369

Which begs the question... Did you filter your dataset before attempting the sexy_markers function ? Because this function goes deeper into the dataset but requires a clean/filtered dataset otherwise you find anything really of interest.

  1. common markers between groups/strata

By default theradiator::read_vcf function keeps only common markers but for certain analysis you might be interested in keeping those. Here, you have 631 SNPs not in common. If you look at the folder generated by the read function you will see a folder named something like that where the first 2 digits might be different 04_filter_common_markers. Inside this is a UpSetR figure showing you have 51337 SNPs in common, 468 SNPs only found in males and 163 SNPs only found in females.

You might want to double check if this is artifactual of your dataset, filtering etc. Did you use this strata file to generate the VCF in Stacks ?

The markers info is found in the file starting with this: blacklist.not.common.markers

Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 1 / 83 / 631

This alone could be your answer to distinguish males/females

  1. duplicate individuals

Did you keep in the dataset duplicated individuals or technical replicates ? Because these samples can impact analysis ... a lot.

Run this:

check1 <- radiator::detect_duplicate_genomes(data = test1)

Look at the figures it generated.

In your strata we see

F1_MEM203 and F2_MEME203

I thought at first these were the same sample but if you open the file individuals.pairwise.dist.tsv you see inside it's for example: F2-MEM198 and F2-MEM207 that are very very close. At the point that if you keep only 1 sample between identified duplicates, and remove only the sample with more missing genotypes, you will end up less than half the sample size (blacklisting 16 samples). 11 samples remaining with 4 females and 7 males...

You might want to check this before going further in your analysis.