lvclark / polyRAD

Genotype Calling with Uncertainty from Sequencing Data in Polyploids πŸŒπŸ“πŸ₯”πŸ πŸ₯
24 stars 8 forks source link

Error running Hind/He script in R #37

Closed PauloBaleeiro closed 1 year ago

PauloBaleeiro commented 1 year ago

Hello Lindsay,

First of all thank you for your effort putting all these together.

I study a group of aquatics Eriocaulon, and just recently we found out we might have a mix of diploid and polyploid depending the population and one specific case where I might have one pop with different ploidies.

I have been reading your recent paper on Hind/He and I tried running the script. Unfortunately I have been unable to run it, as a constant error pops up. I've ran the following command and got the following error:

myRAD <- VCF2RADdata("populations.haps.vcf",

I've looked at this page for answers but I couldn't work it out. https://support.bioconductor.org/p/9137213/

I'm doing a De novo analysis on Stacks program, and I understood I needed to use the haplotype.vcf file from Stacks' folder, after a De novo analysis is finished. If there is anything you could tell me or point me in the right direction, I very much appreciate it.

Thanks very much for your time.

lvclark commented 1 year ago

Hi Paulo,

Just run library(VariantAnnotation) before running your script.

PauloBaleeiro commented 1 year ago

Unfortunately it is still not working. It might be the GRanges and IRanges. I'm using the zipfile cataloge.vcf file generated by Stacks, but I cannot work out where the problem is. I tried running without "which=...", but it didn't work either.

lvclark commented 1 year ago

What error are you getting?

PauloBaleeiro commented 1 year ago

Hi Lindsay,

This is the beginning of script followed by error: myRAD <- VCF2RADdata("catalog.alleles.tsv.gz",

lvclark commented 1 year ago

Oh, you can't read a Stacks file with VCF2RADdata. There is a different function called readStacks.

On Sat, Apr 15, 2023, 1:14 AM Paulo Baleeiro @.***> wrote:

Hi Lindsay,

This is the beginning of script followed by error: myRAD <- VCF2RADdata("catalog.alleles.tsv.gz",

-

                svparam = ScanVcfParam(fixed = "ALT", info = NA, geno = "AD",

-

                                       which = GRanges("06", IRanges(1, 937644))),

-

                yieldSize = NA)

Error in h(simpleError(msg, call)) : error in evaluating the argument 'object' in selecting a method for function 'samples': [internal] _hts_rewind() failed

β€” Reply to this email directly, view it on GitHub https://github.com/lvclark/polyRAD/issues/37#issuecomment-1509514623, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADG72KBEXKNKWG37HTUKZS3XBIVCNANCNFSM6AAAAAAW4NNFDA . You are receiving this because you commented.Message ID: @.***>

PauloBaleeiro commented 1 year ago

Hi Clark,

I'm a bit stuck in the "object" to be used. In one of your scripts you show the following path: VCF2RADdata(file, phaseSNPs = TRUE, tagsize = 80, refgenome = NULL, tol = 0.01, al.depth.field = "AD", min.ind.with.reads = 200, min.ind.with.minor.allele = 10, possiblePloidies = list(2), taxaPloidy = 2L, contamRate = 0.001, samples = VariantAnnotation::samples(VariantAnnotation::scanVcfHeader(file)), svparam = VariantAnnotation::ScanVcfParam(fixed = "ALT", info = NA, geno = al.depth.field, samples = samples), yieldSize = 5000, expectedAlleles = 5e+05, expectedLoci = 1e+05, maxLoci = NA)

I'm not sure which VCF files to use. Would it be "populations.snps.vcf? or "populations.haps.vcf"? And what about the catalog.alleles.tsv? I have tried many ways, but none worked.

Cheers

lvclark commented 12 months ago

Hi Paulo,

That looks like the usage section of the VCF2RADdata documentation. The argument defaults are shown, and you don't have to type the arguments if you are okay with the defaults.

I haven't used Stacks in a long time. If your VCF looks like this: https://github.com/galaxyproject/tools-iuc/blob/main/tools/stacks2/test-data/populations/populations.haps.vcf it won't work because the AD field is not included. However one like this should work: https://github.com/galaxyproject/tools-iuc/blob/main/tools/stacks2/test-data/populations/populations.snps.vcf

Your code (in its simplest version) would look like:

mydata <- VCF2RADdata("populations.snps.vcf")

You don't need catalog.alleles.tsv if you are using VCF2RADdata. That file is only needed for readStacks.

Best,

Lindsay

PauloBaleeiro commented 12 months ago
Screenshot 2023-07-11 at 1 06 59 pm

This is mine. I'll try once again. Thanks

lvclark commented 12 months ago

Should be okay. I was concerned about the missing AD field for missing genotypes (./.) but did a quick test and polyRAD handles those appropriately. If you continue to have issues, please provide your code and a small dataset so that I can reproduce the problem on my own machine.

lvclark commented 12 months ago

That being said, you will have to lower the values for the min.ind.with.reads and min.ind.with.minor.allele arguments to VCF2RADdata since you have so few samples. polyRAD's genotyping algorithms may perform poorly on a sample size this small but you can use the naΓ―ve protocol shown in https://lvclark.r-universe.dev/articles/polyRADtutorials/population_genetics.html (just search the page for mydataNaive and follow that protocol for genotyping).