stevemussmann / BayesAss3-SNPs

Modification of BayesAss 3.0.4 to allow handling of large SNP datasets
GNU General Public License v3.0
15 stars, 7 forks

Filter loci in input file #8

Closed (deleod closed this issue 3 years ago)

deleod commented 3 years ago

Hi, do you have any suggestions (or scripts) for removing loci where data is missing for all individuals? I am having an issue running the program with my input file and believe this is the culprit.

`>BA3-SNPS -v -i100000000 -b1000000 -t -g -u -l 21292 -F wgenome_20_BA3.txt

               BA3-SNPS Version 1.1 (BA3-SNPS)                  
                    Released: 07/11/2019                        
                      Steven Mussmann                           
     Department of Biological Sciences at U. of Arkansas        

             Modified from BayesAss Version 3.0.4               
                        Bruce Rannala                           
       Department of Evolution and Ecology at UC Davis          

Please cite: Wilson & Rannala (2003). Bayesian Inference of recent
migration rates using multilocus genotypes. Genetics 163:1177-1191.

Please also cite: Mussmann, Douglas, Chafin, & Douglas (2019). BA3-SNPs:
Contemporary migration reconfigured in BayesAss for next-generation
sequence data. Methods in Ecology and Evolution.

Made new Indiv object
Going to read input file
Setting alleles
pop_1 0
Read input file

At least one locus may contain no data for all samples in your input file.
gsl: ../gsl/gsl_rng.h:200: ERROR: invalid n, either 0 or exceeds maximum value of generator
Default GSL error handler invoked.
Abort trap: 6`

stevemussmann commented 3 years ago

If you have a copy of your SNP data file in a VCF format, that can easily be accomplished with VCFtools or BCFtools. I'm not aware of anything that filters the immanc format used by BayesAss.
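For the immanc format itself, a short script may be enough. The following is a minimal sketch, not part of BA3-SNPs. It assumes each line has five whitespace-separated columns (individual, population, locus, allele 1, allele 2) and that a missing allele is coded as `0`. The function name `drop_all_missing_loci` is hypothetical.

```python
def drop_all_missing_loci(lines, missing="0"):
    """Drop loci for which every individual is missing both alleles.

    Assumed record layout (an assumption, check your own file):
        indivID popID locID allele1 allele2
    Returns (kept_lines, dropped_locus_names).
    """
    # Keep each original line next to its split fields; skip blank lines.
    records = [(line, line.split()) for line in lines if line.strip()]

    # A locus has data if any record for it carries a non-missing allele.
    loci_with_data = {
        fields[2]
        for _, fields in records
        if fields[3] != missing or fields[4] != missing
    }

    kept = [line for line, fields in records if fields[2] in loci_with_data]
    all_loci = {fields[2] for _, fields in records}
    dropped = sorted(all_loci - loci_with_data)
    return kept, dropped


example = [
    "ind1 pop_1 loc1 1 2",
    "ind1 pop_1 loc2 0 0",
    "ind2 pop_1 loc1 0 0",
    "ind2 pop_1 loc2 0 0",
]
kept, dropped = drop_all_missing_loci(example)
print(dropped)    # ['loc2']
print(len(kept))  # 2
```

Individuals that are missing at a retained locus are kept as-is; only loci with no data in any individual are removed. After filtering, the number of loci given to the program on the command line would need to match the new file.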

deleod commented 3 years ago

OK, thanks. I do have a VCF file, so I will look into VCFtools to filter it and try re-running.

I was hoping to find a way to edit the structure file directly, so that I can then convert to immanc using the pyradStr2immanc.pl script.
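For the VCF route, VCFtools and BCFtools are the usual tools, and their exact options are best taken from their manuals. To illustrate what the filter has to do, here is a minimal Python sketch that drops sites at which every sample's genotype is missing. It assumes an uncompressed, tab-delimited VCF in which GT is the first colon-separated field of each sample column. The function names are hypothetical.

```python
def genotype_is_missing(sample_field):
    """True if every allele in the GT field is '.', e.g. './.', '.|.' or '.'."""
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


def drop_all_missing_sites(vcf_lines):
    """Return VCF lines, omitting sites where all samples are missing."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        samples = fields[9:]   # sample columns follow the 9 fixed columns
        if samples and all(genotype_is_missing(s) for s in samples):
            continue           # no sample has a called genotype here
        kept.append(line)
    return kept


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.",
    "chr1\t20\t.\tC\tT\t.\tPASS\t.\tGT\t./.\t./.",
]
filtered = drop_all_missing_sites(vcf)
print(len(filtered))  # 3 (two header lines plus the site at position 10)
```

A partially missing call such as `./1` is treated as present here, which is a design choice of this sketch rather than a convention of any particular tool.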

stevemussmann commented 3 years ago

Yeah, sorry, I'm not aware of anything that filters Structure files directly.

deleod commented 3 years ago

Appreciate the tip! Will attempt to do this with the VCF file if needed. Not sure how straightforward it is to convert VCF to immanc for BayesAss, though?

May have found a workaround for filtering the structure file directly in R using the missingno function in the poppr package. Will report back.

stevemussmann commented 3 years ago

Usually I end up going through an intermediate format or two. There's a vcf2phylip converter (https://github.com/edgardomortiz/vcf2phylip), then I convert to structure (https://github.com/stevemussmann/file_converters/blob/master/phy2str.pl) and finally use the pyradStr2immanc.pl converter. It's far from the most efficient thing, but I'm not aware of any converters that would go direct from VCF to immanc. Maybe PGDSpider, but I've had mixed experiences with that program.

deleod commented 3 years ago

Thanks for the tip! I will check this out.

I was able to filter the structure file in R. I imported it as a genind object (adegenet package), filtered it with poppr's missingno function, and exported it back to a structure file using the R function genind2structure written by Clark 2015 (https://github.com/lvclark/R_genetics_conv/blob/master/genind2structure.R), though it did take a while to write the file. I then converted it with your pyradStr2immanc.pl script instead of PGDSpider.

However, I wasn't able to get the program to run successfully with the filtered SNP file. I am now getting a new segmentation fault error. Any ideas?

'Made new Indiv object
Going to read input file
Setting alleles
pop1 0
Segmentation fault: 11'

stevemussmann commented 3 years ago

In my experience, segmentation faults usually result from specifying an incorrect number of loci. If you converted to immanc with my Perl script, you should be able to check the number of loci using `awk '{print $3}' filename.txt | sort | uniq | wc -l`, replacing "filename.txt" with the name of your file.
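The same check can be scripted. The sketch below mirrors the awk pipeline in Python by counting distinct values in the third whitespace-separated column. It assumes the five-column immanc layout, and `count_loci` is a hypothetical helper name.

```python
def count_loci(lines):
    """Count distinct locus names (third whitespace-separated column)."""
    loci = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:  # ignore blank or truncated lines
            loci.add(fields[2])
    return len(loci)


example = [
    "ind1 pop_1 loc1 1 2",
    "ind1 pop_1 loc2 1 1",
    "ind2 pop_1 loc1 2 2",
    "ind2 pop_1 loc2 0 0",
]
print(count_loci(example))  # 2
```

For a real file, an open file handle can be passed in place of the list. Unlike the awk pipeline, blank lines are skipped rather than counted as an empty locus name. The resulting number is what the locus count given on the command line should agree with.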

deleod commented 3 years ago

Ahh, not sure how I missed this. Thank you!
