rystanley / genepopedit

Simple and flexible manipulation of genomic data.

Input genepop with comma sep loci on single line not working :( #7

Closed denisroy1 closed 6 years ago

denisroy1 commented 6 years ago

Hi, I'm trying to use these scripts to filter out specific SNPs from a series of ~ 4000. I have a genepop file formatted as follows:

fcsnp10_99.txt

The scripts seem to read in the data, but something is clearly wrong with the data format, because PopNames, PopCounts, and LociNames are not read correctly. If you have some time, would you mind taking a look at the file and script? Thanks a lot for any help. The script I used is below:

rm(list=ls())

# re-setting all instances

library(genepopedit)

setwd("path to my file")
infile<-file.choose()
outd<-getwd()

gp210_99 <- read.table(infile,sep="\t",quote="",stringsAsFactors=FALSE)

PopNames <- genepop_detective(gp210_99,variable="Pops")
PopCounts <- genepop_detective(gp210_99, variable="PopNum")
LociNames <- genepop_detective(gp210_99,variable="Loci")

subloci<-c("loci_18", "loci_19", "loci_285", "loci_431", "loci_492", "loci_629", "loci_697", "loci_869", "loci_949", "loci_1072", "loci_1124", "loci_1135", "loci_1294", "loci_1295", "loci_1670", "loci_1671", "loci_1860", "loci_1881", "loci_2056" , "loci_2201", "loci_2220", "loci_2250", "loci_2269", "loci_2357", "loci_2437" , "loci_2588", "loci_2755", "loci_2756", "loci_2945", "loci_2948", "loci_2995" , "loci_3105", "loci_3267", "loci_3306", "loci_3333", "loci_3356", "loci_3498" , "loci_3510", "loci_3616", "loci_3692", "loci_3693", "loci_3703", "loci_3766" , "loci_3796", "loci_3820", "loci_3848")

subset_genepop(genepop=gp210_99, keep = F, subs = subloci, path = paste0(outd,"/","fcsnp10_99_neut.txt"))

NickJeff13 commented 6 years ago

Hi, Genepopedit tends to like Genepop files with the loci in a vertical column under your heading, as opposed to flat across the file. I ran your input file through PGDSpider from Genepop to Genepop and it worked with Genepopedit. I've attached the fixed file here.

NewGP_Fixed.txt
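For anyone without PGDSpider at hand, the layout difference described above (loci listed comma-separated on one line versus one locus per line) can probably also be handled with a few lines of base R. The sketch below is only an illustration: `stack_genepop_loci` is a hypothetical helper, not a genepopedit function, and it assumes a standard Genepop layout with a title on line 1, locus names between the title and the first `Pop` line, and no commas inside locus names.

```r
# Hypothetical helper (NOT part of genepopedit): rewrite the locus header of a
# Genepop file so that each locus name sits on its own line.
stack_genepop_loci <- function(lines) {
  # Locate the population delimiters ("Pop", "POP", "pop", ...)
  pop_rows <- grep("^\\s*pop\\s*$", lines, ignore.case = TRUE)
  if (length(pop_rows) == 0) stop("no 'Pop' line found")
  first_pop <- pop_rows[1]
  if (first_pop < 3) stop("no locus lines between the title and the first 'Pop'")

  # Everything between the title and the first 'Pop' is the locus header
  locus_block <- lines[2:(first_pop - 1)]
  loci <- trimws(unlist(strsplit(locus_block, ",", fixed = TRUE)))
  loci <- loci[nzchar(loci)]

  # Title, one locus per line, then the untouched population blocks
  c(lines[1], loci, lines[first_pop:length(lines)])
}

# Small made-up example (not the file attached to this issue)
flat <- c("Example title",
          "loci_1, loci_2, loci_3",
          "Pop",
          "ind1 ,  0101 0102 0202",
          "Pop",
          "ind2 ,  0102 0101 0101")
stacked <- stack_genepop_loci(flat)

# For a real file, something like the following should work (paths are placeholders):
# writeLines(stack_genepop_loci(readLines("fcsnp10_99.txt")), "fcsnp10_99_stacked.txt")
```

The individual lines after the first `Pop` also contain commas, which is why only the header block is split.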

Note that you want keep=T in subset_genepop if you want to keep only the loci in the vector you supplied; if you want just those ~45 loci removed, keep should be F. When I set keep=T I noticed that 'loci_2201' is not in your genepop file. Also, it is best not to read a Genepop file in with read.table as you did; just pass the file path and let genepopedit's functions read the file in their own way. Here is my script for your file:

setwd("to whatever you want")
library(genepopedit)

# Converted your genepop file in PGDSpider to "NewGP_Fixed.txt"

subloci<-c("loci_18", "loci_19", "loci_285", "loci_431", "loci_492", "loci_629", "loci_697", "loci_869", "loci_949", "loci_1072", "loci_1124", "loci_1135", "loci_1294", "loci_1295", "loci_1670", "loci_1671", "loci_1860", "loci_1881", "loci_2056" , "loci_2220", "loci_2250", "loci_2269", "loci_2357", "loci_2437" , "loci_2588", "loci_2755", "loci_2756", "loci_2945", "loci_2948", "loci_2995" , "loci_3105", "loci_3267", "loci_3306", "loci_3333", "loci_3356", "loci_3498" , "loci_3510", "loci_3616", "loci_3692", "loci_3693", "loci_3703", "loci_3766" , "loci_3796", "loci_3820", "loci_3848")

loci<-genepop_detective("NewGP_Fixed.txt",variable="Loci")

setdiff(subloci,loci)

subset_genepop(genepop = "NewGP_Fixed.txt",subs=subloci,keep = T,path="C:/Users/JefferyN/Desktop/NewGP2.txt")
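The keep=T / keep=F distinction and the setdiff check in the script above can be illustrated with plain character vectors. This is a minimal sketch of the selection logic as described in this thread, using made-up locus names; it is not genepopedit's internal implementation.

```r
# Made-up locus names standing in for what genepop_detective(..., variable = "Loci") returns
all_loci  <- c("loci_18", "loci_19", "loci_20", "loci_21")
# Requested subset, including one name that is not in the file
requested <- c("loci_18", "loci_21", "loci_2201")

# Names that were requested but do not exist in the file
missing_loci <- setdiff(requested, all_loci)

# Only the requested names that are actually present
usable <- intersect(requested, all_loci)

# keep = TRUE: retain only the requested loci
retained_keep_true <- all_loci[all_loci %in% usable]

# keep = FALSE: drop the requested loci and retain everything else
retained_keep_false <- all_loci[!(all_loci %in% usable)]
```

Running the setdiff check first makes it clear when a requested name such as 'loci_2201' is simply absent from the file.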

Hopefully that helps!

denisroy1 commented 6 years ago

Hi Nick,

Thanks for looking at this for me so quickly and resolving the issue. If I process the file in PGDSpider using genepop->genepop I can enter and use genepopedit. This is great. The only issue I still have is trying to write the output to the "GENETIX" file format using the PGDspideR cmd.

PGDspideR(input = nfn, input_format="GENEPOP", output = ffn, output_format="GENETIX", spid="/Users/Denis/Documents/genpop-genetix.spid", where.pgdspider="/Genetic Programs/PGDSpider_2.1.0.2/")

I've triple-quadruple checked and these are the correct paths for my files. The error messages are below:

The system cannot find the path specified.
Process Completed.
Warning messages:
1: In file.copy(from = spid, to = where.pgdspider, overwrite = TRUE) : problem copying \Users\Denis\Documents\genpop-genetix.spid to \Genetic Programs\PGDSpider_2.1.0.2\genpop-genetix.spid: Permission denied
2: running command 'C:\Windows\system32\cmd.exe /c cd /Genetic\ Programs/PGDSpider_2.1.0.2/ && java -Xmx1024m -Xms512m -jar PGDSpider2-cli.jar -inputfile C:\Users\Denis\Dropbox\Rhi\Rhi-2015\genomics\SNPanalyses\forcasting\genomepop2\gen1\fcsnp1_01fpn.txt -inputformat GENEPOP -outputfile C:\Users\Denis\Dropbox\Rhi\Rhi-2015\genomics\SNPanalyses\forcasting\genomepop2\gen1\fcsnp1_01fpn.gtx -outputformat GENETIX -spid genpop-genetix.spid' had status 1
3: In shell(run.PGDspider) : 'cd /Genetic\ Programs/PGDSpider_2.1.0.2/ && java -Xmx1024m -Xms512m -jar PGDSpider2-cli.jar -inputfile C:\Users\Denis\Dropbox\Rhi\Rhi-2015\genomics\SNPanalyses\forcasting\genomepop2\gen1\fcsnp1_01fpn.txt -inputformat GENEPOP -outputfile C:\Users\Denis\Dropbox\Rhi\Rhi-2015\genomics\SNPanalyses\forcasting\genomepop2\gen1\fcsnp1_01fpn.gtx -outputformat GENETIX -spid genpop-genetix.spid' execution failed with error code 1
4: In file.remove(paste0(where.pgdspider, input.name)) : cannot remove file '/Genetic Programs/PGDSpider_2.1.0.2/C:\Users\Denis\Dropbox\Rhi\Rhi-2015\genomics\SNPanalyses\forcasting\genomepop2\gen1\fcsnp1_01fpn.txt', reason 'Invalid argument'
5: In file.remove(paste0(where.pgdspider, output.name)) : cannot remove file '/Genetic Programs/PGDSpider_2.1.0.2/C:\Users\Denis\Dropbox\Rhi\Rhi-2015\genomics\SNPanalyses\forcasting\genomepop2\gen1\fcsnp1_01fpn.gtx', reason 'Invalid argument'
6: In file.remove(paste0(where.pgdspider, spid.name)) : cannot remove file '/Genetic Programs/PGDSpider_2.1.0.2/genpop-genetix.spid', reason 'Permission denied'

I guess I could do it by hand using the PGDSpider interface, but I have 120 files to process and it would be better to push them through the program. This may be a lost cause, and I'll just have to bite the bullet.

In any case, thanks for getting this to work for me, this is invaluable.

Best cheers!

NickJeff13 commented 6 years ago

Hi Denis,

PGDspideR in genepopedit worked for me using the new Genepop file I sent you. It is best to type out the full paths for both your input and output files rather than using shortcut variables (what you're calling nfn and ffn), which might be the problem. Not sure which operating system you're using, but the following script works for me. I'll re-attach the fixed Genepop file. I assume your .spid file was made in PGDSpider and just has the SNP options selected? Also, your output should have the .gtx extension for GENETIX format.

setwd("C:/Users/JefferyN/Desktop")
library(genepopedit)
PGDspideR(input = "NewGP_Fixed.txt",input_format = "GENEPOP",output = "C:/Users/JefferyN/Desktop/Genetix_Input.gtx",output_format = "GENETIX",spid = "C:/Users/JefferyN/Documents/GP_GTX.spid",where.pgdspider = "C:/Users/JefferyN/Documents/Programs/PGDSpider_2.0.8.3/")

NewGP_Fixed.txt

Hope that helps, good luck with your work.
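For the 120 files mentioned earlier in the thread, the advice about full paths and the .gtx extension could be applied in a loop. The following is a rough sketch under assumptions: `build_conversion_jobs` is a hypothetical helper (not part of genepopedit), the input directory is a placeholder, and the PGDspideR call is left commented out because it needs a local PGDSpider install. Its arguments are the ones used in this thread.

```r
# Hypothetical helper (NOT part of genepopedit): pair every Genepop input file
# in a folder with a full-path .gtx output name.
build_conversion_jobs <- function(in_dir, pattern = "\\.txt$") {
  inputs <- list.files(in_dir, pattern = pattern, full.names = TRUE)
  # Expand to absolute paths with forward slashes
  inputs <- normalizePath(inputs, winslash = "/", mustWork = TRUE)
  # Swap the extension for .gtx, as suggested for GENETIX output
  outputs <- paste0(tools::file_path_sans_ext(inputs), ".gtx")
  data.frame(input = inputs, output = outputs, stringsAsFactors = FALSE)
}

# Possible usage (placeholder directory; spid and PGDSpider locations as in this thread):
# library(genepopedit)
# jobs <- build_conversion_jobs("C:/path/to/genepop/files")
# for (i in seq_len(nrow(jobs))) {
#   PGDspideR(input = jobs$input[i], input_format = "GENEPOP",
#             output = jobs$output[i], output_format = "GENETIX",
#             spid = "C:/Users/Denis/Documents/genpop-genetix.spid",
#             where.pgdspider = "C:/Genetic Programs/PGDSpider_2.1.0.2/")
# }
```

Building the table of jobs first makes it easy to inspect every input and output path before any conversion is attempted.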

rystanley commented 6 years ago

Adding to Nick's comments, I think the issue might be associated with incomplete file paths. Try:

PGDspideR(input = nfn, input_format="GENEPOP", output = ffn, output_format="GENETIX", spid="C:/Users/Denis/Documents/genpop-genetix.spid", where.pgdspider="C:/Genetic Programs/PGDSpider_2.1.0.2/")

Note I am not sure where the folder 'Genetic Programs' is. If it is in C:/Users/Denis ... you will want to change that in the code. If errors persist after this change then we can see if it is truly a permissions error or something else. Not having the .spid might cause the remaining cascade of errors. Also, if you can rename the folder 'Genetic Programs' to 'Genetic_Programs' it will probably help. CMD commands can fail when file paths contain spaces.
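The two suspects named above (missing drive letters and spaces in the path) can be checked before calling PGDspideR. This is a small sketch using string checks only: `check_pgdspider_paths` is a hypothetical helper, not a genepopedit function, and the 'C:/Genetic_Programs' location in the second call is an assumed example of the renamed folder.

```r
# Hypothetical pre-flight check (NOT part of genepopedit): flag Windows paths
# that lack a drive letter or contain spaces.
check_pgdspider_paths <- function(spid, where.pgdspider) {
  has_drive <- function(p) grepl("^[A-Za-z]:[/\\\\]", p)
  problems <- character(0)
  if (!has_drive(spid)) {
    problems <- c(problems, "spid has no drive letter")
  }
  if (!has_drive(where.pgdspider)) {
    problems <- c(problems, "where.pgdspider has no drive letter")
  }
  if (grepl(" ", where.pgdspider, fixed = TRUE)) {
    problems <- c(problems, "where.pgdspider contains a space")
  }
  problems
}

# Paths from the failing call earlier in the thread
original <- check_pgdspider_paths("/Users/Denis/Documents/genpop-genetix.spid",
                                  "/Genetic Programs/PGDSpider_2.1.0.2/")

# Full paths with the folder renamed (assumed location)
revised <- check_pgdspider_paths("C:/Users/Denis/Documents/genpop-genetix.spid",
                                 "C:/Genetic_Programs/PGDSpider_2.1.0.2/")
```

An empty result means neither issue was detected; it does not rule out a genuine permissions problem, and a file.exists() check on the .spid file would be a natural addition.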