petrelharp / local_pca

Methods for examining PCA locally along the genome.

Is vcf_windower compatible with the last versions of bcftools? #16

Closed clairemerot closed 4 years ago

clairemerot commented 4 years ago

Hello, I have been using lostruct successfully, including the function that loads a BCF file window by window, on my university's server, which has bcftools 1.8 (loaded with module load bcftools). I use the following command:

snps <- vcf_windower("capelin_NWA_sorted.bcf",size=100,type='snp', sites= vcf_positions("capelin_NWA_sorted.bcf"))

However, I am now trying to run it on an AWS server on which we have installed bcftools 1.9, and I am running into errors. I managed to work around the sites and samples by providing a matrix and a vector directly instead of using the vcf_positions function, but then the function cannot process the windows and I get the error "Error. Is bcftools installed?\n"

Do you have any idea why that is, and how to solve it? Is it a problem with the bcftools version, or with permissions? I am using exactly the same vcf/bcf files and the same code. This is for a course about adaptation genomics.

Thank you for your help,
Claire

petrelharp commented 4 years ago

Hm, good question. It looks like that error occurs in vcf_query - are you sure the problem is in vcf_positions? Oh, I see: the error happens when you call the function snps(), right? If so, you should be able to get the bcftools call by passing verbose=TRUE, e.g. snps(1, verbose=TRUE), and then we'll see what the problem is?

petrelharp commented 4 years ago

And, gee - all my tests are working with bcftools 1.9. But it sounds like there's some new problem!

clairemerot commented 4 years ago

Thanks a lot for your help!!

So I managed to work around vcf_windower by directly loading a list of positions and samples (obtained by changing bcftools query -f '%CHROM\\t%POS\\n' file into bcftools query -f '%CHROM\ \t%POS\ \n' file).

With that it manages to create the snps function, but then it runs into the same problem: it can't access bcftools.

snps(1, verbose=T)

bcftools query -f '[ %GT]\n' -r Chr1:4598-102627 -s c(221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20) capelin_NWA_sorted.bcf

Error in paste("Error. Is bcftools installed?\n", e) : object 'e' not found

If this works for you with bcftools 1.9, maybe it is a matter of admin rights on our server?
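An aside on the verbose output above: the -s argument shows up as a deparsed R vector, c(221, 222, ...), whereas bcftools takes -s as a comma-separated list of samples. The sketch below shows one way such text can arise in R and how a vector can be collapsed instead. It is only an illustration: the variable name samples and its values are hypothetical, and how the samples were actually passed to vcf_windower in this thread is not known.

```r
# Hypothetical subset of sample indices, for illustration only.
samples <- c(221, 222, 223)

# Pasting a list that wraps the vector deparses it into R source text.
as_deparsed <- paste(list(samples))

# Collapsing the vector itself gives a comma-separated string.
as_csv <- paste(samples, collapse = ",")

print(as_deparsed)  # "c(221, 222, 223)"
print(as_csv)       # "221,222,223"
```

If the command printed by verbose=TRUE contains c(...), checking the type of the samples argument (vector versus list or data frame column) may be a useful first step.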

clairemerot commented 4 years ago

Hm, I see the line bcf.args <- c("bcftools", "query", "-f", "'[ %GT]\n'") in the vcf_query function. Could it again be a matter of adding a space before the \n?

petrelharp commented 4 years ago

If you run that bcftools command in the command line, does it succeed?

> Could it again be a matter of adding a space before the \n?

I don't think a space there would affect things? But I am not well-versed in bcftools

petrelharp commented 4 years ago

And hm, I see that an error in that function was preventing the message being shown. I've fixed that: https://github.com/petrelharp/local_pca/commit/a7bc0b99be0e68b76d40d2a8f35df95fcc82a841 so if you install from github again you might get a more informative message.

clairemerot commented 4 years ago

Yes! It seems that is it!! Without the space it just makes a continuous file, while with a space ('[ %GT]\ \n') instead of '[ %GT]\n', it runs, at least on the command line. Now, if I want R to succeed, shall I try to modify the functions locally to add that space?

petrelharp commented 4 years ago

Oh! Sorry, I was being slow - now I see what you mean about the space! The two backslashes (e.g. \\n) are there to escape the backslash, since we actually want \n there, but R (sometimes?) replaces \n by a linebreak.

But, now I'm confused. Could you let me know what exactly you've done, and what output you get? Don't worry if it starts to get long - more output is better.

> Now if I want R to succeed shall I try to modify the functions locally to add that space?

If you know how to fix the R package, then great! Please do, and tell me how!
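To make the escaping point above concrete, here is a minimal sketch of how R parses the two spellings of the format string. It only shows what R stores in the string; how the shell and bcftools then treat each form is a separate question that this example does not settle. The variable names are made up for illustration.

```r
# One backslash: R's parser turns \n into a single newline character.
fmt_newline <- "[ %GT]\n"

# Two backslashes: the string holds a literal backslash followed by "n".
fmt_literal <- "[ %GT]\\n"

nchar(fmt_newline)  # 7: six visible characters plus one newline
nchar(fmt_literal)  # 8: six visible characters, a backslash, and "n"

# print() shows the escaped form; cat() writes the characters actually stored.
print(fmt_literal)
cat(fmt_literal, "\n")
```

Comparing nchar() on the two strings is a quick way to tell which form a given piece of code is really passing along.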

petrelharp commented 4 years ago

Hm, after some testing I see that escaping those slashes is not doing what I expected.

I've made the changes you suggest over here: #17

Could you test this, by doing install_github("petrelharp/local_pca/lostruct", ref="no_escape") and re-running?

clairemerot commented 4 years ago

Sorry, I'm still running into the same problem:

bcftools query -f '[ %GT] ' -r Chr1:298949-557448 -s 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240 capelin_NWA_sorted.bcf

sh: 1: bcftools: not found

Error in value[3L] : Error. Is bcftools installed? error in running command

I have tried to add those spaces in a cloned version of lostruct but it does not seem to solve the matter... I guess my R is not able to access and run bcftools. Why? I have no idea

clairemerot commented 4 years ago

Thanks a lot for trying to help me. I'll see if any solution appears and let you know. Otherwise, I'll start the tutorial on lostruct with the matrix of eigen windows that I already computed on the other server. The good news is that if it works smoothly for you with bcftools 1.9, other people should not have trouble. I guess it is a local issue on that server with communication between R and bcftools. Many thanks!!

petrelharp commented 4 years ago

Ah-ha: probably, bcftools is in your PATH on the command line (probably, using bash?), but not in the shell that R opens, which is usually sh. fread( ) just calls system( ), so you should be able to check whether a fix works just by doing system("which bcftools", intern=TRUE). If all is working, it should tell you something like "/usr/local/bin/bcftools", but if it can't find bcftools, it'll throw some errors.

Glad I got those error messages more informative, anyhow! I'll close this; feel free to re-open if this issue reappears.
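The check suggested above can also be sketched with base R's Sys.which(), which looks an executable up on the PATH seen by the R process and returns an empty string when nothing is found. This is a minimal sketch, not part of lostruct; the variable names are made up for illustration.

```r
# Look up bcftools on the PATH that R (and the shells it spawns) will use.
bcftools_path <- Sys.which("bcftools")

if (nzchar(bcftools_path)) {
  message("R can see bcftools at: ", bcftools_path)
} else {
  # Not found: list the directories R is searching, one per line.
  message("bcftools is not on the PATH seen by R. Directories searched:")
  path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
  writeLines(path_dirs)
}
```

Comparing the printed directories with the output of echo $PATH in the interactive shell should show whether the directory holding bcftools is missing from R's environment.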

clairemerot commented 4 years ago

Yes, I don't know why bcftools is located in a miniconda folder! I'll check that with the person responsible for the installation, but it seems we are now finding the solution!! Thanks a lot for your time and advice.

Best,
Claire

alxcmrn commented 1 year ago

> Ah-ha: probably, bcftools is in your PATH on the command line (probably, using bash?), but not in the shell that R opens, which is usually sh. fread( ) just calls system( ), so you should be able to check for fixes just by doing system("which bcftools", intern=TRUE). If all is working, it should tell you something like "/usr/local/bin/bcftools", but if it can't find bcftools, it'll throw some errors.

I am running into this issue where R is launching a different shell and can't find my installation of bcftools. I can't figure out how to get R to look in the correct place for bcftools and any advice would be greatly appreciated.

I use bash and bcftools has been added to $PATH

Cheers, -Alex

petrelharp commented 1 year ago

Well, let's see: by doing Sys.getenv("PATH") you can see what $PATH is in the shell that R is using. More generally I suggest googling how PATH is set by R, e.g. https://stackoverflow.com/questions/43882307/where-should-i-set-the-variable-path-in-r -- good luck!
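As one possible way to act on that, the sketch below prepends a directory to PATH for the current R session using Sys.setenv(). The directory ~/miniconda3/bin is purely an assumption for illustration (an earlier comment mentions a miniconda folder, but no exact path is given in this thread); replace it with wherever bcftools is actually installed.

```r
# Hypothetical install location: replace with the directory that really holds bcftools.
bcftools_dir <- path.expand("~/miniconda3/bin")

# Split the PATH that R currently sees into its directories.
old_path <- Sys.getenv("PATH")
path_dirs <- strsplit(old_path, .Platform$path.sep, fixed = TRUE)[[1]]

# Prepend the directory only if it is not already present.
if (!(bcftools_dir %in% path_dirs)) {
  Sys.setenv(PATH = paste(bcftools_dir, old_path, sep = .Platform$path.sep))
}

# An empty string here means bcftools is still not being found.
Sys.which("bcftools")
```

This change lasts only for the current session. Setting PATH in a startup file such as ~/.Renviron is one commonly used way to make it persist, as discussed in the linked Stack Overflow question.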