stevemussmann / admixturePipeline

A pipeline that accepts a VCF file to run through Admixture
GNU General Public License v3.0

runEvalAdmix.py produces a warning on script end #18

Open giriarteS opened 5 months ago

giriarteS commented 5 months ago

When using runEvalAdmix.py, everything functions fine, except that at the end I see the following output on stderr:

R[write to console]: Warning messages:

R[write to console]: 1: 
R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE,  :
R[write to console]: 

R[write to console]:  libraries ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ contain no packages

R[write to console]: 2: 
R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE,  :
R[write to console]: 

R[write to console]:  libraries ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ contain no packages

R[write to console]: 3: 
R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE,  :
R[write to console]: 

R[write to console]:  libraries ‘/usr/local/lib/R/site-library’, ‘/usr/lib/R/site-library’ contain no packages

The container runs on Ubuntu 22.04.4 LTS. I get a final correlation graph for each K without the admixture plot. Also, I would like to rotate the labels. I attached one of the final graphs.


stevemussmann commented 5 months ago

Hello,

The warnings are normal and can be ignored. They come from the evalAdmix R functions.

The correlation graph being plotted without the admixture plot is also normal. If you want to combine them, that will need to be done in some kind of image manipulation software.

I don't currently have a way to rotate the labels. That code is buried somewhere in the evalAdmix R functions (i.e., code from the evalAdmix package that I did not write). I plan to eventually rewrite their plotting functions, but it will be some time before I am able to do so. Currently, I just relabel my plots manually in image manipulation software.

-Steve

giriarteS commented 5 months ago

Thanks. What about haploid organisms? Could the haploid flag be implemented, as in the following command?

admixture myInput.ped 3 --haploid="*"

Gloria

stevemussmann commented 5 months ago

Hi Gloria,

I think I have implemented the --haploid option correctly, but I have no haploid data files for testing.

There is now an -H / --haploid option implemented for admixturePipeline.py. It takes a string as input, so supply whatever you would normally pass to the --haploid= option in admixture. For example, to get the equivalent of the command you listed above (admixture myInput.ped 3 --haploid="*"), you would run the following command:

admixturePipeline.py -v example.vcf -m example_map.txt -k 2 -K 5 -R 8 -H "*"
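As a rough illustration of the pass-through behaviour described above, the Python sketch below assembles an admixture argument list and appends --haploid=<value> only when a value is given. It is hypothetical, not admixturePipeline.py's actual code: the function name build_admixture_cmd and its parameters are assumptions, and the argument order simply mirrors the admixture command lines that appear later in this thread. The comments also note why "*" is quoted on the shell command line: unquoted, the shell could glob-expand it, whereas an argument list passed to subprocess.run() never goes through a shell.

```python
import shlex
from typing import List, Optional


def build_admixture_cmd(ped: str, k: int, threads: int, seed: int, cv: int,
                        haploid: Optional[str] = None) -> List[str]:
    """Hypothetical sketch: assemble an admixture argument list.

    The -H/--haploid string is forwarded verbatim as --haploid=<value>.
    """
    cmd = ["admixture", f"-j{threads}", "-s", str(seed), f"--cv={cv}", ped, str(k)]
    if haploid is not None:
        # When this list is passed to subprocess.run() without shell=True,
        # no shell is involved, so "*" reaches admixture unexpanded.
        cmd.append(f"--haploid={haploid}")
    return cmd


if __name__ == "__main__":
    cmd = build_admixture_cmd("/app/data/fg.ped", 2, threads=16, seed=707946,
                              cv=10, haploid="*")
    # shlex.join() only formats the list for display as a shell command;
    # it single-quotes the haploid argument so a shell would not glob "*".
    print(shlex.join(cmd))
```

Keeping the option as an opaque string means any value admixture accepts for --haploid= can be forwarded without the wrapper needing to understand it.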

I am pushing the new container to Docker Hub while typing this message. It is about 50% done, so it should be ready to go soon. Once the container with the 3.2 tag here (https://hub.docker.com/r/mussmann/admixpipe/tags) shows as updated today (June 2, 2024), pull the updated container to your computer.

If the --haploid option doesn't function as intended, then I might have to ask you for an example file or two for testing.

-Steve

giriarteS commented 5 months ago

Hi Steve, I am running the new container, and it is working!!

admixture -j16 -s 707946 --cv=10 /app/data/fg.ped 2 --haploid="*"

admixture -j16 -s 917224 --cv=10 /app/data/fg.ped 2 --haploid="*"

Thank you so much!!!

Gloria

stevemussmann commented 5 months ago

I'm glad it's working.

-Steve

stevemussmann commented 5 months ago

I'm going to close this issue for now, but feel free to reopen this issue or open a new one if you run into problems.

giriarteS commented 5 months ago

Hi Steve, the haploid test run just finished, but when I try to get the best K, I get the following error:

Number of runs is not consistent between K's at /etc/perl/BestKByEvannoAccessor.pm line 304.

These were the commands used:

admixturePipeline.py -m popmap_pop.txt -v fg..vcf -n 16 -k 2 -K 20 -c 20 -R 20 -a 0.005 -t 2000 -H "*"
submitClumpak.py -p fg -M
distructRerun.py -a ./ -d clumpakOutput/ -k 2 -K 20
cvSum.py
submitClumpak.py -b

Gloria

stevemussmann commented 5 months ago

I would suggest checking your admixture output directory to see if you have different numbers of outputs per K. This could happen if you ran the pipeline multiple times in the same directory using different settings. Judging from your commands, your admixture outputs should follow the format fg.k_R, where k = the K value and R = the replicate number. Replicate numbering starts at 0, so the first replicate for k=2 should be something like fg.2_0 and the 20th should be fg.2_19.
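One quick way to do this check, and to see which K values would trigger the "Number of runs is not consistent between K's" error, is to count the .Q files per K. The Python sketch below assumes the .Q files are named exactly <prefix>.<K>_<R>.Q (for example fg.2_0.Q), based on the naming pattern described above; adjust the regular expression if your files differ. The function names and the k_min/k_max/replicates parameters are hypothetical and should be set to match your -k, -K, and -R options.

```python
import os
import re
from collections import Counter
from typing import Dict, Iterable


def count_runs_per_k(filenames: Iterable[str], prefix: str) -> Dict[int, int]:
    """Count .Q files per K among names matching <prefix>.<K>_<R>.Q (e.g. fg.2_0.Q)."""
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d+)_(\d+)\.Q$")
    counts = Counter()
    for name in filenames:
        match = pattern.match(name)
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


def inconsistent_ks(counts: Dict[int, int], k_min: int, k_max: int,
                    replicates: int) -> Dict[int, int]:
    """Return {K: observed runs} for every K whose run count is not `replicates`,
    including K values outside k_min..k_max and expected K values with no files."""
    problems = {k: n for k, n in counts.items() if not k_min <= k <= k_max}
    for k in range(k_min, k_max + 1):
        observed = counts.get(k, 0)
        if observed != replicates:
            problems[k] = observed
    return dict(sorted(problems.items()))


if __name__ == "__main__":
    # Settings from the run above: -k 2 -K 20 -R 20, output prefix "fg".
    counts = count_runs_per_k(os.listdir("."), "fg")
    for k, n in inconsistent_ks(counts, k_min=2, k_max=20, replicates=20).items():
        print(f"K={k}: {n} runs (expected 20 for each K in 2..20)")
```

An empty report means every K in the requested range has the same number of runs and no stray K values are present.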

The pipeline is going to prepare the results.zip file from all files in your directory that follow the fg.k_R naming pattern, so if you conducted earlier runs in the same directory that included K=1, or runs with K>20, then these will also be included in the zip file.

If you find files with k<2, k>20, or R>19 in their names, try deleting them and re-zipping the .Q files into a new results.zip archive with the command zip -r results.zip fg.*.Q. Then rerun the submitClumpak.py -b command. I think that will fix the problem. If it doesn't, my next suggestion (which I realize is suboptimal) would be to conduct a fresh run of the pipeline in a new directory on your computer.
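If you would rather not delete anything, a slightly different approach is to leave the out-of-range files out of the archive instead. The Python sketch below (hypothetical helper names, same assumed <prefix>.<K>_<R>.Q naming) keeps only files whose K lies inside the requested range and whose replicate number is below the -R value, then writes them to the top level of results.zip with the standard zipfile module.

```python
import os
import re
import zipfile
from typing import Iterable, List


def select_q_files(filenames: Iterable[str], prefix: str, k_min: int, k_max: int,
                   replicates: int) -> List[str]:
    """Keep only <prefix>.<K>_<R>.Q names with k_min <= K <= k_max and R < replicates."""
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d+)_(\d+)\.Q$")
    keep = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            k, rep = int(match.group(1)), int(match.group(2))
            if k_min <= k <= k_max and rep < replicates:
                keep.append(name)
    return sorted(keep)


def write_results_zip(paths: List[str], archive: str = "results.zip") -> None:
    """Write the files to a fresh archive, storing each under its base name."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))


if __name__ == "__main__":
    selected = select_q_files(os.listdir("."), "fg", k_min=2, k_max=20, replicates=20)
    if selected:
        write_results_zip(selected)
        print(f"Wrote {len(selected)} .Q files to results.zip")
    else:
        print("No matching .Q files found in the current directory")
```

Opening the archive in mode "w" replaces any existing results.zip. The zip command-line tool, by contrast, adds files to an existing archive, so if you use zip -r results.zip fg.*.Q, delete the old results.zip first; otherwise its stale entries will remain.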

giriarteS commented 4 months ago

Steve, all the files look OK: there are no extra files, but some are missing in a few replicates. I decided to run it again using all the cores, which took only a couple of hours, and I changed -c and -R to 10. Whenever I increase those values, I get the same error. Thank you!

stevemussmann commented 4 months ago

If you are able to send me the results.zip file that admixpipe outputs, that could help me narrow down whether the error is coming from my code or from CLUMPAK. You could post it here or send it to me at my Gmail address (user name = smussmann).

If that doesn't help me narrow things down, then I would likely need a copy of your input files to try to replicate the issue, if you would be willing to share them.

stevemussmann commented 4 months ago

I ran commands on the example files similar to those you reported above for your run, and was unable to replicate the problem. I can't test the -H option because I have no haploid dataset. The only other thing I changed was the MAF filter (-a), which I increased to speed things up:

admixturePipeline.py -m example_map.txt -v example.vcf -n 8 -k 2 -K 20 -c 20 -R 20 -a 0.2 -t 2000
submitClumpak.py -p example -M
distructRerun.py -a ./ -d clumpakOutput/ -k 2 -K 20
cvSum.py
submitClumpak.py -b

Looking at your command again, I noticed your file might be named fg..vcf. This is a bit of a long shot, but perhaps the double .. is causing trouble when CLUMPAK parses file names?
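To illustrate why a doubled dot could matter, the Python sketch below derives a prefix from a VCF name in two common ways and builds a replicate file name from each. It does not reproduce how admixturePipeline.py or CLUMPAK actually parse file names; derive_prefixes and q_name are hypothetical helpers that only show the result can depend on the parsing method.

```python
import os.path
from typing import Tuple


def derive_prefixes(vcf_name: str) -> Tuple[str, str]:
    """Return (prefix via os.path.splitext, prefix via text before the first dot)."""
    by_splitext = os.path.splitext(vcf_name)[0]  # strips only the final ".vcf"
    by_first_dot = vcf_name.split(".")[0]        # keeps everything before the first "."
    return by_splitext, by_first_dot


def q_name(prefix: str, k: int, rep: int) -> str:
    """Hypothetical replicate file name following the <prefix>.<K>_<R>.Q pattern."""
    return f"{prefix}.{k}_{rep}.Q"


if __name__ == "__main__":
    for vcf in ("fg.vcf", "fg..vcf"):
        splitext_prefix, first_dot_prefix = derive_prefixes(vcf)
        print(vcf, q_name(splitext_prefix, 2, 0), q_name(first_dot_prefix, 2, 0))
```

For fg.vcf both approaches give fg.2_0.Q, but for fg..vcf the splitext route gives fg..2_0.Q, which a pattern expecting fg.2_0.Q would not match.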

Or is something being caused by the -H flag?

Lastly, something in your input files could be inducing the issue. Aside from those possibilities, I'm struggling to think of what could be causing it.