tamsen / Pisces

Somatic and germline variant caller for amplicon data. Recommended caller for tumor-only workflows.
GNU General Public License v3.0

Creating a GenomeSize.xml for a soybean genome #14

Closed: brunabavelino closed this issue 9 months ago

brunabavelino commented 2 years ago

Dear @tamsen, we are having a hard time figuring out how to make a GenomeSize.xml for soybean genome assembly 2 (https://data.jgi.doe.gov/refine-download/phytozome?organism=Gmax&expanded=275). We intend to use it to run the DNA Amplicon module of Local Run Manager v3 for SNP calling. Is there a simple way to make this file? ;D Bruna

tamsen commented 2 years ago

Hi! Thanks for your interest in Pisces. It's so exciting to hear it's involved in the soybean work :)

1) Yes, you can use the CreateGenomeSizeFile executable to generate that file. See https://github.com/tamsen/Pisces/wiki/Pisces-Tools-5.3.0

2) You can get the latest binaries here: https://github.com/tamsen/Pisces/releases/tag/v5.3.0.0

3) Here's an example command for a plant genome I was recently working with:

old dotnet version:

dotnet ~/PiscesBinaries/CreateGenomeSizeFile_5.2.10.49/CreateGenomeSizeFile.dll -g /home/tamsen/Genomes/Solanum_chau -s "Solanum Chaucha (NCBI build_v1)"

New Linux-friendly version would look like:

./Pisces_5.3.0.0/CreateGenomeSizeFile -g /home/tamsen/Genomes/Solanum_chau -s "Solanum Chaucha (NCBI build_v1)"

best, Tamsen

brunabavelino commented 2 years ago

Hi, I'm glad you replied. In my case, I'm just using the Williams 82 reference genome for the analysis. Could you help me use Pisces? As I understand it, the steps would be:

1) Download the file "pisces_all_5.3.0.0.tar.gz" from "https://github.com/tamsen/Pisces/releases/tag/v5.3.0.0" and extract it (in my case, it is in the Downloads folder).

2) Open the terminal and run the following command (the new Linux-friendly version): ./Pisces_5.3.0.0/CreateGenomeSizeFile -g /home/tamsen/Genomes/Solanum_chau -s "Solanum Chaucha (NCBI build_v1)"
Here, "/home/tamsen/Genomes" would be the location of the file I want to create the GenomeSize.xml for, and "Solanum_chau" is the file name, written without its extension (in my case the extension would be ".fa"). What is "Solanum Chaucha (NCBI build_v1)"? Is it the file description?

3) Adjusting the command for my file, which is in my Documents folder ("Documentos"), it would look like this: ./Pisces_5.3.0.0/CreateGenomeSizeFile -g /home/bioinfo/Documentos/Gmax_275_v2.0 -s "Williams 82 (Phytozome.v13 Wm82.a2.v1)"

However, when I run it, I get "No such file or directory". Sorry for my lack of computing experience. Do I need to install Pisces_5.3.0.0 first? I must be doing something wrong, but I don't know what. Could you help me?

Thanks! Bruna

tamsen commented 2 years ago

Hi there,

I think you are close. First, are you on a Unix-based system? I would recommend not leaving the binaries in your Downloads folder. Suppose you move the download to "/Shared/Pisces" and extract it there. Then you would have something like "/Shared/Pisces/Pisces_5.3.0.0", full of the extracted files, many ending in ".dll". In that folder there will be a file called "CreateGenomeSizeFile".

So, go to the folder CreateGenomeSizeFile is in and type "./CreateGenomeSizeFile" at the command prompt. That tests whether you can run the executable; if it works, it should print the help, listing all the available input options.

Next, you need to try it with the full command: ./CreateGenomeSizeFile -g [full path to folder with your .fa file] -s "Williams 82 (Phytozome.v13 Wm82.a2.v1)"
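The mix-up earlier in this thread was passing the .fa file itself to -g, when -g expects the full path to the folder that contains the .fa file. The minimal Python sketch below checks that before building the command line from the two flags shown above. The helper name "build_genome_size_command" and the list of FASTA extensions are hypothetical assumptions, not part of Pisces. It only validates the path and assembles the arguments; it does not know anything else about CreateGenomeSizeFile.

```python
from pathlib import Path

# Assumed FASTA extensions (hypothetical); adjust to match your files.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def build_genome_size_command(exe, genome_dir, species):
    """Assemble a CreateGenomeSizeFile argument list (hypothetical helper).

    genome_dir must be the FOLDER holding the FASTA, not the FASTA file,
    because -g takes a folder.
    """
    folder = Path(genome_dir)
    if folder.is_file():
        # The exact mistake from this thread: pointing -g at the file.
        raise ValueError(f"{folder} is a file; pass its folder {folder.parent} to -g")
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} does not exist")
    fastas = [p for p in folder.iterdir() if p.suffix.lower() in FASTA_SUFFIXES]
    if not fastas:
        raise FileNotFoundError(f"no FASTA file found in {folder}")
    # Use an absolute path, since -g should be the full path to the folder.
    return [str(exe), "-g", str(folder.resolve()), "-s", species]
```

Usage would be something like subprocess.run(build_genome_size_command("./CreateGenomeSizeFile", "/home/bioinfo/Documents", "Williams 82 (Phytozome.v13 Wm82.a2.v1)"), check=True), run from inside the extracted Pisces folder.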

If you still get an error, you might need to make a .fai file with samtools first.
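For reference, the standard way to build that index is "samtools faidx Gmax_275_v2.0.fa". The resulting .fai file is a plain tab-separated table with one row per sequence, holding five columns:

- the sequence name
- its length
- the byte offset of its first base
- the number of bases per line
- the number of bytes per line, including the newline

The Python sketch below computes those same five columns, which shows what the index contains and lets you sum contig lengths as a rough size check. It is illustrative only. The function names are hypothetical, and it assumes a well-formed FASTA where each record uses a fixed line width (only the last line may be shorter). Unlike samtools, it does not check that assumption.

```python
def fai_rows(fasta_path):
    """Compute samtools-faidx-style rows: (name, length, offset, linebases, linewidth).

    Sketch only: assumes fixed-width sequence lines within each record.
    """
    rows = []
    record = None  # [name, length, offset, linebases, linewidth]
    offset = 0  # running byte offset into the file
    with open(fasta_path, "rb") as fh:
        for raw in fh:
            if raw.startswith(b">"):
                if record is not None:
                    rows.append(tuple(record))
                name = raw[1:].split()[0].decode()  # name = first word after '>'
                record = [name, 0, offset + len(raw), 0, 0]
            else:
                if record is None:
                    raise ValueError("sequence data before the first '>' header")
                bases = raw.rstrip(b"\r\n")
                if bases:
                    if record[3] == 0:  # first sequence line sets the line geometry
                        record[3] = len(bases)
                        record[4] = len(raw)  # includes the line terminator
                    record[1] += len(bases)
            offset += len(raw)
    if record is not None:
        rows.append(tuple(record))
    return rows


def write_fai(fasta_path, fai_path):
    """Write the rows as a tab-separated .fai-style file."""
    with open(fai_path, "w") as out:
        for row in fai_rows(fasta_path):
            out.write("\t".join(str(v) for v in row) + "\n")
```

Summing the second column, sum(row[1] for row in fai_rows(path)), gives the total number of bases across all sequences.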

best Tamsen

brunabavelino commented 2 years ago

That's cool, I think it worked this time. I'm using a Linux distribution (Fedora release 35). I moved the folder "Pisces_5.3.0.0" (full of the extracted files, many ending in ".dll") into "/home/bioinfo". Then I entered the "Pisces_5.3.0.0" folder, ran "./CreateGenomeSizeFile" in the terminal, and the "Pisces Software" information appeared. So I ran the full command, which in my case looked like this: ./CreateGenomeSizeFile -g /home/bioinfo/Documents -s "Williams 82 (Phytozome.v13 Wm82.a2.v1)", where [full path to folder with your .fa file] is "/home/bioinfo/Documents". One of the things I was doing wrong was including the file name in the path, which isn't correct.

In the end, it generated three files in "/home/bioinfo", which is where the unzipped folder "Pisces_5.3.0.0" is: "GenomeSize.xml", "Gmax_275_v2.0.dict" and "Gmax_275_v2.0.fa.fai". I had already looked up how to generate the ".fai" file with samtools, but it turned out not to be necessary: I ran the command without a ".fai" file and it still generated the GenomeSize.xml. Are my results and the generated files correct?
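One way to sanity-check the outputs is to compare the sequence lengths recorded in the .dict file with those in the .fai file. This assumes the .dict follows the usual SAM-header sequence dictionary layout, as written by tools such as Picard CreateSequenceDictionary. In that layout, each sequence has one "@SQ" line with tab-separated "SN:" (name) and "LN:" (length) tags. That format is an assumption here, not something confirmed in this thread, so open the file first to check.

The Python sketch below (function name hypothetical) reads those lengths into a dictionary.

```python
def sequence_lengths_from_dict(dict_path):
    """Map sequence name -> length from the @SQ lines of a .dict file.

    Assumes the SAM-header layout, e.g. "@SQ<TAB>SN:<name><TAB>LN:<length>...".
    """
    lengths = {}
    with open(dict_path) as fh:
        for line in fh:
            if not line.startswith("@SQ"):
                continue  # skip @HD and any other header lines
            fields = line.rstrip("\n").split("\t")[1:]
            # Split each TAG:value on the first colon only (values like UR:file:/... contain colons).
            tags = dict(field.split(":", 1) for field in fields if ":" in field)
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths
```

If both files describe the same FASTA, the lengths here should match the second column of the .fai, and sum(lengths.values()) gives the total genome length.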

Thanks! Bruna

tamsen commented 2 years ago

Yes! It sounds like it worked! You can always open the GenomeSize.xml in any text editor and check it visually.
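Alongside checking it by eye in a text editor, a quick programmatic check is to confirm that GenomeSize.xml parses as well-formed XML. You can then list each top-level element with its attributes.

The Python sketch below uses only the standard library's xml.etree.ElementTree. The function name is hypothetical, and it deliberately assumes no particular GenomeSize.xml element or attribute names, so it simply reports whatever is there.

```python
import xml.etree.ElementTree as ET


def summarize_xml(path):
    """Return (root tag, [(child tag, attributes), ...]) for an XML file.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed.
    """
    root = ET.parse(path).getroot()
    return root.tag, [(child.tag, dict(child.attrib)) for child in root]


# Usage (path is wherever CreateGenomeSizeFile wrote its output):
# tag, children = summarize_xml("/home/bioinfo/GenomeSize.xml")
# print(tag, len(children))
# for child_tag, attrs in children:
#     print(child_tag, attrs)
```

A parse error would point to a truncated or corrupted file. A readable list of one entry per sequence is a good sign, though how entries are named depends on the actual file.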

There should also have been a "CreateGenomeSizeFileLogs" folder created. If you are in any doubt, you can read through those log files and see what the program did.

brunabavelino commented 2 years ago

Hello dear Tamsen, I just checked the files, and it does create the "CreateGenomeSizeFileLogs" folder, which contains two files: "CreateGenomeSizeFileLog.txt" and "CreateGenomeSizeFileOptions.used.json". It is very gratifying to know that it worked. The only thing I changed in the command was the -s description, so that it names where I took the genome from and follows the "Genus Species (Source Build)" pattern. It now looks like this: ./CreateGenomeSizeFile -g /home/bioinfo/Documents -s "Glycine max (JGI Wm82.a2.v1)". Thank you so much for your help and patience. Hugs, Bruna