healthyPlant / PhytoPipe

creation of the databases #3

Open Viola-TA opened 1 year ago

Viola-TA commented 1 year ago

Hello all, I would like to use PhytoPipe. I have already downloaded and installed the code (Ubuntu system). However, there is now a bit of a sticky problem with the creation of the databases.

I can understand /path/to/PhytoPipe/scripts/updateDatabase.sh and /path/to/software/PhytoPipe. But what about /path/to/database v25.0? Do I have to download the current database first? If so, which of the links on the RVDB database, protein version page? There are four links for each version. Or do I have to save the corresponding database for each individual tool (Diamond, NCBI, Kaiju, ...)?

Thanks in advance for feedback :)

Viola

xhu556 commented 12 months ago

Thank you for asking, Viola!

The script updateDatabase.sh takes three arguments: /path/to/phytopipe, /path/my/database, and rvdb_version (e.g., v25.0). Note that v25.0 is the RVDB version, not a PhytoPipe database version. The current RVDB version is v26.0, so you can run the script with v26.0 instead of v25.0; please check https://rvdb-prot.pasteur.fr/ for the latest release. The file that actually gets downloaded is https://rvdb-prot.pasteur.fr/files/U-RVDBv26.0-prot.fasta.xz, which is listed as "Proteic sequences". You don't need to pre-download any files; the script takes care of them. When it is done, all database files should be under the folder /path/my/database. The script simply gathers all the commands from the "Databases" section of the PhytoPipe Wiki so that all databases can be installed easily.
The one known issue is building the Kraken2 database. If it gets stuck, please follow the solution in the Wiki to build it manually. Building the Kraken2 database usually takes 2-3 days, depending on your computer's capability.
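
As a rough sketch of the invocation described above: the two helper functions below are hypothetical (they are not part of PhytoPipe) and only print what would be used. The URL pattern is inferred from the v26.0 link and is assumed to hold for other RVDB versions. Running the script through bash from /path/to/PhytoPipe/scripts/ is also an assumption based on the path in the question.

```shell
# Hypothetical helper: build the RVDB protein download URL for a version.
# Assumption: other versions follow the same naming pattern as v26.0.
rvdb_url() {
    printf 'https://rvdb-prot.pasteur.fr/files/U-RVDB%s-prot.fasta.xz\n' "$1"
}

# Hypothetical helper: print (not run) the updateDatabase.sh command line.
# $1 = PhytoPipe install folder, $2 = database folder, $3 = RVDB version
build_db_command() {
    printf 'nohup bash %s/scripts/updateDatabase.sh %s %s %s &\n' "$1" "$1" "$2" "$3"
}

rvdb_url v26.0
build_db_command /path/to/PhytoPipe /path/my/database v26.0
```

Printing the command first makes it easy to confirm the three arguments are in the right order before starting a build that may run for days.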

To check for progress or errors, view the nohup.out file in the folder where you ran the command "nohup updateDatabase.sh", using the more, less, head, or tail commands. For example:

tail -100 /path/to/nohup.out

Or you can use the grep command to check each step (from #1. to #11.). For example:

grep -A10 "#1." /path/to/nohup.out
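
One caveat with the pattern above: in a grep regular expression the dot matches any character, so "#1." can also match "#10" or "#11". A minimal sketch of a stricter lookup follows; show_step is a hypothetical helper name, and it assumes the step headers in nohup.out contain the literal text "#1.", "#2.", and so on.

```shell
# Hypothetical helper: print a step header and the lines that follow it.
# -F makes grep treat the pattern as a literal string, so "#1." does not
# also match "#10" or "#11".
# $1 = step number, $2 = log file, $3 = lines of context (default 10)
show_step() {
    grep -F -A "${3:-10}" "#$1." "$2"
}

# Usage (path is a placeholder):
#   show_step 1 /path/to/nohup.out
```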

If you see the following lines at the end of nohup.out, your databases are ready: "Database building has finished." "#*****" "Please update database paths in the config.yaml" ....

You can then add your database paths/files to config.yaml.
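
The completion check can also be scripted. This is only a sketch: db_build_finished is a hypothetical helper, and looking at the last 20 lines is an arbitrary assumption about how close to the end of nohup.out the message appears.

```shell
# Hypothetical helper: succeed if the completion message appears near the
# end of the log. The message text is the one quoted above.
# $1 = path to nohup.out
db_build_finished() {
    tail -n 20 "$1" | grep -qF 'Database building has finished.'
}

# Usage (path is a placeholder):
#   if db_build_finished /path/to/nohup.out; then echo "done"; else echo "still running or failed"; fi
```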

If you have any questions, please feel free to send me an email at xiaojun.hu@usda.gov.

Best,

Alex