jungleblack007 opened this issue 1 year ago
I have another question: how can I calculate GSIs (gene support indices) and label them on the branches?
To use the agaricales_odb10 database, you will have to download it and process its ODB profiles into a form that the UFCG pipeline can accept.
To do this, please run the following commands on your system (this may take a while):
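```bash
# Download and unzip the agaricales_odb10 database
wget -q "https://busco-data.ezlab.org/v4/data/lineages/agaricales_odb10.2020-08-05.tar.gz"
tar xzf agaricales_odb10.2020-08-05.tar.gz
gzip -d agaricales_odb10/refseq_db.faa.gz

# Prepare model and sequence databases for the UFCG pipeline
cd agaricales_odb10/
# gene_list: one gene ID per line; gene_set: the same IDs on one comma-separated line
ls prfl/ | cut -d. -f1 > gene_list
sed -z 's/\n/,/g;s/,$/\n/' gene_list > gene_set
mkdir -p model/pro/ seq/pro/
# For each gene, copy its profile into model/pro/ and extract its reference sequences
# into seq/pro/ (each matching header plus the one line after it, so one-line sequences are assumed)
while read -r I; do
  cp "prfl/$I.prfl" "model/pro/$I.hmm"
  grep -PA1 --no-group-separator "^>$I" refseq_db.faa > "seq/pro/$I.fa"
done < gene_list
```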
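Before running the pipeline, it may be worth confirming that the loop above produced a complete database. The sketch below assumes the directory layout created by those commands (gene_list, model/pro/, seq/pro/); the function name check_ufcg_db is hypothetical and not part of UFCG. It reports every gene whose model file or sequence file is missing or empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of UFCG): for every gene ID in gene_list, check that
# model/pro/<ID>.hmm and seq/pro/<ID>.fa exist and are non-empty.
# Prints one line per problem and returns non-zero if any gene is incomplete.
check_ufcg_db() {
  local db_dir=$1 missing=0 id
  while read -r id; do
    [ -z "$id" ] && continue   # skip blank lines
    if [ ! -s "$db_dir/model/pro/$id.hmm" ]; then
      echo "missing model: $id"
      missing=1
    fi
    if [ ! -s "$db_dir/seq/pro/$id.fa" ]; then
      echo "missing or empty sequences: $id"
      missing=1
    fi
  done < "$db_dir/gene_list"
  return "$missing"
}
```

For example, from the directory that contains agaricales_odb10/: `check_ufcg_db agaricales_odb10 && echo "database looks complete"`. An empty sequence file usually means the grep pattern did not match any header for that gene ID.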
After running the commands above, the following command will extract the agaricales_odb10 gene set from your sequence(s):
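```bash
# Run from inside agaricales_odb10/, since model/, seq/ and gene_set are relative paths
ufcg profile --modelpath model/ --seqpath seq/ -s "$(cat gene_set)" -i /path/to/input -o /path/to/output <options>
```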
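Because model/, seq/ and gene_set are relative paths, the profile command only works when launched from the database directory. One way to avoid that pitfall is a small wrapper like the sketch below. The name run_agaricales_profile is hypothetical, and it only uses the flags already shown in the command above; any extra arguments are passed through unchanged as <options>. Relative input and output paths are converted to absolute ones so they still resolve after the cd.

```bash
#!/usr/bin/env bash
# Make a path absolute relative to the caller's current directory (no symlink resolution).
abs_path() {
  case $1 in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s\n' "$PWD/$1" ;;
  esac
}

# Hypothetical wrapper (not part of UFCG): run "ufcg profile" from inside the database
# directory so that model/, seq/ and gene_set resolve correctly.
# Usage: run_agaricales_profile <db_dir> <input> <output> [options...]
run_agaricales_profile() {
  local db_dir=$1 input output
  input=$(abs_path "$2")
  output=$(abs_path "$3")
  shift 3
  # The subshell keeps the cd from changing the caller's working directory.
  (
    cd "$db_dir" || exit 1
    ufcg profile --modelpath model/ --seqpath seq/ -s "$(cat gene_set)" \
      -i "$input" -o "$output" "$@"
  )
}
```

For example: `run_agaricales_profile agaricales_odb10 /path/to/input /path/to/output <options>`. Using a subshell rather than cd/cd - means a failed cd or a failing ufcg run cannot leave your shell in the wrong directory.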
For the second question: the output of the ufcg tree module includes a Newick file named concatenated_gsi_[N].nwk, which is exactly the GSI-labeled tree you are looking for. [N] is the total number of genes that were considered when calculating the indices.
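To inspect that file from the command line, something like the sketch below could be used. It assumes (not confirmed above) that the GSIs are stored as internal-node labels, i.e. the text directly after a closing parenthesis, as ordinary Newick support values are. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the labeled tree written by the ufcg tree module.

# Print [N], the number of genes encoded in a name like concatenated_gsi_[N].nwk
gsi_gene_count() {
  local name=${1##*/}              # drop any leading directories
  name=${name#concatenated_gsi_}   # drop the fixed prefix
  printf '%s\n' "${name%.nwk}"     # drop the extension
}

# Print one internal-node label per line, in file order.
# ASSUMPTION: labels follow ')' directly; branch lengths after ':' are not included.
gsi_labels() {
  grep -oE '\)[^:,;()]+' "$1" | cut -c2-
}
```

For a file concatenated_gsi_92.nwk containing `((A:0.1,B:0.2)95:0.3,C:0.4)88;`, gsi_gene_count prints 92 and gsi_labels prints 95 and 88 on separate lines. If the labels in your file are formatted differently, the grep pattern would need adjusting.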
Wow, thank you for your detailed answer, it's great! I am trying it now.
For example, I want to use agaricales_odb10 as the reference database to pick single-copy orthologs. How can I switch from fungi_odb10 to agaricales_odb10?