Open MattHuff opened 5 years ago
I'll do files 681-1360
I'll do 1361-2040 (1361-1700 & 1701-2040)
I've got 2041-2720 for cds Swissprot
#PBS -N casey_swissprot_1
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 2041-2720
#PBS -l nodes=1:ppn=2
#PBS -l walltime=08:00:00
cd $PBS_O_WORKDIR
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/splits_mRNA/mRNA.fasta.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/swissprot/mRNA_sprot.$PBS_ARRAYID.xml \
-evalue 1e-5 \
-outfmt 5
TrEMBL for 2041-2420
#PBS -N casey_trembl_1
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 2041-2420
#PBS -l nodes=1:ppn=2
#PBS -l walltime=15:00:00
cd $PBS_O_WORKDIR
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/splits_mRNA/mRNA.fasta.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_trembl_plants_July_2018.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/trembl/mRNA_sprot.$PBS_ARRAYID.xml \
-evalue 1e-5 \
-outfmt 5
TrEMBL for 2421-2720
#PBS -N casey_trembl_2
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 2421-2720
#PBS -l nodes=1:ppn=2
#PBS -l walltime=15:00:00
cd $PBS_O_WORKDIR
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/splits_mRNA/mRNA.fasta.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_trembl_plants_July_2018.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/trembl/mRNA_sprot.$PBS_ARRAYID.xml \
-evalue 1e-5 \
-outfmt 5
For IPS, I am doing files 1-200. The code I used is as follows:
#PBS -N autoanno_matt_ips
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 1-200
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=3:30:00
cd $PBS_O_WORKDIR
/lustre/haven/gamma/staton/software/interproscan-5.28-67.0/interproscan.sh \
-i /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/splits_polypeptide/polypeptide.fasta.$PBS_ARRAYID \
-f XML \
-d /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/IPS/xmls \
--disable-precalc \
--iprlookup \
--goterms \
--pathways \
--tempdir /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/IPS/TMP \
> /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/IPS/TMP/$PBS_ARRAYID.out
I will take IPS 201-400
#PBS -N casey_ips
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 201-400
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=3:30:00
cd $PBS_O_WORKDIR
/lustre/haven/gamma/staton/software/interproscan-5.28-67.0/interproscan.sh \
-i /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/splits_polypeptide/polypeptide.fasta.$PBS_ARRAYID \
-f XML \
-d /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/xmls \
--disable-precalc \
--iprlookup \
--goterms \
--pathways \
--tempdir /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/tmp \
> /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/tmp/$PBS_ARRAYID.out
I'll do 401-600
I'll do IPS 601-800. Actually, I'll go ahead and finish up 801-999 also.
#PBS -N casey_ips
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 601-800
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=3:30:00
cd $PBS_O_WORKDIR
/lustre/haven/gamma/staton/software/interproscan-5.28-67.0/interproscan.sh \
-i /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/splits_polypeptide/polypeptide.fasta.$PBS_ARRAYID \
-f XML \
-d /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/xmls \
--disable-precalc \
--iprlookup \
--goterms \
--pathways \
--tempdir /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/tmp \
> /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/tmp/$PBS_ARRAYID.out
#PBS -N casey_ips
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 801-999
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=3:30:00
cd $PBS_O_WORKDIR
/lustre/haven/gamma/staton/software/interproscan-5.28-67.0/interproscan.sh \
-i /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/Matt_fasta_012219/splits_polypeptide/polypeptide.fasta.$PBS_ARRAYID \
-f XML \
-d /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/xmls \
--disable-precalc \
--iprlookup \
--goterms \
--pathways \
--tempdir /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/tmp \
> /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/cricha59/blast/ips/tmp/$PBS_ARRAYID.out
IPS is done; what is the status on sprot and trembl? I want to have this done by the end of the week. Has anyone run BLAST on the last 680 files?
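One rough way to check status is to list the array IDs whose output XML is missing or empty. This is only a sketch: the function name `missing_outputs` is hypothetical, and it assumes outputs follow the `PREFIX.ID.xml` naming used in the scripts above (for example `mRNA_sprot.2041.xml`).

```shell
# Hypothetical helper: print every array ID in [START, END] whose output
# XML is missing or empty. Assumes outputs are named PREFIX.ID.xml.
# Usage: missing_outputs DIR PREFIX START END
missing_outputs() {
    dir=$1
    prefix=$2
    start=$3
    end=$4
    i=$start
    while [ "$i" -le "$end" ]; do
        # -s is true only when the file exists and is non-empty
        [ -s "$dir/$prefix.$i.xml" ] || echo "$i"
        i=$((i + 1))
    done
}

# Example call (the directory is a placeholder for your own output folder):
# missing_outputs /path/to/blast/swissprot mRNA_sprot 2041 2720
```

Note that an XML file that is non-empty but truncated (for example, from a job that hit its walltime) would still pass this check, so it is a first pass only.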
I'll do the last 680 (2721-3400)
I believe all the files are done. A few files (3383-3400) have no data associated with them for both trembl and swissprot. This may be due to the issue related to Juglans cathayensis.
It looks like we split the files too many times, so the last 17 files are empty regardless of any errors.
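To confirm which split files are empty, something like the following could be run against the splits directory. It is a minimal sketch; the function name `list_empty_splits` is made up for illustration, and it assumes the split files are named `PREFIX.ID` as in the `-query` paths above.

```shell
# Hypothetical helper: list zero-byte split files in a directory.
# Usage: list_empty_splits DIR PREFIX
list_empty_splits() {
    dir=$1
    prefix=$2
    # -size 0 matches only files that contain no data
    find "$dir" -maxdepth 1 -type f -name "${prefix}.*" -size 0 | sort
}

# Example call (the directory is a placeholder for the splits folder):
# list_empty_splits /path/to/splits_mRNA mRNA.fasta
```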
Any progress on this?
Currently copying all remaining files to the dev server. It's taking a while because there are so many files, and the TrEMBL XML outputs in particular take forever to finish copying. Do you think memory will be an issue for continuing this? I know one of our servers recently hit its memory limit, and I wasn't able to continue loading the XMLs until it was resolved.
I'll update this post once all files are finished loading.
When the XMLs are ready, we can upgrade our storage to handle them. However, for the time being, let's keep everything on staton servers if possible.
I've copied the mRNA and polypeptide files generated by the current Automated Annotation protocol to the acf, and they can be found in /lustre/haven/gamma/staton/projects/undergrads/automated_annotation/fasta_012219. I have split the mRNA CDS file into 3400 separate files, and the protein file into 1000 files.
Going forward, I believe the best option is for each of us to run BLAST on 680 of the 3400 mRNA files. I chose to split this job into two commands, running the first 340 as one job and the second half as its own job. For IPS, use a similar strategy.
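For reference, the `-t` ranges for this split can be generated rather than worked out by hand. This is just a sketch; `array_ranges` is a hypothetical helper, not part of the protocol.

```shell
# Hypothetical helper: print "START-END" array ranges that cover
# 1..TOTAL in consecutive chunks of SIZE (the last chunk may be shorter).
# Usage: array_ranges TOTAL SIZE
array_ranges() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start-$end"
        start=$((end + 1))
    done
}

# One range per person for the 3400 mRNA files:
array_ranges 3400 680
```

Calling `array_ranges 3400 340` instead gives the half-size ranges for submitting each person's share as two jobs.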
Here is my sample code for running swissprot BLAST: