statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

Chestnut update #514

Open mestato opened 5 years ago

mestato commented 5 years ago

Publication and Data Information

We have a new version of the Chinese chestnut genome (so not actually a new organism, just a new reference genome with new annotation and new features).

This reference genome has two versions (v3.2 and v4.1). We will upload features from version 3.2 but provide downloadable files for both. I am still working with @MattHuff on getting the 4.1 files, so let's start with the 3.2 files for now.

Files are on the staton server in /staton/projects/chestnut/psudochro:

3.2:

Castanea_mollissima_scaffolds_v3.2_cds.faa
Castanea_mollissima_scaffolds_v3.2_cds.fna
Castanea_mollissima_scaffolds_v3.2.fasta
Castanea_mollissima_scaffolds_v3.2.gff

We should have a publication submitted soon. I will try to remember to add that information here, but ask me again if I don't.

Additional Information

Checklist

See New Genome Documentation for detailed instructions.

mestato commented 5 years ago

Downloadable files:

CaseyRichards92 commented 5 years ago

Live site

Reference Genome: https://www.hardwoodgenomics.org/Genome-assembly/3032843
Swissprot Annotation: https://www.hardwoodgenomics.org/BLAST-annotation/3032847
TrEMBL Annotations: https://www.hardwoodgenomics.org/BLAST-annotation/3032848
IPS Annotations: https://www.hardwoodgenomics.org/InterProScan-annotation/3032849
KEGG Annotations: https://www.hardwoodgenomics.org/KEGGresults/3032850
CHADO FASTA Loader CDS: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751481
CHADO FASTA Loader Protein: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751482
Published mRNA-Polypeptide: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751483
SwissProt Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751487
TrEMBL Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751488
IPS Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/787565
BLAST DB transcripts: https://www.hardwoodgenomics.org/content/castanea-mollissima-transcripts-0
BLAST DB scaffolds: https://www.hardwoodgenomics.org/content/castanea-mollissima-scaffolds-0
BLAST DB peptides: https://www.hardwoodgenomics.org/content/castanea-mollissima-peptides-0
KEGG Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/753030

CaseyRichards92 commented 5 years ago

Swissprot

#PBS -N swissprot_BLAST
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

module load blast

blastx \
 -query /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/splits_cds/Castanea_mollissima_scaffolds_v3.2_cds.fna.$PBS_ARRAYID \
 -db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
 -out /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/blast/swissprot/uniprot_mollissima.$PBS_ARRAYID.xml \
 -evalue 1e-5 \
 -outfmt 5
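
The script above reads one chunk of the CDS FASTA per array task (the `.fna.$PBS_ARRAYID` suffix, tasks 1-200), but the thread does not show how those chunks were made. Below is a minimal sketch of one way to produce them, assuming a round-robin split by record. The `split_fasta` function and its arguments are hypothetical and are not necessarily the tool that was used here.

```shell
#!/bin/bash
# Hypothetical helper (not from the original thread): split a FASTA file
# into N chunks named <input basename>.1 ... <input basename>.N,
# assigning whole records to chunks in round-robin order.
split_fasta() {
  local input="$1"    # FASTA file to split
  local chunks="$2"   # number of chunks, e.g. 200 to match "#PBS -t 1-200"
  local outdir="$3"   # directory that receives the chunk files
  local prefix
  prefix="$outdir/$(basename "$input")"
  mkdir -p "$outdir"
  # Remove numbered chunks from an earlier run, because awk appends below.
  rm -f "$prefix".[0-9]*
  awk -v n="$chunks" -v prefix="$prefix" '
    /^>/ {
      # A header line starts a new record: close the previous chunk so that
      # only one output file is open at a time, then pick the next chunk.
      if (out != "") close(out)
      rec++
      out = prefix "." ((rec - 1) % n + 1)
    }
    # Lines before the first header (if any) are skipped.
    out != "" { print >> out }
  ' "$input"
}
```

For example, `split_fasta Castanea_mollissima_scaffolds_v3.2_cds.fna 200 splits_cds` would write `splits_cds/Castanea_mollissima_scaffolds_v3.2_cds.fna.1` through `.200`, matching the `-query` path pattern in the job script. Round-robin assignment keeps the record counts per chunk within one of each other, though not necessarily the run times.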

CaseyRichards92 commented 5 years ago

Trembl

#PBS -N trembl_BLAST
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=15:00:00

cd $PBS_O_WORKDIR

module load blast

blastx \
 -query /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/splits_cds/Castanea_mollissima_scaffolds_v3.2_cds.fna.$PBS_ARRAYID \
 -db /lustre/haven/gamma/staton/libraries/uniprot/uniprot_trembl_plants_July_2018.fasta \
 -out /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/blast/trembl/trembl_mollissima.$PBS_ARRAYID.xml \
 -evalue 1e-5 \
 -outfmt 5

CaseyRichards92 commented 5 years ago

IPS

#PBS -N mollissima_ips
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 1-200
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=4:00:00

cd $PBS_O_WORKDIR

module load python3

/lustre/haven/gamma/staton/software/interproscan-5.34-73.0/interproscan.sh \
 -i /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/splits_peptides/Castanea_mollissima_scaffolds_v3.2_cds.faa.$PBS_ARRAYID \
 -f XML \
 -d /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/ips/xmls \
 --disable-precalc \
 --iprlookup \
 --goterms \
 --pathways \
 --tempdir /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/ips/tmp \
 > /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/ips/tmp/$PBS_ARRAYID.out
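
Each of the array jobs above writes one output file per task, so a failed or timed-out task leaves a gap that is easy to miss before loading. The following is a minimal sketch of a completeness check. The `missing_tasks` function is hypothetical, and it assumes output names of the form `<prefix>.<task id><suffix>`, as in the BLAST `-out` paths above.

```shell
#!/bin/bash
# Hypothetical helper (not from the original thread): print the array task
# IDs whose expected output file is missing or empty.
missing_tasks() {
  local dir="$1"      # directory holding the per-task outputs
  local prefix="$2"   # file name up to the task ID, e.g. uniprot_mollissima
  local suffix="$3"   # file name after the task ID, e.g. .xml
  local total="$4"    # number of array tasks, e.g. 200
  local i=1
  while [ "$i" -le "$total" ]; do
    # -s is true only if the file exists and is non-empty.
    [ -s "$dir/$prefix.$i$suffix" ] || echo "$i"
    i=$((i + 1))
  done
}
```

For the Swissprot run this could be called as `missing_tasks blast/swissprot uniprot_mollissima .xml 200`. For the InterProScan run, the prefix would be `Castanea_mollissima_scaffolds_v3.2_cds.faa`, on the assumption that InterProScan names each output after its input file with `.xml` appended. A non-empty file is not proof of a complete run, so truncated XML still needs a separate check.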

almasaeed2010 commented 5 years ago

@mestato for this, should we clear the old feature records from chado before we add the new ones?

mestato commented 5 years ago

Yes, but we need to add the old gene names as synonyms of the new gene names, so that anyone searching for an old name finds the most up-to-date version. I'm working on this analysis now.

mestato commented 5 years ago

Update: hold off on making the files downloadable until the publication is out. I posted the Ft. Lauderdale agreement to this effect in the description.