mestato opened this issue 5 years ago.
Downloadable files:
[ ] Chestnut 3.2 CDS - nucleotide transcripts (one per gene)
[ ] Chestnut 3.2 CDS - nucleotide transcripts (all)
[ ] Chestnut 3.2 CDS - aa
[ ] Chestnut 3.2 fasta file of contigs
[ ] Chestnut 3.2 gff
[ ] Chestnut 3.2 Excel of best BLAST Hits
[ ] Chestnut 3.2 Excel file of old chestnut gene models assigned to new chestnut gene models
[ ] Chestnut 4.1 CDS - nucleotide
[ ] Chestnut 4.1 CDS - aa
[ ] Chestnut 4.1 fasta file of contigs
[ ] Excel file of contig locations on LGs
Reference Genome: https://www.hardwoodgenomics.org/Genome-assembly/3032843
SwissProt Annotations: https://www.hardwoodgenomics.org/BLAST-annotation/3032847
TrEMBL Annotations: https://www.hardwoodgenomics.org/BLAST-annotation/3032848
IPS Annotations: https://www.hardwoodgenomics.org/InterProScan-annotation/3032849
KEGG Annotations: https://www.hardwoodgenomics.org/KEGGresults/3032850
Chado FASTA Loader CDS: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751481
Chado FASTA Loader Protein: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751482
Published mRNA-Polypeptide: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751483
SwissProt Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751487
TrEMBL Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/751488
IPS Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/787565
BLAST DB transcripts: https://www.hardwoodgenomics.org/content/castanea-mollissima-transcripts-0
BLAST DB scaffolds: https://www.hardwoodgenomics.org/content/castanea-mollissima-scaffolds-0
BLAST DB peptides: https://www.hardwoodgenomics.org/content/castanea-mollissima-peptides-0
KEGG Loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/753030
SwissProt
#PBS -N swissprot_BLAST
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=04:00:00
# Each array task ($PBS_ARRAYID = 1..200) searches one chunk of the v3.2 CDS
# against SwissProt and writes BLAST XML (-outfmt 5); -num_threads matches ppn=2.
cd "$PBS_O_WORKDIR"
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/splits_cds/Castanea_mollissima_scaffolds_v3.2_cds.fna.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/blast/swissprot/uniprot_mollissima.$PBS_ARRAYID.xml \
-evalue 1e-5 \
-num_threads 2 \
-outfmt 5
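The array jobs read pre-split inputs whose names end in the array index (e.g. `Castanea_mollissima_scaffolds_v3.2_cds.fna.$PBS_ARRAYID`), so `#PBS -t 1-200` needs chunks `.1` through `.200`. The thread doesn't record how those chunks were made; below is a minimal sketch, assuming a simple round-robin split by record. `split_fasta` is a hypothetical helper, not part of the original workflow.

```bash
#!/usr/bin/env bash
# split_fasta (hypothetical helper): distribute the records of a FASTA file
# round-robin into N chunk files named <prefix>.1 ... <prefix>.N, matching
# the 1-based $PBS_ARRAYID suffixes the array jobs expect.
#   $1  input FASTA   $2  number of chunks   $3  output prefix
split_fasta() {
  local input="$1" nchunks="$2" prefix="$3"
  awk -v n="$nchunks" -v prefix="$prefix" '
    /^>/ { chunk = (count++ % n) + 1 }          # new record: choose its chunk
    chunk != "" { print > (prefix "." chunk) }  # header and sequence lines follow it
  ' "$input"
}
```

For example, `split_fasta Castanea_mollissima_scaffolds_v3.2_cds.fna 200 splits_cds/Castanea_mollissima_scaffolds_v3.2_cds.fna` would write chunks `.1` to `.200` into `splits_cds/`, which must already exist. Round-robin assignment keeps chunks even in record count, not in total sequence length, and an input with fewer than 200 records yields fewer than 200 chunks.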
TrEMBL
#PBS -N trembl_BLAST
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=15:00:00
# Same layout as the SwissProt job, but searching the plant TrEMBL subset
# (July 2018) with a 15-hour walltime; -num_threads matches ppn=2.
cd "$PBS_O_WORKDIR"
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/splits_cds/Castanea_mollissima_scaffolds_v3.2_cds.fna.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/libraries/uniprot/uniprot_trembl_plants_July_2018.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/blast/trembl/trembl_mollissima.$PBS_ARRAYID.xml \
-evalue 1e-5 \
-num_threads 2 \
-outfmt 5
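With 200 array tasks per search, a few tasks may hit their walltime and leave missing or truncated XML. A minimal sketch for finding them before loading, assuming outputs are named `<prefix>.<index>.xml` as in the scripts above; `missing_tasks` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# missing_tasks (hypothetical helper): print each array index 1..N whose
# BLAST XML output (<prefix>.<index>.xml) is missing, empty, or truncated.
# A finished -outfmt 5 file closes with </BlastOutput>; a task killed at its
# walltime typically stops before writing that tag.
#   $1  output prefix   $2  number of array tasks
missing_tasks() {
  local prefix="$1" ntasks="$2" i=1 f
  while [ "$i" -le "$ntasks" ]; do
    f="$prefix.$i.xml"
    if [ ! -s "$f" ] || ! tail -n 5 "$f" | grep -q '</BlastOutput>'; then
      echo "$i"
    fi
    i=$((i + 1))
  done
}
```

Piping `missing_tasks <prefix> 200` through `paste -sd, -` gives a comma-separated index list, a format Torque's `qsub -t` accepts, so only the failed tasks need resubmitting; check that the list is non-empty first.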
IPS
#PBS -N mollissima_ips
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 1-200
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=4:00:00
cd "$PBS_O_WORKDIR"
module load python3
# Each array task annotates one peptide chunk. --disable-precalc skips the
# precalculated-match lookup service so every match is computed locally;
# --iprlookup, --goterms and --pathways add InterPro, GO and pathway
# cross-references to the XML written under -d.
/lustre/haven/gamma/staton/software/interproscan-5.34-73.0/interproscan.sh \
-i /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/splits_peptides/Castanea_mollissima_scaffolds_v3.2_cds.faa.$PBS_ARRAYID \
-f XML \
-d /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/ips/xmls \
--disable-precalc \
--iprlookup \
--goterms \
--pathways \
--tempdir /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/ips/tmp \
> /lustre/haven/gamma/staton/projects/undergrads/castanea_mollissima/ips/tmp/$PBS_ARRAYID.out
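InterProScan reads the peptide chunks directly. If the `.faa` translations end each protein with a `*` stop symbol, some InterProScan 5 releases reject those sequences as containing illegal characters. Whether that applies to these files or to 5.34-73.0 isn't confirmed here; if it does, a minimal pre-processing sketch follows. `strip_stops` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# strip_stops (hypothetical helper): copy a protein FASTA, deleting every '*'
# from sequence lines while leaving header lines untouched.
#   $1  input FASTA   $2  output FASTA
strip_stops() {
  sed '/^>/!s/\*//g' "$1" > "$2"
}
```

A loop over `seq 1 200` could clean each chunk before submission. Note that this also deletes internal `*` characters, silently joining the residues on either side of an internal stop; if internal stops are expected, filtering those sequences out may be the safer choice.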
@mestato, for this, should we clear the old feature records from Chado before we add the new ones?
Yes, but we need to add the old gene names as synonyms of the new gene names so that anyone searching for them can find the most up-to-date version. I'm working on this analysis now.
Update: hold off on making the files downloadable until the publication is out. I posted the Fort Lauderdale agreement to this effect in the description.
Publication and Data Information
We have a new version of the Chinese chestnut genome (so not a new organism, just a new reference genome with a new annotation and new features).
This reference genome has two versions (v3.2 and v4.1). We will load features from version 3.2 but provide downloadable files for both. I am still working with @MattHuff on getting the 4.1 files, so let's start with the 3.2 files for now.
Files are on the staton server in /staton/projects/chestnut/psudochro:
3.2:
Castanea_mollissima_scaffolds_v3.2_cds.faa
Castanea_mollissima_scaffolds_v3.2_cds.fna
Castanea_mollissima_scaffolds_v3.2.fasta
Castanea_mollissima_scaffolds_v3.2.gff
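Before running the Chado FASTA loaders on the CDS and protein files, it may be worth confirming that the `.fna` and `.faa` files describe the same records so the two loads line up one to one. A minimal sketch, assuming each protein uses exactly the same ID (header text up to the first whitespace) as its CDS; `fasta_ids` and `compare_ids` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# fasta_ids (hypothetical helper): print the record IDs of a FASTA file
# (header text after '>' up to the first whitespace), byte-order sorted.
fasta_ids() {
  awk '/^>/ { sub(/^>/, ""); print $1 }' "$1" | LC_ALL=C sort
}

# compare_ids (hypothetical helper): diff the ID sets of two FASTA files.
# Prints nothing and returns 0 when they match; otherwise prints the diff
# ('<' lines occur only in the first file, '>' only in the second) and
# returns non-zero.
compare_ids() {
  local a b status
  a=$(mktemp) b=$(mktemp)
  fasta_ids "$1" > "$a"
  fasta_ids "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}
```

Run in the 3.2 directory, `compare_ids Castanea_mollissima_scaffolds_v3.2_cds.fna Castanea_mollissima_scaffolds_v3.2_cds.faa` would print nothing and exit 0 if the two files agree.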
We should have a publication submitted soon. I will try to remember to add that information here, but ask me again if I don't.
Additional Information
Checklist
See New Genome Documentation for detailed instructions.