Open mestato opened 6 years ago
Downloadables can be found here: https://morus.swu.edu.cn/morusdb/datasets
Organism upload: https://hardwoods.ag.utk.edu/organism/Morus/notabilis
Genome assembly: https://hardwoods.ag.utk.edu/Genome-assembly/2360507
protein upload job:
https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/700014
regular expression used: >(.*?) t
protein job re-upload:
https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/700024
same reg expression used
acf blast analysis:
#PBS -N swissprot_BLAST
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=04:00:00
cd $PBS_O_WORKDIR
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/mulberry/raw_data/BLAST-split/morus_notabilis.cds.fast$
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/mulberry/BLAST/swissprot/mulberry_swissprot_$PBS_ARRAYID$
-evalue 1e-5 \
-outfmt 5
Trembl job on acf
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=07:00:00
cd $PBS_O_WORKDIR
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/mulberry/raw_data/BLAST-split/morus_notabilis.cds.fast$
-db /lustre/haven/gamma/staton/libraries/uniprot/uniprot_trembl_plants_July_2018.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/mulberry/BLAST/trembl/mulberry_trembl_$PBS_ARRAYID$
-evalue 1e-5 \
-outfmt 5
InterProScan job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/710261
Publish Gene Job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/714578
Blastx Annotation (trembl): https://hardwoods.ag.utk.edu/Analysis/2445478
Blastx Annotation (Swissprot): https://hardwoods.ag.utk.edu/BLAST-annotation/2445479
Organism page: https://hardwoods.ag.utk.edu/organism/Morus/notabilis
Genome assembly: https://hardwoods.ag.utk.edu/Genome-assembly/2360507
InterProScan Annotation: https://hardwoods.ag.utk.edu/InterProScan-annotation/2418511
Publication: https://hardwoods.ag.utk.edu/Publication/2525405
Trembl import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/718534
SwissProt import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/728752
Morus notabilis - BLASTx Annotation(SwissProt): https://hardwoods.ag.utk.edu/BLAST-annotation/2445479
Fixed trembl import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/728967
reg expression used: (L.*.t\d\d)
Fixed swissprot import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/728972
reg expression used: (L.*.t\d\d)
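For reference, a minimal Python sketch of how that expression behaves on a query name. The sample header is hypothetical (modeled on the L484_001038.p01 ID format seen in this thread), and Tripal applies the pattern with its own PHP regex engine, so this only illustrates the pattern itself, not the importer.

```python
import re

# Pattern as entered in the import form. The dots are unescaped,
# so each one matches any character, not only a literal ".".
NAME_RE = re.compile(r"(L.*.t\d\d)")


def extract_name(query_header):
    """Return the captured feature name, or None if nothing matches."""
    match = NAME_RE.search(query_header)
    return match.group(1) if match else None


# Hypothetical BLAST query header, for illustration only.
print(extract_name("L484_001038.t01 hypothetical protein"))
```

Because `.*` is greedy and the dots are unescaped, the pattern captures up to the last place in the header where any character is followed by `t` and two digits. A stricter form such as `(L\d+_\d+\.t\d\d)` would be one option if headers ever carried extra text that happens to match, assuming all names follow that layout.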
interProScan import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/730803
For the organism page, I made the links to the sources clickable. For the reference genome description, could you add links to download the files (fasta, gff, etc.) and to the publication associated with the analysis?
will do
@patricksis Everything looks good except for the publication. It's created but not linked to the genome assembly. You need to edit the assembly and choose it under publications.
Updated ref genome analysis description to fix links.
Main site:
Organism page: https://www.hardwoodgenomics.org/organism/Morus/notabilis
Reference Genome: https://www.hardwoodgenomics.org/Genome-assembly/2335717
Publication: https://www.hardwoodgenomics.org/Publication/2335718
InterProScan Annotation: https://www.hardwoodgenomics.org/InterProScan-annotation/2335719
Trembl Annotation: https://www.hardwoodgenomics.org/BLAST-annotation/2335720
Swissprot Annotation: https://www.hardwoodgenomics.org/BLAST-annotation/2335721
Uploaded split protein file to blastKOALA.
Perl script used:
perl /lustre/haven/gamma/staton/unpublished_lab_code/perl/fasta_scripts/split.pl morus_notabilis.protein.fasta 6
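Since split.pl is unpublished lab code, here is a rough Python sketch of what such a split might look like. It assumes the trailing 6 means "number of output files" and that records are dealt out round-robin; both are guesses, and the function names and output naming scheme are made up for illustration.

```python
def read_fasta(path):
    """Yield (header, sequence_lines) records from a FASTA file."""
    header, lines = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line, []
            elif header is not None and line:
                lines.append(line)
    if header is not None:
        yield header, lines


def split_fasta(path, n_parts, prefix):
    """Distribute records round-robin into n_parts files; return their paths."""
    out_paths = [f"{prefix}.{i + 1}.fasta" for i in range(n_parts)]
    handles = [open(p, "w") for p in out_paths]
    try:
        for i, (header, lines) in enumerate(read_fasta(path)):
            out = handles[i % n_parts]
            out.write(header + "\n")
            for seq_line in lines:
                out.write(seq_line + "\n")
    finally:
        for handle in handles:
            handle.close()
    return out_paths
```

Under those assumptions, `split_fasta("morus_notabilis.protein.fasta", 6, "morus_notabilis.protein")` would write six files of roughly equal record counts.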
mRNA publishing job: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/476534
Live site swissprot upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/501406
live site trembl upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/501388
WD trp_kegg: ERROR: Unable to find KEGG term: 'K22943' [warning]
@bradfordcondon is this an issue that needs to be addressed when uploading KEGG?
ugh yeah that is a problem on the back end not your end. 1 second
Short answer: that term was added to kegg after the OBO was built... solutions:
see #456 for issue discussion.
Dev site:
BLAST db for cds: https://hardwoods.ag.utk.edu/content/morus-notabilis-transcripts
BLAST db for peptides: https://hardwoods.ag.utk.edu/content/morus-notabilis-peptides
Live site:
BLAST db for cds: https://www.hardwoodgenomics.org/content/morus-notabilis-transcripts
BLAST db for peptides: https://www.hardwoodgenomics.org/content/morus-notabilis-peptides
Looks like the polypeptides were loaded incorrectly: each polypeptide was recorded as "part of" the mRNA instead of "derived from" it. We need to reload the peptides, but the good news is that we don't have to reload anything else. @patricksis please talk to me about this when you are here.
select * from chado.feature_relationship where subject_id = '5168696';
feature_relationship_id | subject_id | object_id | type_id | value | rank
-------------------------+------------+-----------+---------+-------+------
2314871 | 5168696 | 5141731 | 73 | | 0
Note that the subject and object are flipped (5141731 should be the subject).
Peptides re-upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/719162
Looks like the gff is available through ncbi (https://www.ncbi.nlm.nih.gov/genome/?term=txid981085[Organism:noexp]). @patricksis could you grab it and add to downloads and use for JBrowse?
JBrowse instance: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/548611
Jbrowse instance for live: https://www.hardwoodgenomics.org/admin/tripal/extension/tripal_jbrowse/management/instances/10
gff track is not showing up here. I don't think the names from the genome and gff match up
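One way to check the suspected name mismatch is to compare the sequence IDs in the genome FASTA against column 1 of the GFF. A minimal Python sketch follows; the function names are made up for illustration, and it assumes the FASTA ID is the first whitespace-delimited token after ">".

```python
def fasta_ids(path):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gff_seqids(path):
    """Collect column 1 (seqid) from every feature line of a GFF3 file."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith("##FASTA"):
                break  # embedded sequence section, no more feature lines
            if line.startswith("#") or not line.strip():
                continue
            ids.add(line.split("\t", 1)[0])
    return ids


def compare_names(fasta_path, gff_path):
    """Report which sequence names are shared and which appear on one side only."""
    genome, gff = fasta_ids(fasta_path), gff_seqids(gff_path)
    return {
        "shared": genome & gff,
        "gff_only": gff - genome,
        "fasta_only": genome - gff,
    }
```

If "shared" comes back empty while both other sets are full, the two files are using different naming schemes for the same scaffolds, and one of them would need renaming before the track can display.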
@mestato work on fasta and gff files, see server location: /var/www/html/sites/default/files/sequences/mulberry/raw_data
@patricksis Here is what this organism needs.
Unable to identify the feature for 'L484_001038.p01'
https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/516855
@cricha59 this is a known issue with KEGG; see issue #456. I'm pretty sure KEGG does it for every organism.
Cross references have been added, the only thing left for the organism is to fix JBrowse.
Original paper: https://www.nature.com/articles/ncomms3445
DB: https://morus.swu.edu.cn/