statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

Mulberry genome #340

Open mestato opened 6 years ago

mestato commented 6 years ago

Original paper: https://www.nature.com/articles/ncomms3445

DB: https://morus.swu.edu.cn/

almasaeed2010 commented 6 years ago

Downloadables can be found here: https://morus.swu.edu.cn/morusdb/datasets

patricksis commented 5 years ago

Organism upload: https://hardwoods.ag.utk.edu/organism/Morus/notabilis

patricksis commented 5 years ago

Genome assembly: https://hardwoods.ag.utk.edu/Genome-assembly/2360507

patricksis commented 5 years ago

cds upload job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/696233

patricksis commented 5 years ago

protein upload job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/700014
regular expression used: >(.*?) t
protein job re-upload: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/700024
same regular expression used

patricksis commented 5 years ago

acf blast analysis:

#PBS -N swissprot_BLAST
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

module load blast

blastx \
 -query /lustre/haven/gamma/staton/projects/undergrads/mulberry/raw_data/BLAST-split/morus_notabilis.cds.fast$
 -db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
 -out /lustre/haven/gamma/staton/projects/undergrads/mulberry/BLAST/swissprot/mulberry_swissprot_$PBS_ARRAYID$
 -evalue 1e-5 \
 -outfmt 5

patricksis commented 5 years ago

Trembl job on acf

#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=07:00:00

cd $PBS_O_WORKDIR

module load blast

blastx \
 -query /lustre/haven/gamma/staton/projects/undergrads/mulberry/raw_data/BLAST-split/morus_notabilis.cds.fast$
 -db /lustre/haven/gamma/staton/libraries/uniprot/uniprot_trembl_plants_July_2018.fasta \
 -out /lustre/haven/gamma/staton/projects/undergrads/mulberry/BLAST/trembl/mulberry_trembl_$PBS_ARRAYID$
 -evalue 1e-5 \
 -outfmt 5

patricksis commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/710261 Job page for InterPro scan

patricksis commented 5 years ago

Publish Gene Job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/714578

patricksis commented 5 years ago

Blastx Annotation (trembl): https://hardwoods.ag.utk.edu/Analysis/2445478
Blastx Annotation (Swissprot): https://hardwoods.ag.utk.edu/BLAST-annotation/2445479
Organism page: https://hardwoods.ag.utk.edu/organism/Morus/notabilis
Genome assembly: https://hardwoods.ag.utk.edu/Genome-assembly/2360507
InterProScan Annotation: https://hardwoods.ag.utk.edu/InterProScan-annotation/2418511
Publication: https://hardwoods.ag.utk.edu/Publication/2525405

patricksis commented 5 years ago

Trembl import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/718534

patricksis commented 5 years ago

SwissProt import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/728752

patricksis commented 5 years ago

Morus notabilis - BLASTx Annotation(SwissProt): https://hardwoods.ag.utk.edu/BLAST-annotation/2445479

patricksis commented 5 years ago

Fixed trembl import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/728967 reg expression used: (L.*.t\d\d)

patricksis commented 5 years ago

fixed swissprot import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/728972 reg expression used: (L.*.t\d\d)
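
To make the behaviour of the regular expression quoted in the two fixed import jobs easier to check, here is a minimal Python sketch. The header string is hypothetical (the real query definition lines are not shown in this thread), and `extract_name` is an illustrative helper, not part of Tripal. Two properties are worth noticing: the `.` before `t` is unescaped, so it matches any character, and `.*` is greedy, so the capture runs to the last `t` followed by two digits on the line.

```python
import re

# Pattern quoted in the import-job comments above.
NAME_RE = re.compile(r"(L.*.t\d\d)")


def extract_name(query_def):
    """Return the captured feature name, or None if the pattern does not match."""
    match = NAME_RE.search(query_def)
    return match.group(1) if match else None


# Hypothetical header; the actual format used by the mulberry files is an assumption.
example = "L484_000001.t01 hypothetical description"
print(extract_name(example))
```

If the names must end in a literal `.tNN`, a stricter variant such as `(L\S*\.t\d\d)` would avoid capturing across whitespace, though whether that suits the real headers would need to be checked against the files.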

patricksis commented 5 years ago

interProScan import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/730803

mestato commented 5 years ago

For the organism page, I made the links to the sources clickable. For the reference genome description, could you add links to download the files (fasta, gff, etc.) and the publication associated with the analysis?

patricksis commented 5 years ago

will do

almasaeed2010 commented 5 years ago

@patricksis Everything looks good except for the publication. It's created but not linked to the genome assembly. You need to edit the assembly and choose it under publications.

mestato commented 5 years ago

Updated ref genome analysis description to fix links.

patricksis commented 5 years ago

Main site
Organism page: https://www.hardwoodgenomics.org/organism/Morus/notabilis
Reference Genome: https://www.hardwoodgenomics.org/Genome-assembly/2335717
Publication: https://www.hardwoodgenomics.org/Publication/2335718
InterProScan Annotation: https://www.hardwoodgenomics.org/InterProScan-annotation/2335719
Trembl Annotation: https://www.hardwoodgenomics.org/BLAST-annotation/2335720
Swissprot Annotation: https://www.hardwoodgenomics.org/BLAST-annotation/2335721

patricksis commented 5 years ago

cds upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/469971
protein upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/469975

patricksis commented 5 years ago

Uploaded split protein file to BlastKOALA. Perl script used: perl /lustre/haven/gamma/staton/unpublished_lab_code/perl/fasta_scripts/split.pl morus_notabilis.protein.fasta 6
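
The `split.pl` script is not shown in this thread, so its exact behaviour and the meaning of the trailing `6` are unknown. As a rough illustration only, the following Python sketch assumes the goal is to divide a protein FASTA into a fixed number of roughly equal parts for upload. The function names and the output file prefix are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records, n_parts):
    """Distribute records round-robin into n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def write_parts(parts, prefix):
    """Write each part to '<prefix>.<index>.fasta' (hypothetical naming scheme)."""
    for index, part in enumerate(parts, start=1):
        with open(f"{prefix}.{index}.fasta", "w") as handle:
            for header, sequence in part:
                handle.write(f">{header}\n{sequence}\n")
```

A call along the lines of `write_parts(split_records(read_fasta(open("morus_notabilis.protein.fasta")), 6), "morus_part")` would produce six files under that assumption.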

patricksis commented 5 years ago

IPS job (Live) https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/501451

patricksis commented 5 years ago

mRNA publishing job: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/476534

patricksis commented 5 years ago

Live site swissprot upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/501406

patricksis commented 5 years ago

live site trembl upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/501388

patricksis commented 5 years ago

KEGG annotation: https://hardwoodgenomics.org/KEGGresults/2509829
KEGG job: https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/516855

patricksis commented 5 years ago

WD trp_kegg: ERROR: Unable to find KEGG term: 'K22943' [warning]
@bradfordcondon is this an issue that needs to be addressed when uploading KEGG?

bradfordcondon commented 5 years ago

ugh yeah that is a problem on the back end not your end. 1 second

bradfordcondon commented 5 years ago

Short answer: that term was added to kegg after the OBO was built... solutions:

see #456 for issue discussion.

patricksis commented 5 years ago

Dev site:
BLAST db for cds: https://hardwoods.ag.utk.edu/content/morus-notabilis-transcripts
For peptides: https://hardwoods.ag.utk.edu/content/morus-notabilis-peptides

patricksis commented 5 years ago

Live site:
BLAST DB for cds: https://www.hardwoodgenomics.org/content/morus-notabilis-transcripts
For peptides: https://www.hardwoodgenomics.org/content/morus-notabilis-peptides

almasaeed2010 commented 5 years ago

Looks like the polypeptides were loaded incorrectly in that it was indicated that this polypeptide is "part of" the mrna instead of "derived from." Therefore, we need to reload the peptides but the good news is that we don't have to reload anything else. @patricksis please talk to me about this when you are here.

select * from chado.feature_relationship where subject_id = '5168696';
 feature_relationship_id | subject_id | object_id | type_id | value | rank 
-------------------------+------------+-----------+---------+-------+------
                 2314871 |    5168696 |   5141731 |      73 |       |    0

Note that the subject and object are flipped (5141731 should be the subject).
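
A raw `type_id` such as 73 is hard to read without the matching `cvterm` name. The sketch below shows one way such rows could be audited, using an in-memory SQLite database with a heavily simplified subset of the Chado `feature`, `feature_relationship` and `cvterm` tables. All ids and names in the toy data are made up, and the check assumes the usual Chado convention that the polypeptide is the subject of a `derives_from` relationship whose object is the mRNA.

```python
import sqlite3


def build_toy_db():
    """Create a tiny stand-in for the relevant Chado tables (toy ids only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type_id INTEGER);
    CREATE TABLE feature_relationship (
        feature_relationship_id INTEGER PRIMARY KEY,
        subject_id INTEGER, object_id INTEGER, type_id INTEGER);
    INSERT INTO cvterm VALUES (1, 'mRNA'), (2, 'polypeptide'), (3, 'part_of'), (4, 'derives_from');
    INSERT INTO feature VALUES (10, 'mrna_A', 1), (11, 'pep_A', 2), (20, 'mrna_B', 1), (21, 'pep_B', 2);
    -- Row 100 follows the expected convention; row 101 is flipped and uses part_of.
    INSERT INTO feature_relationship VALUES (100, 11, 10, 4), (101, 20, 21, 3);
    """)
    return conn


# List relationships that involve a polypeptide but are not
# "polypeptide (subject) derives_from mRNA (object)".
AUDIT_SQL = """
SELECT fr.feature_relationship_id, s.uniquename, rt.name, o.uniquename
FROM feature_relationship fr
JOIN feature s ON s.feature_id = fr.subject_id
JOIN feature o ON o.feature_id = fr.object_id
JOIN cvterm st ON st.cvterm_id = s.type_id
JOIN cvterm ot ON ot.cvterm_id = o.type_id
JOIN cvterm rt ON rt.cvterm_id = fr.type_id
WHERE (st.name = 'polypeptide' OR ot.name = 'polypeptide')
  AND NOT (st.name = 'polypeptide' AND ot.name = 'mRNA' AND rt.name = 'derives_from')
ORDER BY fr.feature_relationship_id
"""


def suspicious_relationships(conn):
    """Return (relationship id, subject, relationship type, object) for odd rows."""
    return conn.execute(AUDIT_SQL).fetchall()


print(suspicious_relationships(build_toy_db()))
```

On the real database the tables live in the `chado` schema and the same joins would need that prefix; the query text itself is only a starting point.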

patricksis commented 5 years ago

Peptides re-upload: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/719162

mestato commented 5 years ago

Looks like the gff is available through ncbi (https://www.ncbi.nlm.nih.gov/genome/?term=txid981085[Organism:noexp]). @patricksis could you grab it and add to downloads and use for JBrowse?

patricksis commented 5 years ago

JBrowse instance: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/548611

patricksis commented 5 years ago

Jbrowse instance for live: https://www.hardwoodgenomics.org/admin/tripal/extension/tripal_jbrowse/management/instances/10

patricksis commented 5 years ago

gff track is not showing up here. I don't think the names from the genome and gff match up
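
One way to test the suspicion about mismatched names is to compare the sequence IDs in the genome FASTA with the values in the first column of the GFF. The Python sketch below is a minimal version of that check; the function names are illustrative and it assumes a standard tab-delimited GFF3 file and FASTA headers whose ID is the text up to the first whitespace.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the text after '>' up to the first whitespace."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_seqids(lines):
    """Collect column 1 (seqid) from GFF3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def compare_names(fasta_lines, gff_lines):
    """Return (seqids only in the GFF, IDs only in the FASTA), both sorted."""
    in_fasta, in_gff = fasta_ids(fasta_lines), gff_seqids(gff_lines)
    return sorted(in_gff - in_fasta), sorted(in_fasta - in_gff)
```

Passing open file handles for the two files to `compare_names` returns two lists. If the first list is non-empty, the GFF refers to sequences the genome FASTA does not contain under that name, which would be consistent with the track not displaying.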

mestato commented 5 years ago

@mestato work on fasta and gff files, see server location: /var/www/html/sites/default/files/sequences/mulberry/raw_data

CaseyRichards92 commented 5 years ago

@patricksis Here is what this organism needs.

patricksis commented 5 years ago

@cricha59 this is a known issue with KEGG, we have an issue #456. KEGG does it for every organism I'm pretty sure.

patricksis commented 5 years ago

Cross references have been added, the only thing left for the organism is to fix JBrowse.