statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

European Beech genome #298

Open mestato opened 6 years ago

mestato commented 6 years ago

Reference genome for Fagus sylvatica: http://thines-lab.senckenberg.de/beechgenome/index2.htm

Reference manuscript: https://academic.oup.com/gigascience/article/7/6/giy063/5017772

almasaeed2010 commented 5 years ago

Files: http://thines-lab.senckenberg.de/beechgenome/data.html

RaymondS1 commented 5 years ago

http://160.36.205.61:9090/bio_data/418 Link to organism

RaymondS1 commented 5 years ago

http://160.36.205.61:9090/bio_data/419 Analysis of organism

RaymondS1 commented 5 years ago

http://160.36.205.61:9090/admin/tripal/tripal_jobs/view/24344

almasaeed2010 commented 5 years ago

Regular expression for proteins:

>(FS[0-9a-zA-Z]*)
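
A quick way to check what this expression captures before running the loader is to apply it to the FASTA headers on the command line. The sketch below is only an illustration: `extract_ids` is a hypothetical helper, not part of Tripal, and the `^` anchor and trailing `.*` are additions so that `sed` prints just the captured group.

```shell
# extract_ids <fasta>
# Print the identifier captured by >(FS[0-9a-zA-Z]*) for each header line.
# Sequence lines do not start with ">" and are skipped by -n ... p.
extract_ids() {
  sed -n -E 's/^>(FS[0-9a-zA-Z]*).*/\1/p' "$1"
}
```

For example, `extract_ids Fagus_sylvatica_prot_v1.3.fasta | head` shows the first few captured names, and piping the output through `sort | uniq -d` would list any identifiers that are not unique.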

RaymondS1 commented 5 years ago

http://160.36.205.61:9090/admin/tripal/tripal_jobs/view/24357

RaymondS1 commented 5 years ago

http://160.36.205.61:9090/admin/tripal/tripal_jobs/view/24384 Link to publishing job

almasaeed2010 commented 5 years ago

Path to CDS:

/home/www/sites/default/files/sequences/european_beech/Fagus_sylvatica_cds_v1.3.fasta

Path to Proteins:

/home/www/sites/default/files/sequences/european_beech/Fagus_sylvatica_prot_v1.3.fasta

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/672175 Link to CDS

almasaeed2010 commented 5 years ago

@RaymondS1 you are ready to publish your genes.

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/677307 Link to protein job

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/677336 published gene records

RaymondS1 commented 5 years ago

Swiss-Prot BLAST for European beech

#PBS -N swissprot_blast
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR

module load blast

blastx \
 -query /lustre/haven/gamma/staton/projects/undergrads/european_beech/splits/Fagus_sylvatica_cds_v1.3.fasta.$PBS_ARRAYID \
 -db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
 -out /lustre/haven/gamma/staton/projects/undergrads/european_beech/blast/swissprot/european_beech_sprot_$PBS_ARRAYID.xml \
 -evalue 1e-5 \
 -outfmt 5
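
The thread does not show how the files under `splits/` were produced. One way to create numbered chunks that line up with `#PBS -t 1-200` is to deal FASTA records out round-robin with `awk`. This is a sketch under that assumption; `split_fasta` and its arguments are hypothetical names.

```shell
# split_fasta <input.fasta> <n_chunks> <outdir>
# Writes <outdir>/<input basename>.1 ... .<n_chunks>, keeping each record whole.
split_fasta() {
  mkdir -p "$3"
  awk -v n="$2" -v prefix="$3/$(basename "$1")" '
    # A header line starts a new record: close the old chunk, pick the next one in rotation.
    /^>/ { if (file != "") close(file); file = prefix "." ((count++ % n) + 1) }
    # Append every line of the current record to the current chunk.
    file != "" { print >> file }
  ' "$1"
}
```

For example, `split_fasta Fagus_sylvatica_cds_v1.3.fasta 200 splits` would produce `splits/Fagus_sylvatica_cds_v1.3.fasta.1` through `.200`. Because chunks are opened in append mode, the output directory should be empty before a rerun.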

TrEMBL BLAST

#PBS -N trembl_blast
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

module load blast

blastx \
 -query /lustre/haven/gamma/staton/projects/undergrads/european_beech/splits/Fagus_sylvatica_cds_v1.3.fasta.$PBS_ARRAYID \
 -db /lustre/haven/gamma/staton/library/uniprot/uniprot_trembl_plants_July_2018.fasta \
 -out /lustre/haven/gamma/staton/projects/undergrads/european_beech/blast/trembl/european_beech_trembl_$PBS_ARRAYID.xml \
 -evalue 1e-5 \
 -outfmt 5
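
Since each array task writes its own XML file, it can help to confirm that all 200 outputs exist and are non-empty before uploading them. A minimal sketch, assuming the `<prefix><index>.xml` naming used in the scripts above; `missing_outputs` is a hypothetical helper.

```shell
# missing_outputs <dir> <prefix> <n>
# Print each array index in 1..n whose file <dir>/<prefix><index>.xml
# is missing or empty.
missing_outputs() {
  i=1
  while [ "$i" -le "$3" ]; do
    [ -s "${1}/${2}${i}.xml" ] || echo "$i"
    i=$((i + 1))
  done
}
```

For example, `missing_outputs blast/trembl european_beech_trembl_ 200` lists the indices that may need to be rerun. A non-empty file can still be truncated if a task hit its walltime, so this is only a first check.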

RaymondS1 commented 5 years ago

Next Steps:

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/710315 New gene fasta

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/710317 New protein fasta. Correction

RaymondS1 commented 5 years ago

Gene/CDS

>FSB010000101
ATGGCACCAACTATGTATATTGTCTATCTTCGCTTCAATGGGGAGATTATTTATGGTCAACATGGAGCTG
AGTATCAAGGGTCGCAAATGAAGTTCATCCGGGTTCATCGTGGGATTAGTTTTGTCGAATTGGAAACGAA
GATATTCAATGCACTACAATTGGACAATCAATCTCATCGTATAACAGTTACATACCGTTGTCCTCAGGAG
GTGATTTCACCTCACATTAATTACATGACTCTATTGATAACAGACGACGACGGTGTTAATCTCATGTTTG
ACATGTTAGATGCAACGCCTGAATTAAAAGGTATTGAGTTATATATAAGTGTGGAGGATTGTGTTGGTGA
AGGTGTTGAGCCTCTTACACAAGATGATGGGGATGGATTAGTAGCGGAAGATTGTGTTGGTGAAGATGTA
CAACAAATGACTGTGCATGATACTGCTCCTTCGACACAACCCTCTACACTTGGAAGGTGTACACCACAAT
TACATGAGATACGAACATCGGTGGAGGATTGTGGTCCCAGCACTCGACATGAGTATGTTCCATACGAGGT
AAACCCTTTAGCTGGAGTGCATGATACGATGATGTTGGAATGTACTGCTGATGATGAAGAAGAAAACGCT

Protein

>FSB011771501 kinase chloroplastic-like|protein serine/threonine kinase activity;ATP binding;protein phosphorylation;serine family amino acid metabolic process
mgncldssakvdtaqsshatsgsgiskfssktsrssapssltiqtfseksnasslpnprsegeilsspnlksfsfnelkn
atrnfrpdsllgeggfgyvfkgwidehsfsaakpgsgmvvavkklksegfqghkewltevnylgqlhhpnlvkligycle
genrllvyefmpkgslenhlfrrgpqplswairikvatgaarglcflhdaksqviyrdfkasnilldaefnaklsdfgla
kagptgdrthvstqvmgthgyaapeyvatgrltaksdvysfgvvllellsgrravdktkvvieqnlvdwakpylgdkrkl
frimdtklegqypqkgaytaatlalqclsneakgrprmaevlatleqldnpknagrpsqseqqtvapvrkspmrphhspr
nltpgasplpayrqsprvr
>FSB011771601 probable serine threonine- kinase NAK|protein serine/threonine kinase activity;ATP binding;protein phosphorylation;serine family amino acid metabolic process
mkvnkkdellhayrldcfyysvlkaatkkfscknllgeggfgdvykgyisyctmtaarpgcgfavavkrqrktgeqgvhe
wlneltflaglnhpnvvkligycsegdqrilvykymiggsleahllkadvtelnwrrrinialgaarglyflhtrgrpvi

@almasaeed2010 what regex should I use?

almasaeed2010 commented 5 years ago

@RaymondS1 try this regular expression:

>(.*?)\w+

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/730778 Swiss-Prot BLAST

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/730773 TrEMBL BLAST

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/BLAST-annotation/2360512 Blast Annotation

RaymondS1 commented 5 years ago

IPS

#PBS -N european_beech_ips
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 1-200
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=3:30:00

cd $PBS_O_WORKDIR

/lustre/haven/gamma/staton/software/interproscan-5.28-67.0/interproscan.sh \
 -i /lustre/haven/gamma/staton/projects/undergrads/european_beech/raw_data/ipssplits/Fagus_sylvatica_prot_v1.3.fasta.$PBS_ARRAYID \
 -f XML \
 -d /lustre/haven/gamma/staton/projects/undergrads/european_beech/ips/xml \
 --disable-precalc \
 --iprlookup \
 --goterms \
 --pathways \
 --tempdir /lustre/haven/gamma/staton/projects/undergrads/european_beech/ips/tmp \
 >& /lustre/haven/gamma/staton/projects/undergrads/european_beech/ips/tmp/$PBS_ARRAYID.out

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/InterProScan-annotation/2418507 Link to InterPro Scan Annotation

RaymondS1 commented 5 years ago

https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/718904 InterPro Scan file upload

RaymondS1 commented 5 years ago

@almasaeed2010 Delete the 'F. Sylvatica' analysis.

RaymondS1 commented 5 years ago

Task List

almasaeed2010 commented 5 years ago

Any updates on this Organism? I think it only needs a couple of things before it can go live:

RaymondS1 commented 5 years ago

Live Site

https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/480523 mRNA publishing

https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/486382 CDS FASTA upload

https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/480517 Protein FASTA

https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/480506 Swiss-Prot upload job link

https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/486528 TrEMBL upload

https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/487173 InterProScan upload

almasaeed2010 commented 5 years ago

@RaymondS1 I see jobs on the live site for this organism that are not posted here. Please post the links to every submitted job here so we can inspect errors easily.

Currently this organism is missing the following:

Let's try to get these done. If you have any questions or need help with any of this please let me know.

almasaeed2010 commented 5 years ago

@RaymondS1 so far so good! See above for updated list of completed tasks.

You are now only missing the following:

almasaeed2010 commented 5 years ago

@RaymondS1 Everything looks good for this organism except for download links to the mRNA and polypeptide files, along with the GFF file if one exists. If you don't know how to do this, please let me know.

mestato commented 5 years ago

Three items:

CaseyRichards92 commented 5 years ago

KEGG https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/678813

patricksis commented 5 years ago

Genome and GFF are available for JBrowse

patricksis commented 5 years ago

@RaymondS1

Add a cross reference to the organism page and you are good to close

CaseyRichards92 commented 4 years ago

@RaymondS1 I added the cross reference, so this issue can now be closed: https://www.hardwoodgenomics.org/organism/Fagus/sylvatica