Closed: mestato closed this issue 4 years ago.
Couldn't find any peptide files at http://hazelnut.data.mocklerlab.org/
Dev site:
Organism page: https://hardwoods.ag.utk.edu/organism/Corylus/avellana
Publication: https://hardwoods.ag.utk.edu/Publication/2803350
Reference genome: https://hardwoods.ag.utk.edu/Genome-assembly/2803351
InterProScan annotation: https://hardwoods.ag.utk.edu/InterProScan-annotation/2803352
SwissProt annotation: https://hardwoods.ag.utk.edu/BLAST-annotation/2803353
TrEMBL annotation: https://hardwoods.ag.utk.edu/BLAST-annotation/2803354
Loader job for CDS: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/534625
We most likely need a regular expression for the peptide file, because its headers carry extra fields after the ID (e.g. >Corav.1 -3 834 2144), whereas the CDS headers contain only the ID (>Corav.1).
Peptides loader job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/505954
Regex used: >(Corav\.\d+)?
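Before handing a pattern to the loader, it can help to confirm that it captures the same ID from both header styles. A minimal bash sketch, assuming headers look like the two examples above; the function name extract_id is hypothetical, and this is not how the Tripal loader applies its regex internally. Note that bash uses POSIX extended regex, so the \d in the loader regex is written [0-9] here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Corav ID from one FASTA header line.
# Handles both header styles seen in this thread:
#   peptide file: >Corav.1 -3 834 2144   (ID followed by extra fields)
#   CDS file:     >Corav.1               (ID only)
extract_id() {
  local header=$1
  # [[ =~ ]] uses POSIX extended regex, which has no \d, so digits are [0-9].
  local re='^>(Corav\.[0-9]+)'
  if [[ $header =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1  # not a Corav header line
  fi
}
```

For example, extract_id '>Corav.1 -3 834 2144' and extract_id '>Corav.1' both print Corav.1, while a sequence line returns a non-zero status.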
Publish Tripal content: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/505955
The webpage hosting the GFF file is currently down; I will check back later.
ACF TrEMBL
```bash
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=12:00:00
cd "$PBS_O_WORKDIR"
module load blast
# Array task N searches chunk N of the split CDS file against TrEMBL (plants)
# and writes BLAST XML (-outfmt 5).
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/hazelnut/raw_data/BLAST_split/C_avellana_cds.fasta.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_trembl_plants_July_2018.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/hazelnut/BLAST/trembl/C_avellana_trembl_$PBS_ARRAYID.xml \
-evalue 1e-5 \
-outfmt 5
```
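The array jobs read pre-split inputs named C_avellana_cds.fasta.1 through C_avellana_cds.fasta.200, but the thread doesn't show how those chunks were made. One way to produce them, as a sketch: split_fasta is a hypothetical helper, and dealing records out round-robin is an assumption, not necessarily what was done here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a FASTA file into N numbered chunks
# (<prefix>.1 ... <prefix>.N), dealing records out round-robin so chunk
# sizes stay roughly even. Records are never split across chunks.
# Assumes the input file starts with a '>' header line.
# Usage: split_fasta INPUT.fasta N OUTPUT_PREFIX
split_fasta() {
  local in=$1 n=$2 prefix=$3
  awk -v n="$n" -v prefix="$prefix" '
    # Each header line selects the next chunk number, cycling 1..n.
    /^>/ { i = (c++ % n) + 1 }
    # Header and sequence lines go to the currently selected chunk file.
    { print > (prefix "." i) }
  ' "$in"
}
```

For example, split_fasta C_avellana_cds.fasta 200 BLAST_split/C_avellana_cds.fasta would create the .1 to .200 files the scripts expect. If the input has fewer than 200 records, the high-numbered chunks are never created and those array tasks will fail.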
SwissProt
```bash
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=05:00:00
cd "$PBS_O_WORKDIR"
module load blast
# Array task N searches chunk N of the split CDS file against SwissProt
# and writes BLAST XML (-outfmt 5).
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/hazelnut/raw_data/BLAST_split/C_avellana_cds.fasta.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/hazelnut/BLAST/swissprot/C_avellana_sprot.$PBS_ARRAYID.xml \
-evalue 1e-5 \
-outfmt 5
```
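With 200 array tasks, a few chunks can fail or hit the walltime before the XML is imported. A sketch that lists chunk numbers whose BLAST XML output is missing, empty, or lacks the closing </BlastOutput> tag, assuming a truncated run leaves that tag off; missing_chunks is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: print each chunk number 1..N whose BLAST XML output
# (-outfmt 5) is missing, empty, or has no closing </BlastOutput> tag near
# the end of the file (a sign the task was killed before finishing).
# PATTERN is a printf-style path with %d where the chunk number goes.
# Usage: missing_chunks PATTERN N
missing_chunks() {
  local pattern=$1 n=$2 i f
  for ((i = 1; i <= n; i++)); do
    printf -v f "$pattern" "$i"
    if [[ ! -s $f ]] || ! tail -n 5 "$f" | grep -q '</BlastOutput>'; then
      echo "$i"
    fi
  done
}
```

For the SwissProt run this would be missing_chunks '/lustre/haven/gamma/staton/projects/undergrads/hazelnut/BLAST/swissprot/C_avellana_sprot.%d.xml' 200; an empty result means every chunk finished. The listed numbers can be resubmitted with #PBS -t set to just those indices.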
InterProScan (IPS)
```bash
#PBS -A ACF-UTK0011
#PBS -S /bin/bash
#PBS -t 1-200
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=4:30:00
cd "$PBS_O_WORKDIR"
# Array task N annotates chunk N of the split peptide file, writes XML into
# the xmls directory, and sends stdout to a per-task log in TMP.
/lustre/haven/gamma/staton/software/interproscan-5.28-67.0/interproscan.sh \
-i /lustre/haven/gamma/staton/projects/undergrads/hazelnut/raw_data/ips_split/C_avellana_peptides.fasta.$PBS_ARRAYID \
-f XML \
-d /lustre/haven/gamma/staton/projects/undergrads/hazelnut/ips/xmls \
--disable-precalc \
--iprlookup \
--goterms \
--pathways \
--tempdir /lustre/haven/gamma/staton/projects/undergrads/hazelnut/ips/TMP \
> /lustre/haven/gamma/staton/projects/undergrads/hazelnut/ips/TMP/$PBS_ARRAYID.out
```
TrEMBL XML import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/507088
SwissProt XML import job: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/507089
@almasaeed2010 I can't seem to get the IPS results to show up. Could you take a look when you have time?
@patricksis I cleaned up the polypeptide file for you, so it should work now if you rerun IPS on ACF. Here is the file to use on the dev server:
/var/www/html/sites/default/files/sequences/hazelnut/raw_files/C_avellana_peptides_matt.clean.fasta
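The thread doesn't say what the cleanup of the polypeptide file involved. As a diagnostic sketch only, the hypothetical helper below lists every record whose sequence lines contain anything other than letters (for example '*' stop codons, '-' gaps, or stray carriage returns), which are common reasons a peptide FASTA trips up downstream tools; it is not a reconstruction of the cleanup that was actually done.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: print the ID (first word of the header, without
# '>') of every record whose sequence lines contain a non-letter character,
# e.g. '*' stop codons, '-' gaps, digits, or a trailing carriage return from
# Windows line endings. Each offending ID is printed once.
# Usage: bad_peptides PEPTIDES.fasta
bad_peptides() {
  awk '
    /^>/ { id = substr($1, 2); next }
    /[^A-Za-z]/ && !(id in seen) { seen[id] = 1; print id }
  ' "$1"
}
```

Running it on the peptide file before splitting it into ips_split chunks gives a quick view of which records, if any, need attention.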
@almasaeed2010 thank you
The deletion job is running here: https://hardwoods.ag.utk.edu/admin/tripal/tripal_jobs/view/534619
Once the job is done, you can start resubmitting the data.
@mestato The webpage hosting the files has been down for weeks. I need access to the original polypeptide file as well as the GFF file. I attempted to contact the site maintainers, but sadly I haven't received a response. Website: https://www.cavellanagenomeportal.com/.
@almasaeed2010 The page hosting the peptide file is back up! The original peptide file is located here:
/var/www/html/sites/default/files/sequences/hazelnut/raw_files/C.avellana_transcriptome_Jefferson_ORFs_final.fasta
Here is the error we get when loading the peptides:
Cannot find a unique feature for the parent 'Corav.471' of type 'mRNA' for the feature.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] Cannot find a unique feature for the parent 'Corav.471' of type 'mRNA' for the feature.
Cannot find a unique feature for the parent 'Corav.471' of type 'mRNA' for the feature.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] Cannot find a unique feature for the parent 'Corav.471' of type 'mRNA' for the feature.
Cannot find a unique feature for the parent 'Corav.1774' of type 'mRNA' for the feature.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] Cannot find a unique feature for the parent 'Corav.1774' of type 'mRNA' for the feature.
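The loader log repeats each failure, so pulling out the distinct parent IDs gives a short list to check against the CDS file. A sketch, assuming the job log has been saved to a plain text file; the function name failed_parents is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each distinct parent ID named in
# "Cannot find a unique feature for the parent '...'" log lines.
# Usage: failed_parents JOB_LOG.txt
failed_parents() {
  # Keep only the "parent '<ID>'" fragment, strip the wrapper,
  # then de-duplicate with a fixed (C) sort order.
  grep -o "parent '[^']*'" "$1" \
    | sed "s/^parent '//; s/'\$//" \
    | LC_ALL=C sort -u
}
```

On the log excerpt above this would print Corav.1774 and Corav.471, one per line.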
I think the CDS file is malformed.
Here is what shows up when we grep for 471 in the CDS file:
$ cat C_avellana_cds.fasta | grep "\.471"
caccagctctgcaagaacccaaggcc.471
>Corav.4710
>Corav.4711
>Corav.4712
>Corav.4713
>Corav.4714
>Corav.4715
>Corav.4716
>Corav.4717
>Corav.4718
>Corav.4719
Notice the first line: a sequence line ends with ".471", and there is no >Corav.471 header line at all, so the Corav.471 record appears to be mangled.
Trying the same on the peptides file:
$ cat C.avellana_transcriptome_Jefferson_ORFs_final.fasta | grep "\.471"
>Corav.471 -1 148 762
>Corav.4710 -1 1 87
>Corav.4711 -2 2 175
>Corav.4712 -3 39 431
>Corav.4713 +1 295 1287
>Corav.4714 -1 13 1230
>Corav.4715 +2 149 1225
>Corav.4716 -1 1 261
>Corav.4717 -2 407 568
>Corav.4718 -1 10 243
>Corav.4719 -3 117 1013
In the peptides file, we find the feature as expected.
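Note that grep "\.471" also matches Corav.4710 to Corav.4719, so on its own it can't show whether Corav.471 has exactly one CDS header. A file-level sketch that mirrors the loader error compares header IDs exactly: for each peptide header it takes the first word (minus '>') and reports it unless exactly one CDS header carries the same ID. The function name missing_parents is hypothetical, and the check only looks at the two FASTA files, not at what is already in the database.

```bash
#!/usr/bin/env bash
# Hypothetical check: print every peptide ID that does not have exactly one
# CDS header with the same ID. IDs are compared exactly (first word of the
# header, minus '>'), so Corav.471 never matches Corav.4710.
# Usage: missing_parents CDS.fasta PEPTIDES.fasta
missing_parents() {
  awk '
    # First file (CDS): count how many headers carry each ID.
    # The FNR == NR idiom assumes the CDS file is not empty.
    FNR == NR { if (/^>/) cds[substr($1, 2)]++; next }
    # Second file (peptides): report IDs seen zero times or more than once.
    /^>/ { id = substr($1, 2); if (cds[id] != 1) print id }
  ' "$1" "$2"
}
```

Run against C_avellana_cds.fasta and the peptide file, the mangled record above would show up as Corav.471.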
I think I might have found a working file @almasaeed2010
$ cat C.avellana_transcriptome_Jefferson_CDS.fasta | grep "\.471"
>Corav.471
>Corav.4710
>Corav.4711
>Corav.4712
>Corav.4713
>Corav.4714
>Corav.4715
>Corav.4716
>Corav.4717
>Corav.4718
>Corav.4719
I think this may have been the original file we used, but we may have edited it.
Looks fixed to me 👍
Try reloading it.
Going to try loading this organism onto the live site using a different CDS file.
Organism: https://www.hardwoodgenomics.org/organism/Corylus/avellana
Publication: https://www.hardwoodgenomics.org/Publication/3472009
Reference genome: https://www.hardwoodgenomics.org/Genome-assembly/3472010
InterProScan annotation: https://www.hardwoodgenomics.org/InterProScan-annotation/3472011
SwissProt annotation: https://www.hardwoodgenomics.org/BLAST-annotation/3472012
TrEMBL annotation: https://www.hardwoodgenomics.org/BLAST-annotation/3472013
Chado CDS loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/800435
Chado peptide loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/800439
Publish Tripal content: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/800446
KEGG annotation: https://www.hardwoodgenomics.org/KEGGresults/3500181
KEGG loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/800696
TrEMBL loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/806726
SwissProt loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/806730
IPS loader: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/806731
BLAST db (CDS): https://www.hardwoodgenomics.org/content/corylus-avellana-transcripts
BLAST db (peptides): https://www.hardwoodgenomics.org/content/corylus-avellana-peptides
The Gene Ontology/KEGG browser does not appear on the organism page.
@almasaeed2010 Any idea why the Gene Ontology/KEGG browser isn't showing up?
Probably needs reindexing. I'll run it now. May take a few hours though.
Thanks. I didn't see any problem with the CDS/peptide files, so I wasn't sure.
JBrowse instance: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/807913
The JBrowse link has also been added. This organism should be just about done.
blast db (scaffolds): https://www.hardwoodgenomics.org/content/corylus-avellana-scaffolds
@cricha59 @RaymondS1 Can either of you review this so I can close the issue?
@patricksis only took me 6 months but everything is there. You can close.
Lol thanks @cricha59. Closing.
Publication and Data Information
https://www.biorxiv.org/content/10.1101/469015v1
Checklist
See New Genome Documentation for detailed instructions.