Closed mestato closed 6 years ago
@lupercal2
[ ] Sequence assembly
[ ] Feature annotations (GFF)
[ ] mRNA sequences
[ ] Polypeptide sequences
[ ] BLAST annotation of mRNA
[ ] IPR annotation of polypeptides
http://gigadb.org/dataset/view/id/100379
Link to GigaDB with the files for the paper, which include the sequence assembly, feature annotations, mRNA sequences, and polypeptide sequences, plus a few other files.
Uploaded mRNA & polypeptide. (also have the gff file and the genome assembly file)
Waiting on the Galaxy jobs to finish running BLASTx against TrEMBL and Swiss-Prot.
@Lupercal2 when you loaded the polypeptides, did you specify that those are derived from mRNAs? Even if there is no regexp needed, you still have to select that option from the advanced options menu.
You might need to reload the polypeptides to verify that it works
Here is what you'd be looking for
right so you fill out the type (mRNA) and the relationship type but not the regexp box.
So far the mRNA and polypeptides have been loaded successfully.
Current problems:
We could look into the feature_cvterm table to check whether the annotations have been identified but not appearing on the page.
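A rough sketch of the kind of check suggested above: count the rows in feature_cvterm for each feature of the organism. To stay self-contained it runs against an in-memory SQLite stand-in; the real Chado database is PostgreSQL and its tables carry more columns, and the feature and term names here are made up for illustration.

```python
import sqlite3

# Minimal stand-in for the Chado tables involved. Real Chado runs on
# PostgreSQL and these tables have more columns than shown here.
SCHEMA = """
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, organism_id INTEGER, uniquename TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_cvterm (feature_cvterm_id INTEGER PRIMARY KEY, feature_id INTEGER, cvterm_id INTEGER);
"""

# Number of annotation rows per feature; features with none are reported as 0.
ANNOTATION_COUNTS = """
SELECT f.uniquename, COUNT(fc.feature_cvterm_id) AS n_terms
FROM feature f
JOIN organism o ON o.organism_id = f.organism_id
LEFT JOIN feature_cvterm fc ON fc.feature_id = f.feature_id
WHERE o.genus = ? AND o.species = ?
GROUP BY f.uniquename
ORDER BY f.uniquename
"""


def annotation_counts(conn, genus, species):
    """Return (uniquename, n_terms) tuples for one organism."""
    return conn.execute(ANNOTATION_COUNTS, (genus, species)).fetchall()


def build_demo_db():
    """Build a tiny database with hypothetical rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organism VALUES (1, 'Handroanthus', 'impetiginosus')")
    conn.executemany(
        "INSERT INTO feature VALUES (?, 1, ?)",
        [(1, "demo_mRNA_1"), (2, "demo_mRNA_2")],
    )
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(10, "demo term A"), (11, "demo term B")],
    )
    conn.executemany(
        "INSERT INTO feature_cvterm VALUES (?, ?, ?)",
        [(100, 1, 10), (101, 1, 11)],
    )
    return conn
```

If the counts are non-zero in the database but nothing appears on the page, the problem is more likely in the display layer than in the load.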
I moved all relevant files (interpro, blast, fastas) to sites/default/files/sequences/pinkIpe
on the main dev machine in case they are needed.
Yes i made sure to verify they were derived from mRNAs
i had to use this regexp to load in the polypeptides to match the mRNA
>(.*?)
(note white space). Nonsense because the names are the same in the mRNA and polypeptide, but it requires a regexp...
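To illustrate what a pattern like this captures, here is a small Python sketch. It is not the Tripal loader (which uses PHP regular expressions), and it assumes the white space mentioned above is a single literal space right after the capture group. The header strings are hypothetical.

```python
import re
from typing import Optional

# Assumption: the pattern ends with one literal space after the lazy group.
NAME_PATTERN = re.compile(r">(.*?) ")


def extract_name(header: str) -> Optional[str]:
    """Return the text between '>' and the first space, or None if no match."""
    match = NAME_PATTERN.match(header)
    return match.group(1) if match else None


# Hypothetical FASTA headers, for illustration only.
with_description = ">demo_transcript_1 some description"
without_space = ">demo_transcript_1"
```

Under that assumption the lazy group stops at the first space, so a header with no space at all would not match, which is one reason the white space matters.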
because of a problem with the file server, i copied the pink ipe blast annotation file to /sites/default/files/sequences/pinkIpe/pinkIpe.blast.xml
on live. We should plan on deleting it before closing out this issue.
the pink ipe is completely on live.
Thanks Lucas! Please remember to link to what you added so it's easier for us to review.
In this case:
where was the base genome assembly analysis? edit: found it, here: https://hardwoodgenomics.org/Genome-assembly/2161592
Typos in program name: https://hardwoodgenomics.org/bio_data/2209435 and https://hardwoodgenomics.org/bio_data/2209436
Duplicate analyses? https://hardwoodgenomics.org/bio_data/2161599 and https://hardwoodgenomics.org/bio_data/2161597
The organism is not linked to the analysis.
The last one is my fault: the organism linker field is not enabled for the blast/interpro analysis content type. I'll enable it: as you fix the above issues, please link the analyses to the pink ipe organism.
Here's what to look for on the "edit" page.
In addition, I've added the publication for this genome with the command below, referencing its PubMed ID:
drush trp-import-pubs --dbxref=PMID:29253216 --username=hwadmin
The content page for it is here: https://www.hardwoodgenomics.org/Publication/2209438
Can you link the genome analysis to this publication? You do this while editing the analysis. You have to search for the title (Genome assembly of the Pink Ipê (Handroanthus impetiginosus, Bignoniaceae), a highly valued, ecologically keystone Neotropical timber forest tree.) to link it, like so:
Fixed this typo here: https://hardwoodgenomics.org/InterProScan-annotation/2209436. The other duplicate InterPro analysis has been deleted, and I linked the publication to the genome analysis here: https://hardwoodgenomics.org/Genome-assembly/2161592?tripal_pane=group_publications
For interproscan annotation, why do the terms "ends during" and "regulates" show up? those are probably relationship terms. Is this a larger issue that needs to be put on the cv xray page?
Still need to add Pink Ipe to the genome page (https://hardwoodgenomics.org/genomes) and to the JBrowse page (https://hardwoodgenomics.org/content/jbrowse). Make sure the genes get indexed and show up in the gene search tool, too (https://hardwoodgenomics.org/gene-search).
For the Genomes page. Are those all Genome Assembly analyses? If so, we should probably just create a view instead of manually adding entries.
Do we have JBrowse instance for Pink Ipe?
The gene index will need to be repopulated every time we add new genes that have annotations. Unfortunately there is no easy way around that. However, the entities with their annotations are automatically indexed and should show up using the main search.
> For the Genomes page. Are those all Genome Assembly analyses? If so, we should probably just create a view instead of manually adding entries.
Not sure... The first problematic example I can think of is the just-added English walnut alignments (Chinese wingnut etc.), which were added as a genome assembly content type, but I don't think we'd want them showing up here (could be wrong).
> Do we have JBrowse instance for Pink Ipe?
We have nothing for Pink Ipe in JBrowse. Fortunately @Lupercal2 said above that he has the GFF handy. I guess I'll work with him to add that?
> For interproscan annotation, why do the terms "ends during" and "regulates" show up?
You mean for the ontology browser attached to the pink ipe organism?
The answer is that GO defines those two relationship terms, so they show up there. We can petition Stephen to either remove relationship terms from the cv browser view, or specifically blacklist these ones.
If we don't have a JBrowse, let's get one up. I think a JBrowse is one of the main useful things we can provide, and unless we have speed issues, there's no reason not to at least have a gene track. We probably need to develop a checklist for what needs to be done for a genome analysis to be added to the site.
I thought Abdullah made the JBrowse track already? @almasaeed2010
Ah, he did indeed, on dev. Can we please copy it to live? Need to figure out the linker URL.
I completely forgot about that. I am transferring the data up to the live site. Thanks for the reminder @Lupercal2
Ok JBrowse is up and linked to on live: https://hardwoodgenomics.org/content/jbrowse
current issues with Pink Ipe on live site. @Lupercal2 please address and notify us here when you are finished.
Once we're happy with the final results we can close issue and also make a celebratory tweet from the hWG account.
Thanks.
and the duplicate analysis to be deleted is this one https://hardwoodgenomics.org/bio_data/2161597
@Lupercal2 you linked the genome assembly to H. macrophylla not H. impetiginosus
OK, thanks Lucas. All looks presentable, so I'm announcing its availability and closing the issue. Thanks!
The tweet below is scheduled for 9:30 am on Thursday, so there is time to change it if there's an error.
Hardwood Genomics now has resources online for Pink Ipê (Handroanthus impetiginosus). Did you know it is the most logged species in Brazil?
Tools include annotations, #jbrowse, and #blast
https://www.hardwoodgenomics.org/organism/Handroanthus/impetiginosus
Original publication for the data: https://doi.org/10.1093/gigascience/gix125
Swissprot
```
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=04:00:00
cd $PBS_O_WORKDIR
module load blast
blastx \
-query /lustre/haven/gamma/staton/projects/undergrads/pinkipe/cds_splits/Himpetiginosus.gene.cds.fasta.$PBS_ARRAYID \
-db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
-out /lustre/haven/gamma/staton/projects/undergrads/pinkipe/swissprot/pink_ipe_swissprot_$PBS_ARRAYID.xml \
-evalue 1e-5 \
-outfmt 5
```
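The array job above expects the CDS FASTA to already be split into numbered pieces (one file per `$PBS_ARRAYID`). The thread does not say how the split was done; the following is only a minimal Python sketch of one way to produce such pieces, distributing records round-robin. The function name and the `<input>.<n>` output naming are assumptions chosen to mirror the paths in the job script.

```python
from pathlib import Path


def split_fasta(fasta_path, n_chunks):
    """Split a FASTA file into n_chunks files named <fasta_path>.1 .. .n_chunks.

    Records are assigned round-robin, so chunk sizes differ by at most one
    record. Returns the list of output paths.
    """
    fasta_path = Path(fasta_path)
    outputs = [Path(f"{fasta_path}.{i}") for i in range(1, n_chunks + 1)]
    handles = [path.open("w") for path in outputs]
    try:
        record_index = -1  # no record seen yet
        with fasta_path.open() as fasta:
            for line in fasta:
                if line.startswith(">"):
                    record_index += 1  # a header line starts a new record
                if record_index >= 0:
                    handles[record_index % n_chunks].write(line)
    finally:
        for handle in handles:
            handle.close()
    return outputs
```

For the job above this would be called with `n_chunks=200` to match `#PBS -t 1-200`; the resulting per-chunk XML files would then need to be combined or loaded one by one.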
Swiss-Prot on live site completed: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/622913
Handroanthus impetiginosus, Bignoniaceae Genome
503.7 Mb (N50 = 81 316 bp), 90.4% of the 557-Mbp genome, with 13 206 scaffolds.
31 688 structures and 35 479 messenger RNA transcripts, while external evidence supported a well-curated set of 28 603 high-confidence models
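For readers unfamiliar with the N50 statistic quoted above: it is the length of the scaffold at which the running total, taken from the longest scaffold down, first reaches half of the assembly size. A small sketch using the standard definition follows. The helper names are illustrative, and the only figures taken from the summary are the 503.7 Mb assembly size and the 557 Mbp genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the total assembly size
            return length


def percent_assembled(assembled_mb, genome_mb):
    """Share of the estimated genome covered by the assembly, in percent."""
    return round(100 * assembled_mb / genome_mb, 1)
```

With the figures from the summary, `percent_assembled(503.7, 557)` gives 90.4, which matches the quoted 90.4%.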
[ ] deleted blast files on live to conserve space