statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

Pink Ipe Genome #130

Closed mestato closed 6 years ago

mestato commented 6 years ago

Handroanthus impetiginosus, Bignoniaceae Genome

bradfordcondon commented 6 years ago

@lupercal2

bradfordcondon commented 6 years ago

https://academic.oup.com/gigascience/article/7/1/1/4739364

bradfordcondon commented 6 years ago

Stuff we want from the paper

Stuff we annotate ourselves

Lupercal2 commented 6 years ago

http://gigadb.org/dataset/view/id/100379

Link to GigaDB with the files for the paper, which include the sequence assembly, feature annotations, mRNA sequences, and polypeptide sequences, plus a few other files.

Lupercal2 commented 6 years ago

Completed

Uploaded mRNA and polypeptide sequences. (I also have the GFF file and the genome assembly file.)

In Progress

Waiting on the Galaxy jobs to finish running for the BLASTx against TrEMBL and Swiss-Prot.

almasaeed2010 commented 6 years ago

@Lupercal2 when you loaded the polypeptides, did you specify that those are derived from mRNAs? Even if there is no regexp needed, you still have to select that option from the advanced options menu.

You might need to reload the polypeptides to verify that it works

almasaeed2010 commented 6 years ago

Here is what you'd be looking for

[screenshot, 2018-04-02 3:58:02 pm]

bradfordcondon commented 6 years ago

right so you fill out the type (mRNA) and the relationship type but not the regexp box.

almasaeed2010 commented 6 years ago

So far the mRNA and polypeptides have been loaded successfully.

Current problems:

  1. InterPro scan results are not appearing, despite repeated loads with and without various regexps.
  2. BLAST has the same issue as InterPro.

We could look into the feature_cvterm table to check whether the annotations were loaded but are not appearing on the page.
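
A minimal sketch of that check, assuming Chado is installed in a schema named `chado` and that the organism record uses the genus and species shown. The function name `annotation_count_sql` is hypothetical; it only prints the query so it can be reviewed before being run against the site database (for example by piping it to `psql`).

```bash
#!/usr/bin/env bash
# Sketch only: print a Chado query that counts feature_cvterm rows
# attached to features of one organism. Assumes the schema is named "chado".
annotation_count_sql() {
  local genus=$1 species=$2
  cat <<SQL
SELECT COUNT(*) AS annotated_terms
FROM chado.feature_cvterm fc
JOIN chado.feature f ON f.feature_id = fc.feature_id
JOIN chado.organism o ON o.organism_id = f.organism_id
WHERE o.genus = '${genus}'
  AND o.species = '${species}';
SQL
}

# Print the query for Pink Ipe; nothing is executed against a database here.
annotation_count_sql Handroanthus impetiginosus
```

A count of zero would suggest the terms were never loaded. A non-zero count would point at the display side instead.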

I moved all relevant files (interpro, blast, fastas) to sites/default/files/sequences/pinkIpe on the main dev machine in case they are needed.

Lupercal2 commented 6 years ago

Yes, I made sure to verify they were derived from mRNAs.

bradfordcondon commented 6 years ago

I had to use this regexp to load in the polypeptides to match the mRNA:

>(.*?) (note the trailing white space). This is nonsense, because the names are the same in the mRNA and polypeptide files, but the loader requires a regexp...
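
To preview what that pattern captures before reloading, a rough shell equivalent may help. This is a sketch, not the loader's own code: `sed -E` has no lazy quantifier, so `[^ ]*` stands in for `(.*?)`, and both the function name `fasta_names` and the sample header are hypothetical.

```bash
#!/usr/bin/env bash
# Print the name captured from each FASTA header line: everything after '>'
# up to the first space. Header lines with no space print nothing,
# mirroring the trailing white space the regexp requires.
fasta_names() {
  sed -E -n 's/^>([^ ]*) .*$/\1/p' "$@"
}

# Hypothetical header, for illustration only.
printf '>Himp_000001 hypothetical protein\nMKT\n' | fasta_names
```

Running `fasta_names` on both the mRNA and the polypeptide FASTA files and comparing the two lists would show whether the names really line up.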

bradfordcondon commented 6 years ago

Because of a problem with the file server, I copied the Pink Ipe BLAST annotation file to /sites/default/files/sequences/pinkIpe/pinkIpe.blast.xml on live. We should plan on deleting it before closing out this issue.

Lupercal2 commented 6 years ago

the pink ipe is completely on live.

bradfordcondon commented 6 years ago

Thanks Lucas! Please remember to link to what you added so it's easier for us to review.

In this case:

The last one is my fault: the organism linker field is not enabled for the blast/interpro analysis content type. I'll enable it: as you fix the above issues, please link the analyses to the pink ipe organism.

bradfordcondon commented 6 years ago

Here's what to look for on the "edit" page.

[screenshot, 2018-06-25 9:33:14 am]

In addition, I've added the publication for this genome with the command below, referencing its PubMed ID:

drush trp-import-pubs --dbxref=PMID:29253216 --username=hwadmin

The content page for it is here: https://www.hardwoodgenomics.org/Publication/2209438

Can you link the genome analysis to this publication? You do this while editing the analysis. You have to search for the title (Genome assembly of the Pink Ipê (Handroanthus impetiginosus, Bignoniaceae), a highly valued, ecologically keystone Neotropical timber forest tree.) to link it, like so:

[screenshot, 2018-06-25 9:53:17 am]

Lupercal2 commented 6 years ago

Fixed this typo here: https://hardwoodgenomics.org/InterProScan-annotation/2209436. The other, duplicate InterPro analysis has been deleted, and I linked the publication to the genome analysis here: https://hardwoodgenomics.org/Genome-assembly/2161592?tripal_pane=group_publications

mestato commented 6 years ago

For interproscan annotation, why do the terms "ends during" and "regulates" show up? those are probably relationship terms. Is this a larger issue that needs to be put on the cv xray page?

Still need to add Pink Ipe to the genome page (https://hardwoodgenomics.org/genomes) and to the JBrowse page (https://hardwoodgenomics.org/content/jbrowse). Make sure the genes get indexed and show up in the gene search tool, too (https://hardwoodgenomics.org/gene-search).

almasaeed2010 commented 6 years ago

bradfordcondon commented 6 years ago

> For the Genomes page. Are those all Genome Assembly analyses? If so, we should probably just create a view instead of manually adding entries.

Not sure... The first problematic example I can think of is the just-added English walnut alignments (Chinese wingnut etc.), which were added as a genome assembly content type, but I don't think we'd want them showing up here (could be wrong).

> Do we have JBrowse instance for Pink Ipe?

We have nothing for Pink Ipe JBrowse. Fortunately @Lupercal2 said above that he has the GFF handy. I guess I'll work with him to add that?

> For interproscan annotation, why do the terms "ends during" and "regulates" show up?

You mean for the ontology browser attached to the Pink Ipe organism?

[screenshot, 2018-06-25 12:08:30 pm]

The answer is that GO defines those two relationship terms, so they show up there. We can petition Stephen to either remove relationship terms from the CV browser view or specifically blacklist these ones.

mestato commented 6 years ago

If we don't have a JBrowse, let's get one up. I think a JBrowse is one of the main useful things we can provide, and unless we have speed issues, there's no reason not to at least have a gene track. We probably need to develop a checklist of what needs to be done for a genome analysis to be added to the site.

On Mon, Jun 25, 2018 at 12:01 PM Abdullah Almsaeed notifications@github.com wrote:

For the Genomes page. Are those all Genome Assembly analyses? If so, we should probably just create a view instead of manually adding entries.

Do we have JBrowse instance for Pink Ipe?

The gene index will need to be repopulated every time we add new genes that have annotations. Unfortunately there is no easy way around that. However, the entities with their annotations are automatically indexed and should show up using the main search.

Lupercal2 commented 6 years ago

I thought Abdullah made the JBrowse track already? @almasaeed2010

bradfordcondon commented 6 years ago

Ah, he did indeed, on dev. Can we please copy it to live? We need to figure out the linker URL.

almasaeed2010 commented 6 years ago

I completely forgot about that. I am transferring the data up to the live site. Thanks for the reminder @Lupercal2

almasaeed2010 commented 6 years ago

Ok JBrowse is up and linked to on live: https://hardwoodgenomics.org/content/jbrowse

bradfordcondon commented 6 years ago

Current issues with Pink Ipe on the live site. @Lupercal2 please address them and notify us here when you are finished.

Once we're happy with the final results we can close the issue and also make a celebratory tweet from the HWG account.

Thanks.

Lupercal2 commented 6 years ago

and the duplicate analysis to be deleted is this one https://hardwoodgenomics.org/bio_data/2161597

bradfordcondon commented 6 years ago

@Lupercal2 you linked the genome assembly to H. macrophylla not H. impetiginosus

Lupercal2 commented 6 years ago

fixed https://www.hardwoodgenomics.org/Genome-assembly/2161592

bradfordcondon commented 6 years ago

OK, thanks Lucas. All looks presentable; I'm announcing its availability and closing the issue. Thanks!

bradfordcondon commented 6 years ago

The tweet below is scheduled for 9:30 am on Thursday, so there is time to change it if there's an error.

Hardwood Genomics now has resources online for Pink Ipê (Handroanthus impetiginosus). Did you know it is the most logged species in Brazil?

Tools include annotations, #jbrowse, and #blast

https://www.hardwoodgenomics.org/organism/Handroanthus/impetiginosus

Original publication for the data: https://doi.org/10.1093/gigascience/gix125

CaseyRichards92 commented 5 years ago

Swissprot


```bash
#!/bin/bash
# PBS array job: one task per CDS chunk (1-200), 2 cores and 4 hours each.
#PBS -t 1-200
#PBS -l nodes=1:ppn=2
#PBS -l walltime=04:00:00

# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR

module load blast

# BLASTx this task's chunk against Swiss-Prot and write XML (outfmt 5).
blastx \
 -query /lustre/haven/gamma/staton/projects/undergrads/pinkipe/cds_splits/Himpetiginosus.gene.cds.fasta.$PBS_ARRAYID \
 -db /lustre/haven/gamma/staton/library/uniprot/uniprot_sprot.fasta \
 -out /lustre/haven/gamma/staton/projects/undergrads/pinkipe/swissprot/pink_ipe_swissprot_$PBS_ARRAYID.xml \
 -evalue 1e-5 \
 -outfmt 5
```

CaseyRichards92 commented 5 years ago

Swiss-Prot on the live site is completed: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/622913
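
The array job above expects the CDS FASTA to already be split into chunks named with a numeric suffix (`.1` to `.200`). How those splits were actually produced is not recorded in this thread. The following is only one possible sketch, with a hypothetical function name `split_fasta`, that deals records out round-robin so the chunks are roughly equal in size.

```bash
#!/usr/bin/env bash
# Split a FASTA file into N chunks named <input>.1 ... <input>.N,
# assigning whole records to chunks in round-robin order.
# Output is appended, so remove old chunk files before re-running.
split_fasta() {
  local input=$1 chunks=$2
  awk -v n="$chunks" -v prefix="$input" '
    /^>/ {
      # New record: close the previous chunk and pick the next one.
      if (out != "") close(out)
      out = prefix "." (count % n + 1)
      count++
    }
    # Write header and sequence lines to the current chunk.
    out != "" { print >> out }
  ' "$input"
}

# Example (not run here): split_fasta Himpetiginosus.gene.cds.fasta 200
```

If the input has fewer records than chunks, the higher-numbered chunk files are never created, and the matching array tasks would fail on a missing query file.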