statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

Update BLAST dbs for BLAST tool #464

Closed: mestato closed this issue 5 years ago

mestato commented 5 years ago

We haven't been updating the BLAST server with new seqs as they are added to the live site. We need to go back and do this (and add it to the documentation to make sure it gets done as part of the "normal" loading procedure).

For a genome, add genomic scaffolds, gene sequences as nucleotides and gene sequences as proteins.
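As a rough illustration of which makeblastdb database type each of those three sequence sets needs, here is a minimal shell sketch. It assumes NCBI BLAST+ makeblastdb; the function name print_blastdb_cmds is hypothetical. It only prints the commands (a dry run) so they can be reviewed before anything is indexed.

```shell
# Hypothetical helper: print (not run) the makeblastdb commands for one genome.
# Scaffolds and CDS/transcripts are nucleotide sequences (-dbtype nucl);
# peptides are protein sequences (-dbtype prot).
# Usage: print_blastdb_cmds SCAFFOLDS_FASTA CDS_FASTA PEPTIDE_FASTA
print_blastdb_cmds() {
  if [ "$#" -ne 3 ]; then
    echo "usage: print_blastdb_cmds SCAFFOLDS CDS PEPTIDES" >&2
    return 2
  fi
  printf 'makeblastdb -dbtype nucl -in %s\n' "$1"   # genomic scaffolds
  printf 'makeblastdb -dbtype nucl -in %s\n' "$2"   # genes as nucleotides
  printf 'makeblastdb -dbtype prot -in %s\n' "$3"   # genes as proteins
}
```

Once the printed commands look right, they could be piped to sh to run them (file names containing spaces would need quoting first).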

CaseyRichards92 commented 5 years ago

Wiki page on how to update BLAST dbs: https://github.com/mestato/statonlabprivate/wiki/blast-database-creation:-reference-and-draft-genomes

CaseyRichards92 commented 5 years ago

Started here.

CaseyRichards92 commented 5 years ago

The wiki has been updated with the steps to update the BLAST dbs and tools: https://github.com/mestato/statonlabprivate/wiki/How-to-load-a-new-organism-and-genome

Add features and whole genome to BLAST server

  1. Create a directory for your organism in the bdb directory: /var/www/html/sites/default/files/bdb
  2. Copy your organism's CDS and protein FASTA files into the directory you just created.
  3. From inside that directory, run this command on the CDS file: makeblastdb -dbtype nucl -in Acc_all_models_cds.fasta ← substitute your own CDS file.
  4. From inside that directory, run this command on the protein file: makeblastdb -dbtype prot -in Acc_all_models_peptide.fasta ← substitute your own protein file.
  5. The commands above should create new .nhr, .nin, .nsq (nucleotide) and .phr, .pin, .psq (protein) index files.
  6. You are now ready to create your BLAST database on the Hardwood Genomics site (hardwoodgenomics.org). Go to Content -> Add content -> Blast database.
  7. For CDS:
    • Human-readable Name for Blast database: Genus species (transcripts)
    • File Prefix including Full Path: /var/www/html/sites/default/files/bdb/kiwi/Acc_all_models_cds.fasta ← substitute your own organism and CDS file.
    • Type of the blast database: "Nucleotide"
    • Save
  8. For protein:
    • Human-readable Name for Blast database: Genus species (peptides)
    • File Prefix including Full Path: /var/www/html/sites/default/files/bdb/kiwi/Acc_all_models_peptide.fasta ← substitute your own organism and protein file.
    • Type of the blast database: "Protein"
    • Save
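Steps 1 to 5 above could be scripted roughly as follows. This is a sketch, not the lab's actual tooling: it assumes BLAST+ makeblastdb is on PATH and that each database is small enough to be written as a single volume (larger inputs may be split into numbered volumes such as name.00.nhr with an alias file, which this check does not handle). Newer BLAST+ releases may also write additional index files; the check only looks for the six listed in step 5. The function names build_blastdb and check_blastdb are hypothetical. Steps 6 to 8 happen in the site's web form and are not covered.

```shell
# Shell functions for bash or dash (they use `local`); source, then call.
# BDB_ROOT defaults to the path from step 1; override it for testing.
BDB_ROOT="${BDB_ROOT:-/var/www/html/sites/default/files/bdb}"

# check_blastdb PREFIX TYPE: succeed if PREFIX.{nhr,nin,nsq} (TYPE=nucl)
# or PREFIX.{phr,pin,psq} (TYPE=prot) all exist. Assumes a single-volume db.
check_blastdb() {
  local prefix="$1" exts ext
  case "$2" in
    nucl) exts="nhr nin nsq" ;;
    prot) exts="phr pin psq" ;;
    *) echo "unknown dbtype: $2" >&2; return 2 ;;
  esac
  for ext in $exts; do
    if [ ! -f "$prefix.$ext" ]; then
      echo "missing: $prefix.$ext" >&2
      return 1
    fi
  done
  return 0
}

# build_blastdb ORGANISM CDS_FASTA PEPTIDE_FASTA
build_blastdb() {
  local organism="$1" cds="$2" pep="$3" dir cds_db pep_db
  dir="$BDB_ROOT/$organism"
  mkdir -p "$dir" || return 1                            # step 1
  cp "$cds" "$pep" "$dir/" || return 1                   # step 2
  cds_db="$dir/$(basename "$cds")"
  pep_db="$dir/$(basename "$pep")"
  # makeblastdb names the database after the -in file by default, which is
  # why the "File Prefix" in steps 7-8 ends in .fasta.
  makeblastdb -dbtype nucl -in "$cds_db" || return 1     # step 3
  makeblastdb -dbtype prot -in "$pep_db" || return 1     # step 4
  check_blastdb "$cds_db" nucl && check_blastdb "$pep_db" prot   # step 5
}
```

For example, build_blastdb kiwi Acc_all_models_cds.fasta Acc_all_models_peptide.fasta would leave the databases at the two "File Prefix" paths used in steps 7 and 8.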
mestato commented 5 years ago

Missing genome scaffolds (nucleotide):

Some of these are missing transcripts too.

mestato commented 5 years ago

Quercus robur has been requested, so let's make that one the first priority.

adevine4 commented 5 years ago

@mestato Quercus robur has been added to the live site

adevine4 commented 5 years ago

@mestato scaffolds have been added

almasaeed2010 commented 5 years ago

@patricksis @cricha59 I am assigning this task to both of you. Here is a list of the organisms you need to go through. Edit my post, add your name next to the organism you want to tackle, and add a checkmark to the ones you finish:

Basically, you'll need to go through and check if the scaffolds, transcripts and peptides exist for each organism. If something is missing, you'll need to create the blast db for it then add it to the site.
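One way to do that check from the command line is sketched below. It assumes one subdirectory per organism under the bdb directory (as in step 1 of the wiki steps above), single-volume databases, and that the only nucleotide databases in each directory are the scaffolds and the transcripts, so an organism with fewer than two .nin files or no .pin file gets flagged. The function name report_incomplete is hypothetical, and this is only a heuristic; it does not check whether the matching Blast database entries exist on the site.

```shell
# Shell function for bash or dash (uses `local`); source, then call.
# For each organism directory under ROOT, count nucleotide (.nin) and protein
# (.pin) index files and print directories that look incomplete: fewer than
# 2 nucleotide dbs (scaffolds + transcripts) or no protein db (peptides).
# Usage: report_incomplete ROOT
report_incomplete() {
  local root="$1" dir nin pin f
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    nin=0
    pin=0
    for f in "$dir"*.nin; do
      if [ -f "$f" ]; then nin=$((nin + 1)); fi
    done
    for f in "$dir"*.pin; do
      if [ -f "$f" ]; then pin=$((pin + 1)); fi
    done
    if [ "$nin" -lt 2 ] || [ "$pin" -lt 1 ]; then
      printf '%s\tnucl=%s\tprot=%s\n' "$(basename "$dir")" "$nin" "$pin"
    fi
  done
  return 0
}
```

Running report_incomplete /var/www/html/sites/default/files/bdb would print one tab-separated line per organism that still needs attention.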

If you need help finding the necessary files let me know. Generally, everything should be available in files/sequences.

patricksis commented 5 years ago

@almasaeed2010 For Handroanthus impetiginosus there was no genome file. I found it and added it to the server, but the download links on the website for that organism are in full HTML and I'm not sure how to add it properly.

almasaeed2010 commented 5 years ago

I'll add the link, no worries.

Update: done.

patricksis commented 5 years ago

For both Populus species, I can only find a CDS file for Populus deltoides and nothing for Populus trichocarpa. The files might be at the links below, but you can only view them if you have an account.

trichocarpa: https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Ptrichocarpa

deltoides: https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_PdeltoidesWV94_er

patricksis commented 5 years ago

@mestato Would you like Populus trichocarpa updated? A few features are missing, such as BLAST and IPS annotations, and when searching for mRNA-polypeptide I can't seem to find any BLAST results.

mestato commented 5 years ago

Yes please, @patricksis, update P. trichocarpa.

CaseyRichards92 commented 5 years ago

@almasaeed2010 All 6 Juglans species are finished. For J. regia and J. nigra I used the transcript file instead of a CDS file because there wasn't one. It looks to have worked: https://www.hardwoodgenomics.org/blast/report/fQKmPKmQ
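The fallback used for J. regia and J. nigra (index the transcript file when no CDS file exists) could be made explicit with a helper like the one below. It is a hypothetical sketch: the function name pick_nucl_fasta and the *cds*.fasta / *transcript*.fasta file-name patterns are assumptions, since file naming differs between genome releases. The chosen file would then be indexed with makeblastdb -dbtype nucl as in step 3 of the wiki steps above.

```shell
# Shell function for bash or dash (uses `local`); source, then call.
# Print the first CDS FASTA in DIR or, if there is none, the first transcript
# FASTA. Returns 1 if neither is found.
# The file-name patterns are assumptions; adjust them to the actual release.
# Usage: pick_nucl_fasta DIR
pick_nucl_fasta() {
  local dir="$1" pattern f
  for pattern in '*cds*.fasta' '*transcript*.fasta'; do
    # $pattern is left unquoted so the shell expands it as a glob.
    for f in "$dir"/$pattern; do
      if [ -f "$f" ]; then
        printf '%s\n' "$f"
        return 0
      fi
    done
  done
  return 1
}
```

Preferring the CDS file keeps the transcripts database consistent with the other organisms whenever a CDS file is available.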

almasaeed2010 commented 5 years ago

Thanks @cricha59 and @patricksis for taking care of this.

Just to confirm, you added transcripts, peptides and scaffolds?

patricksis commented 5 years ago

@almasaeed2010 Yes, except for Populus trichocarpa, which I'm working on, and Populus deltoides, which I think needs to be updated anyway.

patricksis commented 5 years ago

Populus deltoides:

  • transcripts: https://www.hardwoodgenomics.org/content/populus-deltoides-transcripts
  • scaffolds: https://www.hardwoodgenomics.org/content/populus-deltoides-scaffolds
  • peptides: https://www.hardwoodgenomics.org/content/populus-deltoides-peptides

Closing the issue: all BLAST dbs on the list have been completed.