statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

Crepe myrtle (Lagerstroemia spp.) transcriptomes #482

Open mestato opened 5 years ago

mestato commented 5 years ago

Publication and Data Information

Several publications can be found, so let's start with NCBI (4 SRA RNA datasets available, all L. indica).

Additional Information

Checklist

See New Genome Documentation for detailed instructions.

MattHuff commented 5 years ago

Current workflow for assembling the transcripts is as follows:

  1. Use the SRA Toolkit command prefetch to download the SRA files from NCBI by SRR ID.
  2. Move the files from the localhome NCBI directory to /staton/projects/undergrads/crepe_myrtle/raw_transcripts, then run the SRA Toolkit command fastq-dump to convert them to fastq. fastq-dump must be run with specific options for Trinity to work (see step 5).
  3. Run FastQC to identify adapter content in the reads, then run Trimmomatic to remove it.
  4. Run Rcorrector on the trimmed reads to correct sequencing errors. This can take a while, depending on the number and size of the files.
  5. Run Trinity to assemble the reads. If fastq-dump was run with default settings, Trinity will reject the files because the fastq headers contain information it can't parse.
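
Steps 1 and 2 could be scripted roughly as follows. This is only a sketch under assumptions: the exact fastq-dump options used here are not stated in this thread, so the --defline-seq value below is the read-name format that the Trinity documentation suggests for SRA data, and --split-files assumes the usual one-file-per-mate layout. SRR0000000 is a placeholder, not one of the four real run IDs, and the run and fetch_and_convert helpers are hypothetical names. The sketch also lets fastq-dump write straight into the target directory instead of moving the SRA files by hand.

```shell
#!/bin/bash
# Sketch of workflow steps 1-2: download SRA runs and convert them to fastq.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Destination directory named in the workflow above.
RAW_DIR=/staton/projects/undergrads/crepe_myrtle/raw_transcripts

# Assumption: read-name format suggested in the Trinity documentation for SRA data.
# Single quotes keep $sn, $rn and $ri literal so fastq-dump receives them unexpanded.
DEFLINE='@$sn[_$rn]/$ri'

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: fetch one run and convert it with Trinity-friendly headers.
fetch_and_convert() {
    local srr="$1"
    run prefetch "$srr"
    run fastq-dump --defline-seq "$DEFLINE" --split-files --outdir "$RAW_DIR" "$srr"
}

# SRR0000000 is a placeholder; replace it with the real SRR IDs.
for srr in SRR0000000; do
    fetch_and_convert "$srr"
done
```

Leaving DRY_RUN at its default makes it easy to review the exact commands before anything is downloaded.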

RaymondS1 commented 5 years ago

Fastqc

#PBS -N 1_fastqc
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -l nodes=1
#PBS -l walltime=00:10:00

cd "$PBS_O_WORKDIR"

module load fastqc

# Run FastQC on every raw fastq file, one output directory per input file.
for f in /lustre/haven/gamma/staton/projects/undergrads/crepe_myrtle/raw_transcripts/*.fastq
do
        filename=$(basename "$f")
        base="${filename%%.fastq*}"
        echo "filename $filename base $base"
        # -p: do not fail if the directory is left over from an earlier run
        mkdir -p "$base.fastqc"

        # FastQC's stdout and stderr go to a per-file log
        fastqc -o "$base.fastqc" "$f" >& "$base.fastqc.out"
done
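
To see at a glance which files FastQC flagged for adapter content, the reports can be summarized with something like the sketch below. It assumes the directory layout produced by the job script above (<base>.fastqc/<base>_fastqc.zip) and relies on the tab-separated summary.txt that FastQC stores inside each zip; adapter_status is a hypothetical helper name.

```shell
#!/bin/bash
# Print the FastQC "Adapter Content" verdict (PASS, WARN or FAIL) for each report.

adapter_status() {
    # stdin: contents of a FastQC summary.txt (tab-separated: status, module, file)
    awk -F'\t' '$2 == "Adapter Content" { print $1 }'
}

# Assumed layout from the job script above: <base>.fastqc/<base>_fastqc.zip
for zip in ./*.fastqc/*_fastqc.zip; do
    [ -e "$zip" ] || continue   # no reports found: the glob did not expand
    printf '%s\t%s\n' "$zip" "$(unzip -p "$zip" '*/summary.txt' | adapter_status)"
done
```

Run it from the directory the FastQC job was submitted from, since that is where the output directories are created.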

RaymondS1 commented 5 years ago

Trimming

#PBS -N 2_trimming
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:30:00

cd "$PBS_O_WORKDIR"

module load java

# Trim adapters and low-quality bases from every raw fastq file (single-end mode).
for F in /lustre/haven/gamma/staton/projects/undergrads/crepe_myrtle/raw_transcripts/*.fastq
do
        # Strip the directory and the .fastq suffix
        BASE=$(basename "$F" .fastq)
        echo "F $F"
        echo "base $BASE"

        java -jar /lustre/haven/gamma/staton/software/Trimmomatic-0.36/trimmomatic-0.36.jar SE -phred33 "$F" "$BASE.trim.fastq" ILLUMINACLIP:/lustre/haven/gamma/staton/software/Trimmomatic-0.36/adapters/adapters.fa:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

done
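
A quick sanity check after trimming is to compare read counts before and after. The sketch below assumes the trimmed files sit in the current directory with the .trim.fastq suffix used by the job script above, and that the fastq files are not line-wrapped (four lines per read); count_reads is a hypothetical helper name.

```shell
#!/bin/bash
# Compare read counts before and after trimming.

RAW_DIR=/lustre/haven/gamma/staton/projects/undergrads/crepe_myrtle/raw_transcripts

count_reads() {
    # One fastq record is four lines (assumes sequences are not line-wrapped).
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

for trimmed in ./*.trim.fastq; do
    [ -e "$trimmed" ] || continue          # no trimmed files in this directory
    base=$(basename "$trimmed" .trim.fastq)
    raw="$RAW_DIR/$base.fastq"
    [ -e "$raw" ] || continue              # skip if the matching raw file is missing
    echo "$base: $(count_reads "$raw") raw reads, $(count_reads "$trimmed") after trimming"
done
```

A large drop for one file compared with the others would be a reason to look again at its FastQC report and the trimming parameters.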

RaymondS1 commented 5 years ago

Organism https://hardwoodgenomics.org/organism/Lagerstroemia/indica

Publication https://hardwoodgenomics.org/Publication/3423189

Swissprot annotation https://hardwoodgenomics.org/BLAST-annotation/3423191

Trembl annotation https://hardwoodgenomics.org/BLAST-annotation/3423192

InterProScan annotation https://hardwoodgenomics.org/InterProScan-annotation/3423193

KEGG Annotation https://hardwoodgenomics.org/KEGGresults/3471998?tripal_pane=gp_sbo__relationship11885

RaymondS1 commented 5 years ago

CDS upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790035

Polypeptide upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/791263

Published https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790040

Gff3 upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790846

Swissprot upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/792003

Trembl upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790840

IPS upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/791626

KEGG upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/791270

Blast DB CDS https://hardwoodgenomics.org/content/lagerstroemia-indica

Blast DB peptides https://hardwoodgenomics.org/content/lagerstroemia-indica-peptides

RaymondS1 commented 5 years ago

Sample feature https://hardwoodgenomics.org/feature/TRINITY_DN799_c0_g1%3A%3ATRINITY_DN799_c0_g1_i1%3A%3Ag.369%3A%3Am.369?tripal_pane=group_summary_tripalpane

RaymondS1 commented 5 years ago

Still needed: download files and a link to the publication. More of the information should be clickable (linking back to the publication), and the transcriptome should be clickable too.

almasaeed2010 commented 5 years ago

Indexing started for ontology browser. The browsers should show up soon.