Crepe myrtle (Lagerstroemia spp.) transcriptomes

mestato commented 5 years ago

Publication and Data Information

Several publications are available, so let's start with NCBI (4 SRA RNA datasets available, all L. indica).

Additional Information

Checklist

See New Genome Documentation for detailed instructions.

[ ] Create Organism
[ ] Create Publication
[ ] Create Reference Genome
[ ] Create InterProScan Annotation
[ ] Create BLAST Annotation
[ ] Run Chado FASTA Loader for CDS
[ ] Run Chado FASTA Loader for Polypeptides
[ ] Publish Tripal content
[ ] Create KEGG Annotation
[ ] Run Chado KEGG Loader
[ ] Run Chado BLAST XML results loader
[ ] Run Chado InterProScan XML results loader
[ ] Create Blast Database

MattHuff commented 5 years ago

Current workflow for assembling the transcripts is as follows:

1. Use the SRA Toolkit command prefetch to obtain SRA files from NCBI using SRR IDs.
2. Move the files from the localhome NCBI directory to /staton/projects/undergrads/crepe_myrtle/raw_transcripts, and run the SRA Toolkit command fastq-dump to convert the SRA files to fastq. fastq-dump must be run with specific options for Trinity to accept the output (see step 5).
3. Run FastQC to identify adapter content in the reads, then run Trimmomatic to remove it.
4. Run Rcorrector on the trimmed reads to correct sequencing errors. This can take a while, depending on the number and size of the files.
5. Run Trinity to assemble the reads. If fastq-dump was run with default settings, Trinity will reject the reads because the fastq headers contain information it can't parse.
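The steps that are not scripted in this thread (download, conversion, error correction, assembly) might look roughly like the sketch below. Several things here are assumptions rather than what was actually run: the run ID `SRR0000000` is a placeholder, the `--defline-seq` value is the header format suggested in the Trinity documentation for SRA data (the thread does not say which options were used), single-end mode is inferred from the Trimmomatic `SE` call, and the `RCORRECTOR`, `THREADS`, and `MAX_MEM` variables with their defaults are hypothetical. By default the script only prints the commands.

```shell
#!/bin/bash
# Sketch of the steps not scripted in this thread. Commands are only printed
# unless DRY_RUN=0 is set. All IDs, paths, and resource values are placeholders.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Steps 1-2: download a run, then convert it with Trinity-friendly headers.
fetch_and_convert() {
    srr="$1"
    run prefetch "$srr"
    # Assumption: defline taken from the Trinity documentation, not from this
    # thread. Single quotes keep $sn, $rn, and $ri literal for fastq-dump.
    run fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files "$srr"
}

# Step 4: correct single-end reads; Rcorrector writes <name>.cor.fq.
correct_reads() {
    run perl "${RCORRECTOR:-run_rcorrector.pl}" -s "$1" -t "${THREADS:-2}"
}

# Step 5: assemble single-end reads (comma-separated list of files).
assemble() {
    run Trinity --seqType fq --single "$1" \
        --max_memory "${MAX_MEM:-20G}" --CPU "${THREADS:-2}" --output trinity_out
}

fetch_and_convert SRR0000000            # hypothetical run ID
correct_reads SRR0000000_1.trim.fastq   # trimmed reads from step 3
assemble SRR0000000_1.trim.cor.fq
```

Running it with `DRY_RUN=0` would execute the commands, which requires the SRA Toolkit, Rcorrector, and Trinity to be on the path or loaded as modules.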

RaymondS1 commented 5 years ago

Fastqc

#PBS -N 1_fastqc
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -l nodes=1
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR

module load fastqc

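# Run FastQC on every raw fastq file, with one output directory per file
for f in /lustre/haven/gamma/staton/projects/undergrads/crepe_myrtle/raw_transcripts/*.fastq
do
        filename=$(basename "$f")
        base="${filename%%.fastq*}"
        echo "filename $filename base $base"
        mkdir -p "$base.fastqc"

        # Capture both stdout and stderr in a per-file log
        fastqc -o "$base.fastqc" "$f" > "$base.fastqc.out" 2>&1
done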
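To see at a glance which files FastQC flagged for adapter content, the per-file summaries can be scanned instead of opening each HTML report. This is a minimal sketch, assuming the reports were unpacked (for example by adding `--extract` to the fastqc call, which the script above does not do) so that each output directory contains a `summary.txt`. The function name `adapter_status` is hypothetical.

```shell
#!/bin/bash
# Report the "Adapter Content" verdict from FastQC summary.txt files.
# Each summary.txt line is tab-separated: STATUS <tab> MODULE <tab> FILENAME.
adapter_status() {
    awk -F '\t' '$2 == "Adapter Content" { print $3 "\t" $1 }' "$@"
}

# Assumption: unpacked reports live in <base>.fastqc/<name>_fastqc/summary.txt.
for s in ./*.fastqc/*_fastqc/summary.txt
do
    [ -e "$s" ] || continue   # skip when the glob matches nothing
    adapter_status "$s"
done
```

Each output line is the fastq file name followed by PASS, WARN, or FAIL.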

RaymondS1 commented 5 years ago

Trimming

#PBS -N 2_trimming
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:30:00

cd $PBS_O_WORKDIR

module load java

# Trim adapters and low-quality bases from every raw fastq file (single-end mode)
for F in /lustre/haven/gamma/staton/projects/undergrads/crepe_myrtle/raw_transcripts/*.fastq
do
        # Strip the directory and the .fastq suffix
        BASE=$(basename "$F" .fastq)
        echo "F $F"
        echo "base $BASE"

        java -jar /lustre/haven/gamma/staton/software/Trimmomatic-0.36/trimmomatic-0.36.jar SE -phred33 "$F" "$BASE.trim.fastq" ILLUMINACLIP:/lustre/haven/gamma/staton/software/Trimmomatic-0.36/adapters/adapters.fa:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

done
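A quick sanity check after trimming is to compare read counts before and after. The sketch below assumes standard four-line fastq records and that the trimmed files sit in the current directory under the `$BASE.trim.fastq` names used above. The `RAW_DIR` variable and both function names are hypothetical.

```shell
#!/bin/bash
# Count reads in a fastq file, assuming four lines per record.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Print raw and trimmed read counts plus the percentage of reads kept.
report_trimming() {
    raw_n=$(count_reads "$1")
    trim_n=$(count_reads "$2")
    awk -v r="$raw_n" -v t="$trim_n" -v n="$1" \
        'BEGIN { printf "%s\traw=%d\ttrimmed=%d\tkept=%.1f%%\n", n, r, t, (r ? 100 * t / r : 0) }'
}

# Compare each raw file with its trimmed counterpart, if both exist.
for raw in "${RAW_DIR:-.}"/*.fastq
do
    trimmed="$(basename "$raw" .fastq).trim.fastq"
    [ -e "$raw" ] && [ -e "$trimmed" ] || continue
    report_trimming "$raw" "$trimmed"
done
```

A large drop in the kept percentage for one file would be a reason to revisit its FastQC report or the trimming parameters.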

RaymondS1 commented 5 years ago

Organism https://hardwoodgenomics.org/organism/Lagerstroemia/indica

Publication https://hardwoodgenomics.org/Publication/3423189

Swiss-Prot annotation https://hardwoodgenomics.org/BLAST-annotation/3423191

TrEMBL annotation https://hardwoodgenomics.org/BLAST-annotation/3423192

InterProScan annotation https://hardwoodgenomics.org/InterProScan-annotation/3423193

KEGG Annotation https://hardwoodgenomics.org/KEGGresults/3471998?tripal_pane=gp_sbo__relationship11885

RaymondS1 commented 5 years ago

CDS upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790035

Polypeptide upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/791263

Published https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790040

GFF3 upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790846