Open issue, opened by mestato 5 years ago.
Current workflow for assembling the transcripts:
1. Run prefetch to obtain SRA files from NCBI using SRR IDs.
2. In /staton/projects/undergrads/crepe_myrtle/raw_transcripts, run the SRA Toolkit command fastq-dump to convert the SRA files to FASTQ. fastq-dump must be run with specific options for Trinity to work (see the note below).
3. Run FastQC to identify adapter content in the reads, then run Trimmomatic to remove it.
4. Run Rcorrector on the trimmed reads to correct sequencing errors. This can take a while, depending on how many files there are and how large they are.
5. Run Trinity to assemble the reads.

Note: if you run fastq-dump with default settings, Trinity will reject its output because the FASTQ headers contain information it can't parse.

FastQC
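```bash
#PBS -N 1_fastqc
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -l nodes=1
#PBS -l walltime=00:10:00

cd "$PBS_O_WORKDIR"
module load fastqc

# Run FastQC on every raw FASTQ file, one output directory per sample.
for f in /lustre/haven/gamma/staton/projects/undergrads/crepe_myrtle/raw_transcripts/*.fastq
do
    filename=$(basename "$f")
    base="${filename%%.fastq*}"    # strip the .fastq extension
    echo "filename $filename base $base"
    mkdir -p "$base.fastqc"        # -p: don't fail if the directory already exists
    fastqc -o "$base.fastqc" "$f" >& "$base.fastqc.out"   # stdout and stderr go to a log
done
wait
```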
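The loop above reads raw_transcripts/*.fastq, which come from the prefetch and fastq-dump steps in the workflow list. The issue does not record which fastq-dump options were used, so the sketch below is an assumption: it uses `--defline-seq '@$sn[_$rn]/$ri' --split-files`, the options the Trinity documentation suggests for SRA data, which rewrite each read header into a form Trinity can parse. `run` and `fetch_and_dump` are hypothetical helper names, and by default the script only prints the commands; set DRY_RUN=0 to execute them.

```bash
#!/usr/bin/env bash
# Sketch of workflow steps 1-2 (download SRA runs, convert to FASTQ).
# Assumption: the fastq-dump options follow the Trinity documentation's advice
# for SRA data; they are not necessarily the options used in this project.
set -euo pipefail

# 1 = print commands only (default); 0 = actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: fetch one SRR run and dump it to FASTQ.
fetch_and_dump() {
  local srr="$1"
  run prefetch "$srr"
  # --defline-seq writes each header as @<spot name>[_<read name>]/<read number>
  # (the bracketed part only when a read name exists); --split-files writes each
  # read of a spot to its own file (<SRR>_1.fastq, <SRR>_2.fastq).
  run fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files "$srr"
}

# Process every SRR ID given on the command line (nothing happens without arguments).
for srr in "$@"; do
  fetch_and_dump "$srr"
done
```

For example, `./fetch_sra.sh SRRXXXXXXX` (placeholder script name and accession) prints the prefetch and fastq-dump commands for that run.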
Trimming
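```bash
#PBS -N 2_trimming
#PBS -S /bin/bash
#PBS -j oe
#PBS -A ACF-UTK0011
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:30:00

cd "$PBS_O_WORKDIR"
module load java

TRIMMOMATIC=/lustre/haven/gamma/staton/software/Trimmomatic-0.36
for F in /lustre/haven/gamma/staton/projects/undergrads/crepe_myrtle/raw_transcripts/*.fastq
do
    # basename with a suffix argument strips the literal ".fastq"; the old sed
    # pattern 's/.fastq*//g' treated "." (any character) and "q*" as regex syntax.
    BASE=$(basename "$F" .fastq)
    echo "F $F"
    echo "base $BASE"
    # Single-end trimming: clip adapters, trim low-quality ends, drop reads < 36 bp.
    # -threads 2 uses both cores requested with ppn=2.
    java -jar "$TRIMMOMATIC/trimmomatic-0.36.jar" SE -threads 2 -phred33 \
        "$F" "$BASE.trim.fastq" \
        ILLUMINACLIP:"$TRIMMOMATIC/adapters/adapters.fa":2:40:15 \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done
wait
```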
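The issue has no scripts yet for steps 4 and 5 of the workflow. A minimal sketch of them, run on the *.trim.fastq files the trimming loop writes, might look like the following. Assumptions not taken from the issue: the reads are single-end (matching Trimmomatic's SE mode), run_rcorrector.pl and Trinity are on the PATH, Rcorrector names its output `<name>.cor.fq`, and the CPU and memory values are placeholders to match to the PBS request. `run`, `join_by_comma` and `correct_and_assemble` are hypothetical helper names, and the script only prints commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch of workflow steps 4-5: Rcorrector error correction, then Trinity.
# Assumptions (not from the issue): single-end reads, tools on PATH,
# Rcorrector's <name>.cor.fq output name, placeholder CPU/memory values.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only; 0 = execute them
CPU="${CPU:-2}"            # placeholder; match the PBS ppn request
MAX_MEM="${MAX_MEM:-20G}"  # placeholder value for Trinity's --max_memory

# Hypothetical helper: print or execute a command.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Join the arguments with commas, the list format Trinity's --single accepts.
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

# Hypothetical helper: correct each trimmed file, then assemble all of them.
correct_and_assemble() {
  local corrected=()
  local f
  for f in "$@"; do
    run run_rcorrector.pl -s "$f" -t "$CPU" -od rcorrector_out
    # Assumed output name: <input name without .fastq>.cor.fq
    corrected+=("rcorrector_out/$(basename "$f" .fastq).cor.fq")
  done
  # The output directory name contains "trinity", which Trinity requires.
  run Trinity --seqType fq --single "$(join_by_comma "${corrected[@]}")" \
    --max_memory "$MAX_MEM" --CPU "$CPU" --output trinity_out
}

# Use the trimmed files passed on the command line, e.g. *.trim.fastq
if (( $# > 0 )); then
  correct_and_assemble "$@"
fi
```

Rcorrector runs once per file here to keep the sketch simple; Trinity then receives all corrected files in a single run so they are assembled together.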
Organism https://hardwoodgenomics.org/organism/Lagerstroemia/indica
Publication https://hardwoodgenomics.org/Publication/3423189
Swissprot annotation https://hardwoodgenomics.org/BLAST-annotation/3423191
Trembl annotation https://hardwoodgenomics.org/BLAST-annotation/3423192
InterProScan Annotation https://hardwoodgenomics.org/InterProScan-annotation/3423193
KEGG Annotation https://hardwoodgenomics.org/KEGGresults/3471998?tripal_pane=gp_sbo__relationship11885
CDS upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790035
Polypeptide upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/791263
Published https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790040
Gff3 upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790846
Swissprot upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/792003
Trembl upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/790840
IPS upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/791626
KEGG upload https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/791270
Blast DB CDS https://hardwoodgenomics.org/content/lagerstroemia-indica
Blast DB peptides https://hardwoodgenomics.org/content/lagerstroemia-indica-peptides
Still needed: download files and a link to the publication. "More info" needs to be clickable (linking back to the publication), and the transcriptome needs to be clickable too.
Indexing has started for the ontology browser; the browsers should show up soon.
Publication and Data Information
A number of publications can be found, so let's start with NCBI (4 SRA RNA datasets available, all L. indica).
Additional Information
Checklist
See New Genome Documentation for detailed instructions.