Submit Dungeness crab genome to GenSaS for annotation

laurahspencer commented 1 year ago

Status report:

Created a project on GenSAS called "Dungeness crab genome annotation" under my account.
Uploaded the Dungeness crab genome to the project
Added evidence files:
- Our RNASeq data
  - Trimmed our RNASeq data: I first performed typical adapter and quality trimming/filtering using the script rnaseq-trim-fastqc.sh. After inspecting the reads via fastqc/multiqc, the Per Base Sequence Content of the first 11 bp looked weird in all samples, so I then hard-trimmed the first 11 bp from all reads using the script rnaseq-trim-extra-11bp.sh. Here's the final multiqc report for the fully trimmed data: multiqc_report_11bp-hardtrimmed.html
  - Uploaded trimmed data to project
- Additional RNASeq data from NCBI (the more data the better for annotation)
  - Found Dungeness crab RNASeq data on NCBI (adult male y-organ) submitted by the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research.
  - Currently using SRA-Toolkit to download data from the 2 runs (to my external hard drive), which I will then upload to GenSAS.
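
For reference, the hard-trimming step described above (removing the first 11 bp from every read) can be sketched with standard tools. This is only an illustration and not the contents of rnaseq-trim-extra-11bp.sh, which is not shown in this thread. The function name `hard_trim_5prime` and the file names in the usage comment are hypothetical, and the sketch assumes FASTQ input with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a fixed number of bases from the start of every
# read in a 4-line-per-record FASTQ stream (stdin -> stdout).
# Reads shorter than the trim length come out empty, so a minimum-length
# filter would still be needed afterwards.
hard_trim_5prime() {
    local n="${1:-11}"   # number of leading bases to drop (default 11)
    awk -v n="$n" '
        # Line 2 of each record is the sequence, line 4 is the quality string;
        # both must be shortened by the same amount to stay in sync.
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }
        # Header and "+" separator lines pass through unchanged.
        { print }
    '
}

# Usage (hypothetical file names):
#   gzip -dc sample_R1.fastq.gz | hard_trim_5prime 11 | gzip > sample_R1.hardtrim.fastq.gz
```

A dedicated trimmer is usually preferable for paired-end data, since it can keep R1 and R2 in sync while discarding reads that become too short.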

@kubu4 uploading to GenSAS is slow, since I have to transfer data files from Sedna to my work computer, then up to GenSAS. I haven't found any options to upload to GenSAS directly from Sedna or NCBI. Am I missing anything here, or did you also upload to GenSAS via point-and-click methods?

kubu4 commented 1 year ago

> did you also upload to GenSAS via point-and-click methods?

Yep. We weren't using any data from public databases, just our own RNAseq data that we had locally, so I didn't think much of it. Plus, I was/am able to "mount" our servers as drives on the lab computers, so there was no need for the in-between step of moving data from server to computer.

kubu4 commented 1 year ago

Oh, actually I just looked at the GenSAS interface.

I uploaded via URL, since all of our data is hosted publicly on our servers.

You might be able to use the FTP file location URLs for NCBI data?

laurahspencer commented 1 year ago

In progress! Various tasks have been running for a few weeks, probably another few weeks left.

I also heard about another tool for genome annotation, [GAWN](https://github.com/enormandeau/gawn) - "Genome Annotation Without Nightmares". It requires a transcriptome and the genome as input. Could be another good option for "good-enough" annotations.

laurahspencer commented 1 year ago

I'm pursuing the GAWN tool for genome annotation. Giles is generating a transcriptome to input into GAWN, but needs to know whether the RNASeq libraries were stranded. @kristamnichols what kit was used to prepare RNASeq libraries? I'll eventually need all library prep info, so feel free to connect me with whoever did the prep so I can ask for all the details.

kristamnichols commented 1 year ago

This is Genewiz project 30-544518807. I don't see library prep details in my account for this project and will send an email and copy you on it. I cannot remember what prep method they used!

kubu4 commented 1 year ago

@ggoetznoaa - You're probably on top of this, but don't forget to trim the reads prior to transcriptome assembly.

ggoetznoaa commented 1 year ago

@kubu4 I've gone back and forth on this with @laurahspencer. She already has trimmed reads, but I can re-trim the raw reads using the default settings from the last RNA-Seq project (Mac's). This is what I used there:

cutadapt \
    -j 0 \
    -o ${OUT}/${sample}.trimmed.R1.fastq.gz \
    -p ${OUT}/${sample}.trimmed.R2.fastq.gz \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    -q 15,10 \
    -m 40 \
    --trim-n \
    ${IN}/${sample}_R1_*.fastq.gz \
    ${IN}/${sample}_R2_*.fastq.gz \
    &> ${OUT}/cutadapt.${sample}.log
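
The command above relies on `${IN}`, `${OUT}`, and `${sample}` being set elsewhere. One way they might be supplied is sketched below. It assumes the `<sample>_R1_<suffix>.fastq.gz` naming implied by the globs in the command. The helper name `list_samples` and the directory names in the usage comment are hypothetical and not taken from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one sample name per line, derived from the R1
# files found in a directory (assumes <sample>_R1_<suffix>.fastq.gz naming).
list_samples() {
    local in_dir="$1" f
    for f in "$in_dir"/*_R1_*.fastq.gz; do
        [ -e "$f" ] || continue        # skip the unexpanded pattern if nothing matches
        f="${f##*/}"                   # drop the directory part
        printf '%s\n' "${f%%_R1_*}"    # drop everything from "_R1_" onward
    done | sort -u                     # one entry per sample, even with several lanes
}

# Usage (hypothetical directories):
#   IN=raw-data; OUT=trimmed; mkdir -p "$OUT"
#   list_samples "$IN" | while read -r sample; do
#       # run the cutadapt command shown above for this sample
#       :
#   done
```

Deriving the names from the R1 files keeps the loop in step with whatever is actually in the input directory, instead of relying on a hand-maintained sample list.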

kubu4 commented 1 year ago

> she already has trimmed reads

Cool. Figured you were already on top of things!

laurahspencer commented 1 year ago

Results

https://github.com/laurahspencer/DuMOAR/tree/main/results/GenSAS
