yaaminiv closed this issue 7 years ago
Great find, but you would not run BLAST on raw sequence reads; you would run it on assembled contigs.
Here is a notebook where a trimming program was used. The adapters would be included in the cutadapt file:
https://github.com/sr320/course-btea/blob/master/day-1/01-Geoduck-transcriptome.ipynb
!/Applications/bioinfo/trim_galore_zip/trim_galore \
--paired \
--retain_unpaired \
--path_to_cutadapt /Users/sr320/.local/bin/cutadapt \
/Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R2_001.fastq \
/Users/sr320/data-genomic/tentacle/Geo_Pool_M_CTTGTA_L006_R1_001.fastq \
/Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R2_001.fastq \
/Users/sr320/data-genomic/tentacle/Geo_Pool_F_GGCTAC_L006_R1_001.fastq
I'm looking at the notebook, and I'm not entirely sure what the --path_to_cutadapt argument does. It's not included in the help file for trim_galore; why is this line included?
Currently working in this notebook.
I downloaded cutadapt and tried to use it, but ended up with an error.
Below are several screenshots of my cutadapt directory. Is there a different path I need to use? I saw that there was an align file in a different folder within that directory, but setting my path to that folder also gives me an error.
The short take: always look at the manual for the version of the software you are running. Different versions often have different arguments (in fact, that is what happened here).
Also, rather than getting hung up on trimming, it would be best to go ahead and tackle the 2 steps outlined here (you can always come back to trimming later): https://github.com/yaaminiv/yaaminiv-fish546-2016/issues/2
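The advice about checking the manual for the installed version can be sketched roughly as follows. This assumes trim_galore and cutadapt are (or should be) reachable from the shell; has_flag is a hypothetical helper written for this example, not part of either tool.

```shell
# has_flag is a hypothetical helper for this example, not part of either tool.
# Usage: has_flag <program> <flag>; succeeds if <flag> appears in the --help text.
has_flag() {
    "$1" --help 2>&1 | grep -qF -- "$2"
}

# Report the installed version of each tool, or say that it is missing.
for tool in trim_galore cutadapt; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --version
    else
        echo "$tool not found on PATH"
    fi
done

# Check whether the installed trim_galore documents the flag used in the notebook.
if command -v trim_galore >/dev/null 2>&1; then
    if has_flag trim_galore --path_to_cutadapt; then
        echo "this trim_galore version documents --path_to_cutadapt"
    else
        echo "this trim_galore version does not document --path_to_cutadapt"
    fi
fi
```

If the flag is missing from the help text, the notebook was probably written against a different release, and the arguments need to be adapted to the installed one.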
I looked at the trim galore help page and I think I'm using the right arguments! If the problem is with cutadapt, I'm not entirely sure what it is. I'll table this for now and go ahead with BLAST/Kallisto.
You are correct - I was wrong in my interpretation of the problem.
The simplest thing would be to just put cutadapt in your PATH. But get the other items started and we can revisit this.
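A minimal sketch of putting cutadapt on the PATH, assuming it was installed into ~/.local/bin (the directory that appears in the notebook's --path_to_cutadapt argument). CUTADAPT_DIR is a name chosen for this example; point it at wherever the cutadapt executable actually lives.

```shell
# Assumption: cutadapt lives in ~/.local/bin; change this to your own install location.
CUTADAPT_DIR="$HOME/.local/bin"

# Prepend the directory to PATH for this shell session, unless it is already listed.
case ":$PATH:" in
    *":$CUTADAPT_DIR:"*) ;;
    *) export PATH="$CUTADAPT_DIR:$PATH" ;;
esac

# Confirm whether the shell can now find cutadapt.
if command -v cutadapt >/dev/null 2>&1; then
    echo "cutadapt found at: $(command -v cutadapt)"
else
    echo "cutadapt is still not on PATH; check CUTADAPT_DIR"
fi
```

Once cutadapt resolves this way, trim_galore should be able to find it without --path_to_cutadapt. In a Jupyter notebook, each ! command runs in its own subshell, so an export made in one cell does not carry over to later cells; setting PATH before launching the notebook is one way around that.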
I have several TruSeq adapter sequences present in my data, based on the FastQC sequence duplication analysis results (see notebook for specifics). Are these adapter sequences something I should trim before doing a BLAST? If so, how should I go about this?