lucapinello / CRISPResso

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data

Trimming adapter sequences that are not Nextera #20

Closed nanez20 closed 7 years ago

nanez20 commented 7 years ago

Hello, I have a bunch of paired-end fastq.gz files to analyze. However, the adapter sequences are not Nextera, and I don't know exactly which ones they are.

Just from looking at the fastq.gz files, can I tell which adapter sequences were used? If so, how can I then trim them with --trimmomatic_options_string?

I guess I have to create a .fa file similar to the "NexteraPE-PE.fa" file, but I'm not sure how to do this correctly.

I attach the two files (paired ends) from one sequenced sample. Would you please help me out and guide me so that I can do this by myself in the future? Thank you so much, I would really appreciate it! Your tool is very useful and a great contribution to the scientific community.

Best, Alex

Attachments: Won_Tae_1_S48_L001_R1_001.fastq.gz, Won_Tae_1_S48_L001_R2_001.fastq.gz

lucapinello commented 7 years ago

Hi Alex,

To detect the adapter used, I suggest you either BLAST those sequences or use FastQC, available here: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

In particular, you need these two modules:

1) http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/10%20Adapter%20Content.html

2) http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/9%20Overrepresented%20Sequences.html
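If you want a quick first look before running FastQC, here is a rough sketch of the same idea as the Overrepresented Sequences module: count short subsequences (k-mers) across the reads and list the most frequent ones that do not come from your amplicon. This is not how FastQC itself works, only a crude approximation. The function names, the k-mer length of 12, and the `amplicon` filter are assumptions made for illustration.

```python
import gzip
from collections import Counter
from itertools import islice

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_sequences(fastq_path, max_reads=100_000):
    """Yield up to max_reads sequence lines from a plain or gzipped FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the second line.
        for line in islice(handle, 1, max_reads * 4, 4):
            yield line.strip()


def common_kmers(sequences, k=12, top=5, amplicon=""):
    """Return the most frequent k-mers that are absent from the amplicon.

    The amplicon (hypothetical parameter) is the expected target sequence;
    its k-mers, on both strands, are skipped so that what remains is more
    likely to be adapter or primer sequence.
    """
    skip = set()
    for ref in (amplicon, reverse_complement(amplicon)):
        skip.update(ref[i:i + k] for i in range(len(ref) - k + 1))

    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in skip:
                counts[kmer] += 1
    return counts.most_common(top)
```

For example, `common_kmers(read_sequences("Won_Tae_1_S48_L001_R1_001.fastq.gz"), amplicon=my_amplicon)` would list candidate k-mers for the first attached file, where `my_amplicon` is your own amplicon sequence. Treat the hits only as candidates to BLAST or to compare against the FastQC report.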

After you recover the adapter sequence, you can read more about the format of the required file here: http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
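As a minimal sketch of that file format, under the naming rules described in the Trimmomatic manual: sequence names that start with "Prefix" and end in /1 and /2 are used for palindrome clipping, while other names are checked in simple mode, with /1 applied to the forward read and /2 to the reverse read. The function name and the record name "Custom" are hypothetical, and the sequences in the example are placeholders, not real adapters.

```python
def adapter_fasta(forward_adapter, reverse_adapter, palindrome=False):
    """Return FASTA text for a paired-end adapter file in Trimmomatic's naming scheme.

    forward_adapter / reverse_adapter: the sequences you recovered
    (for example from the FastQC report). "Custom" is an arbitrary label.
    """
    # Names starting with "Prefix" trigger palindrome clipping;
    # any other name is matched in simple mode.
    name = "PrefixCustom" if palindrome else "Custom"
    return (
        f">{name}/1\n{forward_adapter}\n"
        f">{name}/2\n{reverse_adapter}\n"
    )


# Placeholder sequences only; substitute the adapters found in your own data.
print(adapter_fasta("ACGTACGTACGT", "TGCATGCATGCA", palindrome=True))
```

Saving the returned text as CUSTOM_ADAPTER.fa gives a file in the layout the manual describes. Whether palindrome or simple mode suits your library better depends on the adapters, so check the manual before choosing.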

As for your last question: assuming you have the file CUSTOM_ADAPTER.fa in the correct format, add the options like this:

--trim_sequences --trimmomatic_options_string " ILLUMINACLIP:/Users/data/CUSTOM_ADAPTER.fa:0:90:10:0:true MINLEN:40 "

You need to adjust the parameters to your particular case; please read the Trimmomatic manual carefully.
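To make the fields in the option string above easier to adjust, here is a small sketch that assembles it from named parameters. The field order follows the ILLUMINACLIP description in the Trimmomatic manual; the function names are hypothetical, and the defaults simply reproduce the values shown above rather than recommended settings.

```python
def illuminaclip_step(adapter_fasta, seed_mismatches, palindrome_clip_threshold,
                      simple_clip_threshold, min_adapter_length, keep_both_reads):
    """Assemble an ILLUMINACLIP step from its colon-separated fields."""
    fields = [
        "ILLUMINACLIP",
        str(adapter_fasta),              # path to the adapter FASTA file
        str(seed_mismatches),            # mismatches tolerated in the seed match
        str(palindrome_clip_threshold),  # score required for a paired (palindrome) match
        str(simple_clip_threshold),      # score required for a simple adapter match
        str(min_adapter_length),         # shortest adapter detected in palindrome mode
        "true" if keep_both_reads else "false",  # keep the reverse read after palindrome clipping
    ]
    return ":".join(fields)


def trimmomatic_options(adapter_fasta, min_len=40):
    """Build the string passed to --trimmomatic_options_string (values from the post)."""
    step = illuminaclip_step(adapter_fasta, 0, 90, 10, 0, True)
    # MINLEN drops reads shorter than min_len after trimming.
    return f"{step} MINLEN:{min_len}"


print(trimmomatic_options("/Users/data/CUSTOM_ADAPTER.fa"))
```

Changing one argument at a time and comparing the trimmed output is a reasonable way to see what each field does on your data.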

You can also trim the sequences outside CRISPResso with Trimmomatic, so you don't need to re-run the entire procedure every time.
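A sketch of what a standalone paired-end run could look like, assuming Trimmomatic is available as a jar file and Java is installed. The jar path, the output file names, and the helper name are assumptions; the argument order (two inputs, then paired and unpaired outputs for each read, then the trimming steps) follows the paired-end usage in the Trimmomatic manual.

```python
import subprocess


def trimmomatic_pe_command(jar, r1, r2, steps, out_prefix="trimmed"):
    """Build the argument list for a paired-end Trimmomatic run.

    jar: path to the Trimmomatic jar (hypothetical location on your machine).
    steps: trimming steps such as ["ILLUMINACLIP:...", "MINLEN:40"].
    """
    outputs = [
        f"{out_prefix}_R1_paired.fastq.gz",
        f"{out_prefix}_R1_unpaired.fastq.gz",
        f"{out_prefix}_R2_paired.fastq.gz",
        f"{out_prefix}_R2_unpaired.fastq.gz",
    ]
    return ["java", "-jar", jar, "PE", r1, r2, *outputs, *steps]


command = trimmomatic_pe_command(
    "trimmomatic.jar",  # assumed jar path
    "Won_Tae_1_S48_L001_R1_001.fastq.gz",
    "Won_Tae_1_S48_L001_R2_001.fastq.gz",
    ["ILLUMINACLIP:/Users/data/CUSTOM_ADAPTER.fa:0:90:10:0:true", "MINLEN:40"],
)
print(" ".join(command))

# To actually run it, uncomment the next line:
# subprocess.run(command, check=True)
```

The two paired output files can then be given to CRISPResso as the input reads, leaving out --trim_sequences since the trimming is already done.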

Regards,

Luca