jsh58 / NGmerge

Merging paired-end reads and removing adapters
MIT License

Adapters remain after using NGmerge #6

Closed tothuhien closed 5 years ago

tothuhien commented 5 years ago

Hello,

I've just tried to use NGmerge to cut the adapters from paired-end data. The FastQC report shows that the Nextera Transposase Sequence is the adapter (Fig1). I ran NGmerge to cut the adapters with the following command:

NGmerge/NGmerge -z -a -1 R1.fastq.gz -2 R2.fastq.gz -o cut_R

But the output files still contain some adapters (Fig2). Do you have any idea why? Did I use it properly?

Thank you very much,
Hien

Fig1 Fig2

jsh58 commented 5 years ago

Hien,

Thanks for the question, and for including the graphs. It appears that your library had some very short fragments, with adapters starting to appear around base 30. Therefore, you should adjust the -e parameter:

-e <int>  Minimum overlap of dovetailed alignments (def. 50)

This is the minimum overlap length (in bp) for alignments with 3' overhangs (see Fig. 2B). This value should be set to the length of the absolute shortest DNA fragment that may have been sequenced.

You should decrease this value to, e.g., -e 30. The default value is 50, which is why in your second graph the curve flattens out after 50bp.
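For reference, the original command with this setting added might look like the sketch below. The path, input files, and output prefix are taken from the command posted above, and 30 is the value suggested here from the adapter-content plot. The `min_overlap` and `cmd` variables are only illustrative.

```shell
# Assumed shortest fragment length (bp), read off the FastQC adapter-content plot.
min_overlap=30

# Same flags as the original command, plus -e.
cmd="NGmerge/NGmerge -z -a -e $min_overlap -1 R1.fastq.gz -2 R2.fastq.gz -o cut_R"

# Show the command, and run it only if the binary exists at that path.
echo "$cmd"
if [ -x NGmerge/NGmerge ]; then
  $cmd
fi
```

Re-running FastQC on the new output should show whether the adapter signal before base 50 has gone.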

John Gaspar

tothuhien commented 5 years ago

Thanks for your quick response, I understand the problem now. I've tried your suggestion and it works!

Could I ask another question? Is it OK to set this value very low, for example 10? I'm not sure about the length of the shortest fragments. Would that affect the results?

Thanks in advance.

jsh58 commented 5 years ago

I guess I shouldn't have cut off the parameter description:

-e <int>  Minimum overlap of dovetailed alignments (def. 50)

This is the minimum overlap length (in bp) for alignments with 3' overhangs (see Fig. 2B). This value should be set to the length of the absolute shortest DNA fragment that may have been sequenced. Using a value that is too low may result in false positives, especially if the reads contain repetitive sequences.
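If the shortest fragment length is uncertain, one possible approach (not something the user guide prescribes) is to run a few candidate values and compare the results, rather than jumping straight to a very low setting. The sketch below assumes the same path and input files as the original command. The candidate values and the `cut_R_e<value>` output prefixes are hypothetical.

```shell
# Candidate -e values are illustrative; choose them from your own FastQC plot.
for e in 20 30 50; do
  # Each run writes to its own output prefix so the results can be compared.
  cmd="NGmerge/NGmerge -z -a -e $e -1 R1.fastq.gz -2 R2.fastq.gz -o cut_R_e$e"
  echo "$cmd"
  # Run only if the binary exists at that path.
  if [ -x NGmerge/NGmerge ]; then
    $cmd
  fi
done
```

Checking each output with FastQC may help find the largest value that still removes the adapters, which limits the risk of false positives described above.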

tothuhien commented 5 years ago

Thanks a lot! I should have read the user guide more carefully. Cheers.