iraiosub / riboseq-flow

A Nextflow DSL2 pipeline to perform ribo-seq data analysis and comprehensive quality control.
MIT License

Add functions to extract UMI at the 3' end of reads #86

Closed huipan1973 closed 2 months ago

huipan1973 commented 2 months ago

Is there an option to extract UMI at the 3' end of reads? Could you add that function?

huipan1973 commented 2 months ago

It looks like the pipeline performs UMI extraction before adaptor trimming. Our 3' UMI is upstream of the 3' adapter. Is there a way to trim the adapter and then extract the UMI?

iraiosub commented 2 months ago

If your read design is insert-UMI-adaptor, where the UMI is located before the adaptor sequence, you can still use riboseq-flow (which uses UMI-tools) to extract the UMI and move it to the header without removing the adaptor, and subsequently trim the adaptor.

You can achieve this by providing the right UMI-tools options in riboseq-flow: set --umi_extract_method (equivalent to --extract-method in UMI-tools) to 'regex', and set --umi_pattern (equivalent to --bc-pattern in UMI-tools) to a regular expression that fits your read structure. See the UMI-tools documentation to work out the settings and expression appropriate for your case.
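As a rough illustration of the idea (not a tested riboseq-flow configuration), the sketch below uses Python's standard `re` module to mimic what a regex with a named `umi_1` group does to an insert-UMI-adaptor read: the UMI is pulled out and everything else, including the adaptor, stays in the read for later trimming. The UMI length, the adaptor sequence and the `extract_umi` helper are assumptions made up for this example; the exact pattern to pass via --umi_pattern should be checked against the UMI-tools documentation.

```python
import re

# Hypothetical values for illustration only; substitute your own library design.
UMI_LENGTH = 6
ADAPTER = "AGATCGGAAGAGC"

# Insert (at least one base), then the UMI in a named group, then the adaptor.
# The lazy ".+?" anchors the UMI to the first adaptor occurrence in the read.
PATTERN = re.compile(rf"^.+?(?P<umi_1>.{{{UMI_LENGTH}}}){ADAPTER}")


def extract_umi(read):
    """Return (umi, read_without_umi), or None if the read does not match."""
    match = PATTERN.match(read)
    if match is None:
        return None
    umi = match.group("umi_1")
    # Cut out only the UMI bases; the insert and the adaptor are kept.
    remaining = read[: match.start("umi_1")] + read[match.end("umi_1"):]
    return umi, remaining


if __name__ == "__main__":
    insert = "ACGTACGTACGTACGTACGTACGTACGTAC"
    read = insert + "TTGCAA" + ADAPTER + "ACACGT"
    print(extract_umi(read))
```

The adaptor is still present in the returned read, so a later trimming step can remove it as usual.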

As this functionality is already supported by riboseq-flow, I will now close this issue.

huipan1973 commented 2 months ago

Hi,

Yes, I am aware of the regex option. The problem is that cutadapt runs after umi_tools, so umi_tools will extract part of the adapter instead of the UMI, which is upstream of the adapter. We have used another pipeline that runs adapter trimming before umi_tools extract, and we are wondering whether you could make the same modification to riboseq-flow.

Best regards, Hui


iraiosub commented 2 months ago

Hi Hui,

If you provide the correct regex to UMI-tools so that the adaptor is retained in the reads during extraction, you can first extract the UMIs and then do adaptor trimming, without needing to swap the order of these steps in riboseq-flow. We implement UMI extraction before adaptor trimming for technical reasons. Since the pipeline works with your library design as it is, we will not modify the workflow to accommodate this use case.

huipan1973 commented 2 months ago

I will need to add the adapter sequence to the regex pattern with some mismatches allowed, is that right?

Best regards, Hui
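The thread does not answer this question. As far as I know, the UMI-tools documentation describes fuzzy matching inside regex patterns (for example `{s<=2}` after a group to allow up to two substitutions), so that is the place to confirm the exact syntax. The standard-library sketch below only illustrates the underlying idea: locate the adaptor while tolerating a few substitutions, then take the bases immediately upstream as the UMI. It is not how UMI-tools is implemented, and the function names, adaptor sequence and UMI length are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def split_read(read, adapter, umi_length, max_mismatches=1):
    """Return (insert, umi, adaptor_and_rest), or None if no adaptor is found.

    Scans left to right for the first window within `max_mismatches`
    substitutions of `adapter`, leaving room for the UMI and at least one
    insert base upstream of it.
    """
    last_start = len(read) - len(adapter)
    for pos in range(umi_length + 1, last_start + 1):
        window = read[pos : pos + len(adapter)]
        if hamming(window, adapter) <= max_mismatches:
            umi_start = pos - umi_length
            return read[:umi_start], read[umi_start:pos], read[pos:]
    return None


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"           # hypothetical adaptor sequence
    mutated = "AGATCGGTAGAGC"           # same adaptor with one substitution
    read = "ACGTACGTACGTACGTACGTACGTACGTAC" + "TTGCAA" + mutated
    print(split_read(read, adapter, umi_length=6, max_mismatches=1))
```

With `max_mismatches=0` the mutated read above is not split, which is the failure mode that allowing mismatches is meant to avoid.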
