Closed RainerWaldmann closed 6 months ago
Hi @RainerWaldmann
Will adding 8 additional Ns to the mask and adding 8 Ns to each barcode sequence work?
This would be my suggestion too. But unfortunately we don't support the barcode matching part with Ns in dorado right now. However, that support can be added, and it would be a valuable feature for UMI-based workloads. If I provide you with a build, would you be able to test it out?
You may also have to adjust some score thresholds, since the barcode score is calculated as 1.0 - (edit_dist / barcode_length). With Ns in the barcode, the barcode length is increased but the edit distance may still be low, causing the scores to appear more inflated.
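To make the inflation concrete, here is a small sketch of the score formula quoted above. The function name `barcode_score` is only illustrative, and the example assumes that the padded Ns match any base and therefore add nothing to the edit distance; it is not taken from the dorado source.

```python
def barcode_score(edit_dist: int, barcode_length: int) -> float:
    """Score as quoted in the thread: 1.0 - (edit_dist / barcode_length)."""
    return 1.0 - (edit_dist / barcode_length)


# Two mismatches in a plain 10 nt barcode.
plain = barcode_score(2, 10)

# The same two mismatches, but the barcode is padded with 8 Ns.
# Assumption: the Ns match anything, so the edit distance stays at 2
# while the length grows to 18.
padded = barcode_score(2, 18)

print(f"plain={plain:.3f} padded={padded:.3f}")  # plain=0.800 padded=0.889
```

The same number of errors yields a higher score once the barcode is padded, so a threshold tuned for unpadded barcodes would become more permissive.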
Hi @tijyojwad
Adding Ns to the barcode would be more of a workaround, with the issues you mentioned. Another option that might be better, if the current code supports it, is to provide a mask with just one flanking sequence and the barcodes as usual, e.g. the mask for the barcode that is flanked by the UMI: llRW_1st AACAAGCAGAAGACGGCATACGAGATNNNNNNNNNN. This would avoid the scoring issues. Identification of the start position of the barcode should be precise enough with the current Nanopore sequencing accuracy.
A more ideal solution, if you plan to have a more universal and extendable option, would be a mask where Ns define the barcode sequence and another character, e.g. Z, defines the UMI. For example: llRW_1st AACAAGCAGAAGACGGCATACGAGATNNNNNNNNNNZZZZZZZGTCTCGTGGGCTCGGAGATGTG
I guess UMIs will be increasingly used in Nanopore sequencing and such a mask would also allow extraction of the UMI sequence in the long run. Could even do the first part of the single cell workflow (cell barcode and UMI extraction).
Initially I was thinking about adapting the Java code I wrote for single cell barcode and UMI extraction (ucagenomix/sicelore-2.1). But I guess this should be possible with Dorado and I would be interested to give it a try.
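As a rough illustration of the N/Z mask idea, the sketch below splits such a mask into front flank, barcode length, UMI length and rear flank, then pulls the barcode and UMI out of a read. Everything here is hypothetical: `MaskLayout`, `parse_mask` and `extract` are made-up names, the Z character is only the proposal from this thread and not something dorado supports, and the flanks are matched exactly, whereas real reads would need error-tolerant alignment.

```python
import re
from typing import NamedTuple, Optional, Tuple


class MaskLayout(NamedTuple):
    front: str        # left flanking sequence
    barcode_len: int  # number of N positions
    umi_len: int      # number of Z positions
    rear: str         # right flanking sequence (may be empty)


def parse_mask(mask: str) -> MaskLayout:
    """Split a mask of the form <front><N...><Z...><rear> into its parts."""
    m = re.fullmatch(r"([ACGT]*)(N+)(Z*)([ACGT]*)", mask)
    if m is None:
        raise ValueError(f"unsupported mask: {mask!r}")
    front, n_run, z_run, rear = m.groups()
    return MaskLayout(front, len(n_run), len(z_run), rear)


def extract(read: str, layout: MaskLayout) -> Optional[Tuple[str, str]]:
    """Return (barcode, umi), or None if the flanks are not found exactly."""
    start = read.find(layout.front)
    if start < 0:
        return None
    bc_start = start + len(layout.front)
    umi_start = bc_start + layout.barcode_len
    umi_end = umi_start + layout.umi_len
    if umi_end + len(layout.rear) > len(read):
        return None
    if layout.rear and not read.startswith(layout.rear, umi_end):
        return None
    return read[bc_start:umi_start], read[umi_start:umi_end]


# Mask copied from the comment above.
MASK = "AACAAGCAGAAGACGGCATACGAGATNNNNNNNNNNZZZZZZZGTCTCGTGGGCTCGGAGATGTG"
layout = parse_mask(MASK)

# Build a synthetic read that follows the layout (made-up barcode and UMI).
bc = ("ACGT" * 5)[: layout.barcode_len]
umi = ("TGCA" * 5)[: layout.umi_len]
read = "TTTT" + layout.front + bc + umi + layout.rear + "GGGG"
print(extract(read, layout))
```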
Hi @RainerWaldmann -
Another option that might be better, if the current code supports it, is to provide a mask with just one flanking sequence and the barcodes as usual.
Yes, this is already supported. In the custom barcode arrangement you can just leave the mask1_rear empty and dorado will only use the sequence from the front flank.
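For intuition only, here is a simplified sketch of what classification with a front flank alone might look like: find the flank, take the bases right after it, and score each candidate barcode with the 1.0 - (edit_dist / barcode_length) formula mentioned earlier. This is not dorado's implementation. The function names, the `min_score` threshold and the two example barcodes are hypothetical, and the flank is matched exactly for brevity.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def classify(read: str, front_flank: str, barcodes: Dict[str, str],
             min_score: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (barcode_name, score) for the best barcode, or None."""
    pos = read.find(front_flank)  # exact match: a simplification
    if pos < 0:
        return None
    start = pos + len(front_flank)
    best: Optional[Tuple[str, float]] = None
    for name, seq in barcodes.items():
        window = read[start:start + len(seq)]
        if len(window) < len(seq):
            continue
        score = 1.0 - edit_distance(window, seq) / len(seq)
        if best is None or score > best[1]:
            best = (name, score)
    if best is None or best[1] < min_score:
        return None
    return best


# Front flank from the thread; the 8 nt barcodes below are made up.
FLANK = "AACAAGCAGAAGACGGCATACGAGAT"
BARCODES = {"BC01": "ACGTACGT", "BC02": "TTGGCCAA"}

# Read layout: junk + flank + barcode + UMI (ignored) + rest of read.
example_read = "GG" + FLANK + "TTGGCCAA" + "ACGTTGCA" + "CCCC"
result = classify(example_read, FLANK, BARCODES)
print(result)
```

Because nothing after the barcode is inspected, the UMI that follows it does not affect the score.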
We'll take the additional mask for UMIs into consideration! It's a good suggestion.
Hi @RainerWaldmann - I'm closing this ticket since I haven't heard back from you on using the mask1_rear option for your custom setup. I'd be curious to know if it worked out, if you're willing to share results!
Hi Joyjit
I'll try it within the next two weeks. We had some delays generating the libraries.
Hi @tijyojwad
I tried with mask1_rear empty (the side where the UMI is located). It works better than I expected, despite the very short (8 nt) NEB barcodes (96-plex double indexing). The wrong barcode with the most reads gets 10,000x fewer reads than the correct barcode.
Awesome, glad to hear it!
I need to demultiplex barcodes that are flanked by UMIs (NEB Unique Dual Index UMI Adaptors): Left flanking sequence - BARCODE - UMI(N8) - right flanking sequence. Before we generate the libraries with UMIs, it would be great if we knew in advance whether we can demultiplex this with Dorado. For double indexing with Illumina 10 nt barcodes we currently use the following masks
Thanks, Rainer