How to extract the UMI info in an Illumina read name into a separate tag

Hello @TendoLiu - the name of your read looks a bit weird to me, as it contains both a Casava barcode (1:N:0:TCCGGAGA) and a UMI appended to the read name (TATGTNC+NNGAGCA). Is this a FASTQ or a BAM file?

ReadTools is a bit "picky" with read names, as it only understands two common formats:

Casava: e.g. @NS500211:808:HW27KAFXY:1:11101:12228:1057 1:N:0:TCCGGAGA, where the identified barcode will be TCCGGAGA
Illumina: e.g. @NS500211:808:HW27KAFXY:1:11101:12228:1057#TATGTNC+NNGAGCA, where the identified barcode will be TATGTNC+NNGAGCA. Note that, unlike in your case, the barcode is separated from the read name by # instead of :, and that only one barcode is detected here, because the two parts are joined with + instead of -, which is the standard separator in the specs.
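To make the difference between the two formats concrete, here is a minimal Python sketch of how such names could be split. It is not ReadTools code: the regular expressions and the function name `split_barcodes` are my own assumptions for illustration, and the real parser may be stricter or more lenient.

```python
import re

# Casava style: "<name> <read>:<filtered>:<control>:<barcode>" (assumed pattern)
CASAVA = re.compile(r"^(?P<name>\S+)\s+[12]:[YN]:\d+:(?P<barcode>\S+)$")
# Illumina style: "<name>#<barcode>", optionally followed by /1 or /2 (assumed pattern)
ILLUMINA = re.compile(r"^(?P<name>[^#\s]+)#(?P<barcode>[^/\s]+)(?:/[12])?$")


def split_barcodes(read_name, delimiter="-"):
    """Return (name, barcodes) for a FASTQ header in either format.

    `delimiter` is the character that separates multiple barcodes;
    "-" is the default here, mirroring the separator described above.
    """
    header = read_name.lstrip("@")
    for pattern in (CASAVA, ILLUMINA):
        match = pattern.match(header)
        if match:
            return match.group("name"), match.group("barcode").split(delimiter)
    # Neither format matched: no barcode can be identified.
    return header, []
```

With the default delimiter, `TATGTNC+NNGAGCA` comes back as a single barcode; calling `split_barcodes(name, delimiter="+")` splits it into two.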

ReadTools can handle only one of the two problems that you are facing: the barcode separator can be overridden with the Java property barcode_index_delimiter (so, in your case, by providing -Dbarcode_index_delimiter=+), although the overridden separator will then also be used for all the output files. Nevertheless, I am not sure that your use case matches AssignReadGroupByBarcode, as it is designed for barcodes (like the one after the space) and not for UMIs (I am not familiar with them, but maybe appending them to the read name with : as the separator is a standard there...).
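If the goal is only to move the UMI out of the read name, a small pre-processing step outside ReadTools might be enough. The sketch below is a guess at what that could look like, assuming the header has the form `<name>:<UMI> <Casava comment>` (I have not seen your file, so this layout is an assumption) and that the UMI should end up in the SAM `RX` tag with `-` between the two parts. The function name `extract_umi` is hypothetical.

```python
def extract_umi(header, umi_joiner="+"):
    """Split a trailing ':<UMI>' field off a read name.

    Returns (new_header, tag), where tag is an 'RX:Z:...' string,
    or (header, None) if the last field does not look like a UMI.
    """
    name, _, comment = header.partition(" ")
    prefix, _, umi = name.rpartition(":")
    # Accept only bases, N and the joiner; a numeric field such as the
    # y-coordinate of a normal read name is therefore left untouched.
    if not prefix or not umi or set(umi) - set("ACGTN" + umi_joiner):
        return header, None
    rx = umi.replace(umi_joiner, "-")
    new_header = prefix + (" " + comment if comment else "")
    return new_header, "RX:Z:" + rx
```

The returned tag string could then be written to the FASTQ comment or attached to the record when converting to BAM, depending on the downstream tool.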

Could you please clarify your use case, given this information? Thanks!

magicDGS / ReadTools, issue #533