Closed by ssnn-airr 6 years ago
Original comment by Roy Jiang (Bitbucket: ruoyijiangyale).
I think we'd want to do that in 2 stages.... no?
Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
Should also accommodate this case:
>sequence
NNNNNNNNNNNNNNNNXXXXXXXXXXATGTCGATAGCTACGTCACTG
Where N = cell barcode and X = UMI. And what you want is:
>sequence|CELL=NNNNNNNNNNNNNNNN|UMI=XXXXXXXXXX
NNNNNNNNNNNNNNNNXXXXXXXXXXATGTCGATAGCTACGTCACTG
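To make the case above concrete, here is a minimal Python sketch of the annotation step. It is not the MaskPrimers implementation: `annotate_fixed_fields` is a hypothetical helper, and the field lengths (16 for the cell barcode, 10 for the UMI) are assumptions chosen to match the example. The read itself is returned unchanged, and only the header gains the `CELL` and `UMI` annotations.

```python
def annotate_fixed_fields(header, seq, fields):
    """Append NAME=VALUE annotations taken from fixed positions at the 5' end.

    `fields` is an ordered list of (name, length) pairs (hypothetical layout).
    The sequence is returned unchanged; only the header is extended.
    """
    pos = 0
    annotations = []
    for name, length in fields:
        value = seq[pos:pos + length]
        if len(value) < length:
            raise ValueError(f"read too short for field {name}")
        annotations.append(f"{name}={value}")
        pos += length
    return "|".join([header] + annotations), seq


# Build the example read: assumed 16 nt cell barcode, 10 nt UMI, then the insert.
read = "N" * 16 + "X" * 10 + "ATGTCGATAGCTACGTCACTG"
new_header, new_seq = annotate_fixed_fields(
    "sequence", read, [("CELL", 16), ("UMI", 10)]
)
print(">" + new_header)
print(new_seq)
```

The fields are walked in order from position 0, so adding a third fixed-length field would only mean appending another (name, length) pair.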
Original comment by Roy Jiang (Bitbucket: ruoyijiangyale).
This can be done by creating a primer file like this:
>BARCODE
NNNNNNNNNNN
MaskPrimers.py score \
    -s ${SEQ} -p ${PRIMERS} \
    --mode cut/trim/tag/mask \
    --start 2 \
    --barcode \
    --maxerror 0.2 #irrelevant...
But we would need another mode in which only the primer is removed, as opposed to cut and trim (which remove either the preceding nts or both the preceding nts and the primer). We would also need to change the barcode specification so that the cut-out chunk is placed in the annotation field.
Original report by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).
We could use a subcommand in MaskPrimers to deal with data that do not have primer sequences, such as masking X bases from a given start position. The same mode should probably be able to extract UMIs both as part of the masking process and without any masking.
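As a rough illustration of what such a subcommand would do per read, here is a minimal Python sketch. Everything in it is an assumption for discussion, not existing MaskPrimers behavior: `extract_fixed_region` is a hypothetical name, `start` is 0-based, and the three modes (`keep`, `mask`, `remove`) are invented labels for "extract without masking", "replace the region with Ns", and "delete only the region".

```python
def extract_fixed_region(seq, start, length, mode="keep"):
    """Return (possibly modified sequence, extracted region).

    Hypothetical modes:
      keep   - leave the sequence untouched, only report the region
      mask   - replace the region with N characters
      remove - delete the region, keeping the bases on both sides
    """
    if mode not in ("keep", "mask", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    end = start + length
    if start < 0 or length < 0 or end > len(seq):
        raise ValueError("region falls outside the read")

    region = seq[start:end]
    if mode == "mask":
        seq = seq[:start] + "N" * length + seq[end:]
    elif mode == "remove":
        seq = seq[:start] + seq[end:]
    return seq, region


# Example: a made-up 10 nt UMI at the start of the read.
read = "ACGTACGTAC" + "ATGTCGATAG"
for mode in ("keep", "mask", "remove"):
    new_seq, umi = extract_fixed_region(read, 0, 10, mode)
    print(mode, new_seq, "UMI=" + umi)
```

Returning the extracted region separately from the modified sequence keeps UMI extraction independent of masking, which is the behavior the report asks for.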