kehrlab / bcctools

Correcting barcodes in 10X linked-read sequencing data.
GNU General Public License v3.0

running on 10x + iso-seq #1

Open dabitz opened 3 years ago

dabitz commented 3 years ago

Hi,

I wonder whether bcctools can also correct barcodes from a 10x library that was sequenced with Iso-Seq instead of Illumina. It would be nice if it did, since I am seeing a lot of unexpected barcodes in my sample, which I guess are caused by errors in the barcodes.

Best, André

bkehr commented 3 years ago

Hi André,

I'd be happy to implement an option for Iso-Seq data in principle.

However, bcctools corrects only substitution errors, not indel errors, which is why I generally don't recommend using bcctools for barcode correction in long-read data.

Best regards, Birte
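To illustrate the distinction between substitution and indel errors, here is a minimal Python sketch of substitution-only correction against a barcode whitelist. It is not bcctools' actual implementation; the function names, the whitelist contents, and the distance threshold of 1 are assumptions made for illustration only.

```python
from typing import Collection, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def correct_substitutions(
    observed: str, whitelist: Collection[str], max_dist: int = 1
) -> Optional[str]:
    """Return the whitelist barcode that `observed` most plausibly came from.

    Exact matches are returned as-is. Otherwise the barcode is corrected only
    if exactly one whitelist entry lies within `max_dist` substitutions;
    ambiguous or distant barcodes yield None.
    """
    if observed in whitelist:
        return observed
    hits = [
        bc
        for bc in whitelist
        if len(bc) == len(observed) and hamming(bc, observed) <= max_dist
    ]
    return hits[0] if len(hits) == 1 else None


# Made-up 16 bp barcodes, for illustration only.
whitelist = ["ACGTACGTACGTACGT", "TTTTCCCCGGGGAAAA"]

# One substituted base (last T -> A) can be recovered.
substituted = correct_substitutions("ACGTACGTACGTACGA", whitelist)

# One deleted base at the start shifts every later position, so a
# position-by-position comparison no longer finds the original barcode.
shifted = correct_substitutions("CGTACGTACGTACGTA", whitelist)

print(substituted, shifted)
```

A single deletion changes almost every position of the observed barcode relative to the true one, which is why a Hamming-distance approach like this cannot recover indel errors.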

dabitz commented 3 years ago

Hi Birte,

Thanks for the reply. I guess that with PacBio HiFi Iso-Seq reads, indel issues should be minor, so it would be nice to have a way to correct at least the substitution errors. I would be happy to try it, as I am currently experiencing an issue with our combined 10X-Iso-Seq library, which seems to be split into too many barcodes...

Best, André

bkehr commented 3 years ago

Hi André,

In this case I'll be happy to give it a try. I'd suggest adding an option to bcctools that allows you to specify your UMI/barcode design, with the restriction that your barcodes cannot be longer than 16 bp (I guess they are 16 bp long in your 10x library).

Do you currently process your data with the analysis steps recommended by PacBio? If so, my suggestion would be to replace the isoseq3 tag command (Step 3) with bcctools.

Best, Birte
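As a rough idea of what a user-specified UMI/barcode design could look like, here is a small Python sketch. It is not a bcctools option or API; the `TagDesign` class and its fields are hypothetical, and it assumes the barcode directly precedes the UMI at the start of the tag sequence, with a 12 bp UMI as in 10x v3 chemistry.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_BARCODE_LEN = 16  # the restriction mentioned above


@dataclass(frozen=True)
class TagDesign:
    """Hypothetical description of a barcode/UMI layout (barcode first)."""

    barcode_len: int
    umi_len: int

    def __post_init__(self) -> None:
        if not 0 < self.barcode_len <= MAX_BARCODE_LEN:
            raise ValueError(
                f"barcode length must be between 1 and {MAX_BARCODE_LEN} bp"
            )
        if self.umi_len < 0:
            raise ValueError("UMI length cannot be negative")

    def split(self, tag: str) -> Tuple[str, str]:
        """Split the leading bases of `tag` into (barcode, UMI)."""
        needed = self.barcode_len + self.umi_len
        if len(tag) < needed:
            raise ValueError(f"tag is shorter than the {needed} bp design")
        return tag[: self.barcode_len], tag[self.barcode_len : needed]


# Assumed layout: 16 bp barcode followed by a 12 bp UMI.
design = TagDesign(barcode_len=16, umi_len=12)
barcode, umi = design.split("ACGTACGTACGTACGT" + "AAAACCCCGGGG" + "TTTT")
print(barcode, umi)
```

Validating the design up front means a barcode longer than 16 bp is rejected before any reads are processed.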

dabitz commented 3 years ago

Hi Birte,

Thanks for implementing this option. I think it will be very nice to have this correction step in the pipeline. And yes, my design is a 16 bp barcode + 12 bp UMI (10X v3). I am currently using this pipeline from Cupcake: https://github.com/Magdoll/cDNA_Cupcake/wiki/Iso-Seq-Single-Cell-Analysis:-Recommended-Analysis-Guidelines

However, I stop at step 5, since I later need to split the FASTA by barcode and then map the best cell individually. Therefore, it would be nice to have an option to add the barcode as a prefix to the read names, so that I can easily demultiplex by name.

Cheers, André
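The read-name prefix idea could be sketched in Python as follows. This is only an illustration of demultiplexing by name, not something bcctools provides; the `_` separator, the function names, and the example reads are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read name, sequence)


def prefix_name(name: str, barcode: str, sep: str = "_") -> str:
    """Put the (corrected) barcode in front of the read name."""
    return f"{barcode}{sep}{name}"


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def demultiplex_by_name(
    records: Iterable[Record], sep: str = "_"
) -> Dict[str, List[Record]]:
    """Group records by the barcode prefix found before the first `sep`."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for name, seq in records:
        barcode, _, _rest = name.partition(sep)
        groups[barcode].append((name, seq))
    return dict(groups)


# Made-up reads whose names already carry a barcode prefix.
fasta = """>ACGTACGTACGTACGT_read1
GATTACA
>TTTTCCCCGGGGAAAA_read2
CCCGGG
>ACGTACGTACGTACGT_read3
TTTAAA
""".splitlines()

groups = demultiplex_by_name(parse_fasta(fasta))
for barcode, records in groups.items():
    print(barcode, len(records))
```

Because the barcode contains only nucleotide characters, splitting on the first separator stays unambiguous even if the original read name itself contains underscores.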

bkehr commented 3 years ago

Excellent, thanks. I should be able to work with this information and will get back to you once I have something working.

dabitz commented 3 years ago

Great! Looking forward to it!