zrcjessica / demux

A Snakemake pipeline for running single-cell demultiplexing simulations.

sample-specific information in each sample's BAM file #2

Closed. aryarm closed this issue 4 years ago

aryarm commented 4 years ago

new_bam.py should remove sample-specific information from each sample's BAM file and probably replace it with something universal across all of the samples.

What do we need to change?

  1. The RG tags (both in the header and in each read)
  2. The PG tags (both in the header and in each read)
  3. The CO tags (from the header only)
  4. The read IDs?
  5. The header, in its entirety: we need to somehow prepare the headers for merging
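
To make items 1 to 3 concrete, here is a minimal sketch that works on SAM-formatted text lines using only the Python standard library. It is not the actual new_bam.py. The function names `clean_header` and `clean_read` and the placeholder read group ID `pooled` are all hypothetical, chosen only for illustration. It leaves read IDs (item 4) untouched, since the list marks that one as an open question.

```python
# Hypothetical placeholder for the "something universal" read group ID.
UNIVERSAL_RG = "pooled"


def clean_header(header_lines, rg_id=UNIVERSAL_RG):
    """Drop @RG, @PG, and @CO header lines, then append one shared @RG line."""
    kept = [
        line for line in header_lines
        if not line.startswith(("@RG", "@PG", "@CO"))
    ]
    kept.append(f"@RG\tID:{rg_id}")
    return kept


def clean_read(sam_line, rg_id=UNIVERSAL_RG):
    """Strip per-read RG and PG tags and attach the shared RG tag instead."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; the rest are tags.
    mandatory, tags = fields[:11], fields[11:]
    tags = [t for t in tags if not t.startswith(("RG:Z:", "PG:Z:"))]
    tags.append(f"RG:Z:{rg_id}")
    return "\t".join(mandatory + tags)


if __name__ == "__main__":
    header = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:1000",
        "@RG\tID:sampleA\tSM:sampleA",
        "@PG\tID:cellranger",
        "@CO\tsample A run",
    ]
    read = ("r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF"
            "\tRG:Z:sampleA\tCB:Z:AAAC-1\tPG:Z:cellranger")
    print("\n".join(clean_header(header)))
    print(clean_read(read))
```

Other per-read tags, such as the CB cell barcode tag in the example, pass through unchanged. If every sample was aligned to the same reference, headers cleaned this way should end up with matching @SQ lines and a single shared @RG line, which may be enough to prepare them for merging (item 5), though that is an assumption about the inputs.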

aryarm commented 4 years ago

My strategy

In new_bam.py

zrcjessica commented 4 years ago

After making all of these changes, can you still make a reference table mapping the old sample barcodes to these new ones? I think this would make it easier to evaluate the results of demuxlet.

aryarm commented 4 years ago

@zrcjessica Isn't that information in the barcodes.tsv files that get generated by the script you're writing? None of that information gets changed by new_bam.py, the script I'm writing.

zrcjessica commented 4 years ago

Okay, you're right, that sounds good!