Closed twang15 closed 2 years ago
long-read RNA-seq data analyses (at single-cell resolution)
RNA splicing, intron, exon, RNA editing
samtools, SAM file, BAM file, CIGAR
linked-read seq
install samtools and bcftools: http://www.htslib.org/download/
Q1. Why not parse .sam files by ourselves? Could this be easier?
What is the impact of differences in the SAM specification on our pipeline?
what is "<" or ">"
what is "k"?
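Regarding Q1, a minimal sketch of what parsing SAM by hand could look like is given below. It only splits the 11 mandatory tab-separated fields, the optional tags, and the CIGAR string; the names `SamRecord`, `parse_cigar`, `parse_sam_line` and `reference_span` are made up for illustration and are not part of samtools. It does not handle every corner of the SAM specification (for example, array-typed `B` tags are kept as raw strings).

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# CIGAR operations defined by the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int                      # 1-based leftmost mapping position
    mapq: int
    cigar: List[Tuple[int, str]]  # list of (length, operation)
    seq: str
    tags: Dict[str, str]          # optional fields, values kept as strings


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '5M100N3M' into (length, op) pairs."""
    if cigar == "*":
        return []
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    f = line.rstrip("\n").split("\t")
    tags = {}
    for field in f[11:]:
        name, _type, value = field.split(":", 2)
        tags[name] = value
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     parse_cigar(f[5]), f[9], tags)


def reference_span(rec: SamRecord) -> int:
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(n for n, op in rec.cigar if op in "MDN=X")
```

In spliced alignments the `N` operation marks a skipped region (an intron), which is why it counts toward the reference span but not toward the read length.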
produce a file in the following format:
col1: gene ID. Put a dot as a placeholder
col2: molecule/well ID, e.g. well-11
col3: chromosome ID
col4: position in the reference genome
col5: editing classification {edited, not-edited, not-covered, unclear}
  edited: >= 80% of the reads agree it is an edit
  not-edited: <= 20% of the reads show the edit, i.e. nearly all say it is not an edit
  unclear: the short reads contradict each other and we cannot draw a conclusion; the fraction is in between (20%, 80%)
  not-covered: no information for that molecule/well, an empty line
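The thresholds above can be sketched as a small function. This is only an illustration under assumptions: the inputs are the number of reads supporting the edit and the total number of reads covering the site for one molecule/well, the output row is tab-separated, and a not-covered site is written with its label rather than as an empty line. The names `classify` and `format_row` are hypothetical.

```python
def classify(edited_reads: int, total_reads: int,
             hi: float = 0.8, lo: float = 0.2) -> str:
    """Classify one site in one molecule/well from its read counts."""
    if total_reads == 0:
        return "not-covered"      # no read covers this site
    frac = edited_reads / total_reads
    if frac >= hi:
        return "edited"           # >= 80% of reads show the edit
    if frac <= lo:
        return "not-edited"       # <= 20% of reads show the edit
    return "unclear"              # reads contradict each other


def format_row(well_id: str, chrom: str, pos: int,
               edited_reads: int, total_reads: int,
               gene_id: str = ".") -> str:
    """Build one output row; gene ID defaults to the '.' placeholder."""
    label = classify(edited_reads, total_reads)
    return "\t".join([gene_id, well_id, chrom, str(pos), label])
```

For example, `format_row("well-11", "chr1", 12345, 9, 10)` yields a row whose last column is `edited`, since 9 of 10 reads support the edit.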
Lane1 ... LaneN, are they the same molecules or different molecules?
draw the histogram (at least 2 reads)
percentage for each group. Assumption: edited, not-edited and not-covered should be the vast majority.
merge the datasets
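The percentage-per-group step could be computed roughly as follows once every site has a label. This is a sketch that assumes the labels are exactly the four classification strings used in these notes; `group_percentages` is a made-up helper name.

```python
from collections import Counter
from typing import Dict, Iterable

GROUPS = ("edited", "not-edited", "not-covered", "unclear")


def group_percentages(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of sites that fall in each group."""
    counts = Counter(labels)
    total = sum(counts[g] for g in GROUPS)
    if total == 0:
        return {g: 0.0 for g in GROUPS}
    return {g: 100.0 * counts[g] / total for g in GROUPS}
```

If the assumption in the notes holds, the `unclear` percentage should come out small compared with the other three groups.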
cd /oak/stanford/scg/lab_mpsnyder/htilgner/encode_SLRs/human/analysis/star_cufflinks/comb_lanes_1-8_neurons_ind1_18.5weeks/v1.3a_moleculo_STAR_cuflinks_outFilterMismatchNmax5/
We discussed how samtools reports RNA editing. Hagen has a guess, but we are not sure, so Tao asked the question in the samtools community. Here is the link: https://github.com/samtools/samtools/issues/1444
samtools convert sam <-> bam : http://seqanswers.com/forums/showthread.php?t=13882
cd /oak/stanford/scg/prj_ENCODE/Long-read-RNA/NP_singe_cell/simple_star/v2_2017_06/r0/1-NPC12
samtools view -h mapping.bam | head -1000 > 1000.sam
samtools view -S -b -h 1000.sam > 1000.bam
samtools view -h 1000.bam > 2000.sam
diff 1000.sam 2000.sam
29a30,31
> @PG ID:samtools.1 PN:samtools PP:samtools VN:1.12 CL:samtools view -S -b -h 1000.sam
> @PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.12 CL:samtools view -h 1000.bam
samtools view -S -b -h 2000.sam > 2000.bam
diff 1000.bam 2000.bam
Binary files 1000.bam and 2000.bam differ
samtools view -h 2000.bam > 3000.sam
diff 2000.sam 3000.sam
31a32,33
> @PG ID:samtools.3 PN:samtools PP:samtools.2 VN:1.12 CL:samtools view -S -b -h 2000.sam
> @PG ID:samtools.4 PN:samtools PP:samtools.3 VN:1.12 CL:samtools view -h 2000.bam
diff 1000.sam 3000.sam
29a30,33
> @PG ID:samtools.1 PN:samtools PP:samtools VN:1.12 CL:samtools view -S -b -h 1000.sam
> @PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.12 CL:samtools view -h 1000.bam
> @PG ID:samtools.3 PN:samtools PP:samtools.2 VN:1.12 CL:samtools view -S -b -h 2000.sam
> @PG ID:samtools.4 PN:samtools PP:samtools.3 VN:1.12 CL:samtools view -h 2000.bam
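In the diffs above, the only lines that differ after each SAM <-> BAM round trip are the `@PG` header lines that samtools appends to record its own command. A rough way to check that two SAM files are otherwise identical is to drop all `@PG` lines before comparing. The sketch below works on lists of lines; `strip_pg` and `same_apart_from_pg` are hypothetical helper names, and ignoring every `@PG` line (not only the samtools ones) is a simplifying assumption.

```python
from typing import List


def strip_pg(lines: List[str]) -> List[str]:
    """Drop @PG header lines, which record the programs that were run."""
    return [ln for ln in lines if not ln.startswith("@PG")]


def same_apart_from_pg(sam_a: List[str], sam_b: List[str]) -> bool:
    """True if two SAM files agree on everything except @PG lines."""
    return strip_pg(sam_a) == strip_pg(sam_b)


def read_lines(path: str) -> List[str]:
    """Read a SAM file into a list of lines."""
    with open(path) as handle:
        return handle.readlines()
```

With the files from these notes, one would call `same_apart_from_pg(read_lines("1000.sam"), read_lines("3000.sam"))`. The BAM files presumably differ at the binary level for the same reason, since the header with its extra `@PG` lines is stored inside the BAM.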
How to split a bam file based on sample barcode: https://www.biostars.org/p/9462889/
samtools view -H mapping.bam > header
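One possible way to do the split on SAM text is to group the alignment lines by the value of a barcode tag and later write each group behind the saved header. The sketch below assumes the barcode is stored in a string tag named `CB`; that tag name is an assumption and has to be checked against the actual files. `split_by_tag` is a made-up helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def split_by_tag(sam_lines: Iterable[str], tag: str = "CB"
                 ) -> Tuple[List[str], Dict[Optional[str], List[str]]]:
    """Group alignment lines by a string tag; header lines are returned separately.

    Reads without the tag are collected under the key None.
    """
    header: List[str] = []
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    prefix = tag + ":Z:"
    for line in sam_lines:
        if line.startswith("@"):          # header line
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 (index 11).
        value = next((f[len(prefix):] for f in fields[11:]
                      if f.startswith(prefix)), None)
        groups[value].append(line)
    return header, dict(groups)
```

Each group can then be written to its own file as header lines followed by the group's alignment lines, and converted back to BAM with `samtools view -b`.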
samtools mpileup for SNP calling