twang15 / Long-read-RNA

Meeting Memos #1

Closed twang15 closed 2 years ago

twang15 commented 3 years ago

samtools mpileup for SNP calling

  1. for each individual RNA edit or SNP, record all the reads used for SNP calling: each individual read, its reference support, and its alternative-allele support
twang15 commented 3 years ago

Keywords: long-read RNA-seq data analysis (at single-cell resolution); RNA splicing, intron, exon, RNA editing; samtools, SAM file, BAM file, CIGAR; linked-read sequencing

twang15 commented 3 years ago

install samtools and bcftools: http://www.htslib.org/download/

twang15 commented 3 years ago

Thursday, 3/11/2021, 4:00 PM PST

  1. Tao: fix the source code of samtools and bcftools. New keywords: RNA edit, Unique Molecular Identifier (UMI), bulk -> single-cell -> single-molecule? (we can tell which reads belong to the same molecule, so those reads should always agree with each other), linked read, short read, cDNA
  2. Hagen: give paper draft to Tao; talk to M for authorship arrangement
  3. Fereshteh: sponsor Hagen's SCG access
twang15 commented 3 years ago

Thursday, 3/18/2021, 2:00 PM PST

  1. bcftools mpileup is the right tool to inspect
twang15 commented 3 years ago

Thursday, 3/25/2021, 2:00 PM PST

Q1. Why not parse .sam files by ourselves? Could this be easier?
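
To get a feel for Q1, here is a minimal sketch of what parsing a SAM alignment line by hand could look like. It only splits the mandatory tab-separated fields, the optional tags, and the CIGAR string. The names `SamRecord`, `parse_sam_line`, `parse_cigar`, and `is_reverse_strand` are made up for illustration and are not part of samtools.

```python
import re
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    seq: str
    qual: str
    tags: dict

# One CIGAR element is a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into its mandatory fields and optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    tags = {}
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)   # e.g. NM:i:1
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[9], fields[10], tags)

def parse_cigar(cigar: str) -> list:
    """Return [(length, op), ...]; '*' means the CIGAR is unavailable."""
    if cigar == "*":
        return []
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

def is_reverse_strand(rec: SamRecord) -> bool:
    """FLAG bit 0x10 marks a read mapped to the reverse strand."""
    return bool(rec.flag & 0x10)
```

The parsing itself is short; the harder part is turning CIGAR operations (for example `N` for a skipped region such as an intron) into per-position read support, which is what mpileup already does.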

twang15 commented 3 years ago

What is the impact of the differences in sam specification over our pipeline?

twang15 commented 3 years ago

Thursday, 04/01/2021, 2:00 PM PST

  1. Progress that Tao has made. We are very close to a solution.
  2. Authorship. Morten agrees to the “Fereshteh, Tao, Morten” ordering of the co-first authors.
  3. Hagen will send Tao the follow-up materials.
twang15 commented 3 years ago

Thursday, 04/08/2021, 2:00 PM PST

  1. Help Hagen log in to SCG
  2. data directory /oak/stanford/scg/lab_mpsnyder/htilgner/encode_SLRs/human/analysis/star_cufflinks/comb_lanes_1-8_neurons_ind1_18.5weeks/v1.3a_moleculo_STAR_cuflinks_outFilterMismatchNmax5/L001
twang15 commented 3 years ago

Friday, 04/09/2021, 4:00 PM, PST

  1. shared google doc with Fereshteh and Hagen, https://docs.google.com/document/d/1vzS4GfuuO5yJUM5drUVYmYtn1Zn8rPRv-73UIPYSxKg/edit
  2. Schedule another meeting with Hagen.
twang15 commented 3 years ago

Thursday, 04/15/2021

  1. what do "<" and ">" mean?

  2. what does "k" mean?

  3. produce a file in the following format:
    • col1: gene ID; put a dot as a placeholder
    • col2: molecule/well ID, e.g. well-11
    • col3: chromosome ID
    • col4: position in the reference genome
    • col5: editing classification {edited, not-edited, not-covered, unclear}
      edited: >= 80% of all reads agree it is an edit
      not-edited: <= 20% of all reads say it is an edit
      unclear: short reads contradict each other and we cannot draw a conclusion; in between (20%, 80%)
      not-covered: no information for that molecule/well, an empty line

  4. Lane1 ... LaneN, are they the same molecules or different molecules?

  5. draw the histogram (at least 2 reads)
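
The 80% / 20% rule from item 3 can be sketched as follows. This is only an illustration under assumptions: the function names `classify_site` and `format_row` are hypothetical, the columns are assumed to be tab-separated, and "not-covered" is taken to mean that no read of the molecule/well covers the position.

```python
def classify_site(edit_reads: int, total_reads: int) -> str:
    """Classify one position of one molecule/well by its share of edit-supporting reads."""
    if total_reads == 0:
        return "not-covered"      # assumption: no read covers this position
    fraction = edit_reads / total_reads
    if fraction >= 0.8:
        return "edited"
    if fraction <= 0.2:
        return "not-edited"
    return "unclear"              # strictly between 20% and 80%

def format_row(well_id: str, chrom: str, pos: int,
               edit_reads: int, total_reads: int, gene_id: str = ".") -> str:
    """Build one output line: gene ID, well ID, chromosome, position, class."""
    label = classify_site(edit_reads, total_reads)
    return "\t".join([gene_id, well_id, chrom, str(pos), label])
```

For example, `format_row("well-11", "chr7", 21901523, 9, 10)` gives a row whose last column is `edited`, because 9 of 10 reads support the edit.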

twang15 commented 3 years ago

Thursday, 04/22/2021, 2:00 PM PST

  1. Understand ^k and $
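    • < and > : a reference skip (e.g. a spliced-out intron); the read does not have information at this position and should be discarded
    • ^k: ^ marks the beginning of a read, and the character after it (here k) encodes the mapping quality; the base to the right is the first base of that read
    • $: marks the end of a read; the base to the left is the last base of that read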
  2. Discuss strand specificity for RNA-seq
    • the RNA molecule is reverse-transcribed into cDNA, either single-stranded or double-stranded
    • the PCR-amplified DNA molecule (if it comes from double-stranded cDNA, we do not know which strand it is) is mapped to the reference genome, matching either the forward or the reverse strand
  3. Re-define the format for the first program
    • passes inside one large program vs. a series of small tools stitched together with bash or python
    • compiler-style or bash-style
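
The pileup symbols discussed in item 1 can be separated from the actual base calls with a small tokenizer. This is a sketch, not samtools code: `tokenize_pileup` is a hypothetical helper, and it assumes the read-bases column of `samtools mpileup` text output as input.

```python
import re

def tokenize_pileup(bases: str) -> list:
    """Split an mpileup read-bases string into per-read symbols.

    '^' plus the following mapping-quality character and '$' are dropped,
    because they only mark read starts and ends. An indel such as '-1A' is
    kept as one token; it belongs to the read whose base precedes it.
    """
    tokens = []
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2                      # skip '^' and the mapping-quality character
        elif ch == "$":
            i += 1                      # end-of-read marker carries no base
        elif ch in "+-":
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError("indel marker without a length")
            end = i + 1 + len(m.group()) + int(m.group())
            tokens.append(bases[i:end])  # e.g. '-1A' or '+2AG'
            i = end
        else:
            tokens.append(ch)           # base, '.', ',', '<', '>', '*', '#'
            i += 1
    return tokens
```

Skipping the character after `^` matters: that character is a mapping quality, so a `^G` must not be counted as a G base call.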
twang15 commented 3 years ago

Friday, 04/30/2021, 2:00 PM PST

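  1. if G/g occupies more than 80%, call it edited (1st group)
  2. if . / , occupies more than 80%, call it unedited (2nd group)
    • if none of G/g, . / , or < / > occupies more than 80%, we call it unclear
  3. if < / > occupies more than 80%, call it uncovered (3rd group)
  4. if + / - occupies more than 80%, call it the plusMinus group (4th group); examples: -1A, * / #
  5. otherwise, unclear (5th group)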

Report the percentage of each group. Assumption: edited, unedited, and uncovered together should make up the vast majority.
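
A minimal sketch of the five-group rule above, under several assumptions: each read contributes exactly one symbol at the position, `*` and `#` are counted in the plusMinus group, a position with no symbols is treated as uncovered, and the threshold is applied as "at least 80%" (the memos use both ">= 80%" and "more than 80%"). The names `group_of` and `classify_position` are hypothetical.

```python
from collections import Counter

def group_of(symbol: str) -> str:
    """Map one per-read pileup symbol to its group."""
    if symbol in ("G", "g"):
        return "edited"
    if symbol in (".", ","):
        return "unedited"
    if symbol in ("<", ">"):
        return "uncovered"
    if symbol.startswith(("+", "-")) or symbol in ("*", "#"):
        return "plusMinus"
    return "other"                      # any other base, e.g. A, C, T, N

def classify_position(symbols: list, threshold: float = 0.8) -> str:
    """Return the group that reaches the threshold, else 'unclear'."""
    if not symbols:
        return "uncovered"              # assumption: no reads means uncovered
    counts = Counter(group_of(s) for s in symbols)
    for group in ("edited", "unedited", "uncovered", "plusMinus"):
        if counts[group] / len(symbols) >= threshold:
            return group
    return "unclear"
```

Because the shares sum to 1, at most one group can reach an 80% threshold, so the order in which the groups are checked does not change the result.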

merge the datasets

cd /oak/stanford/scg/lab_mpsnyder/htilgner/encode_SLRs/human/analysis/star_cufflinks/comb_lanes_1-8_neurons_ind1_18.5weeks/v1.3a_moleculo_STAR_cuflinks_outFilterMismatchNmax5/

twang15 commented 3 years ago

Thursday, 05/06/2021, 2:00 PM PST, Hagen and Fereshteh

  1. Check out the classification result for all 7 lanes together
  2. Schedule a meeting on 05/07/2021, 12 PM

TODO

  1. Refine the code
  2. Testing
twang15 commented 3 years ago

05/28/2021, Hagen and Tao, Fereshteh

  1. Debugging
  2. UCSC Genome Browser: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr7%3A21901520%2D21901526&hgsid=1116678323_K0w9CUSAK1gQebO16VAQak8QePaD
twang15 commented 3 years ago

06/10/2021, Hagen, Tao, Fereshteh

We discussed how samtools reports RNA editing. Hagen has a guess, but we are not sure, so Tao asked the question in the samtools community. Here is the link: https://github.com/samtools/samtools/issues/1444

twang15 commented 3 years ago

06/24/2021, Hagen, Tao, Fereshteh

samtools convert sam <-> bam : http://seqanswers.com/forums/showthread.php?t=13882

cd /oak/stanford/scg/prj_ENCODE/Long-read-RNA/NP_singe_cell/simple_star/v2_2017_06/r0/1-NPC12
samtools view -h mapping.bam | head -1000 > 1000.sam
samtools view -S -b -h 1000.sam > 1000.bam
samtools view -h 1000.bam > 2000.sam

diff 1000.sam 2000.sam

      29a30,31
      > @PG   ID:samtools.1   PN:samtools     PP:samtools     VN:1.12 CL:samtools view -S -b -h 1000.sam
      > @PG   ID:samtools.2   PN:samtools     PP:samtools.1   VN:1.12 CL:samtools view -h 1000.bam

samtools view -S -b -h 2000.sam > 2000.bam
diff 1000.bam 2000.bam

    Binary files 1000.bam and 2000.bam differ

samtools view -h 2000.bam > 3000.sam
diff 2000.sam 3000.sam

    31a32,33
    > @PG   ID:samtools.3   PN:samtools     PP:samtools.2   VN:1.12 CL:samtools view -S -b -h 2000.sam
    > @PG   ID:samtools.4   PN:samtools     PP:samtools.3   VN:1.12 CL:samtools view -h 2000.bam

diff 1000.sam 3000.sam

    29a30,33
    > @PG   ID:samtools.1   PN:samtools     PP:samtools     VN:1.12 CL:samtools view -S -b -h 1000.sam
    > @PG   ID:samtools.2   PN:samtools     PP:samtools.1   VN:1.12 CL:samtools view -h 1000.bam
    > @PG   ID:samtools.3   PN:samtools     PP:samtools.2   VN:1.12 CL:samtools view -S -b -h 2000.sam
    > @PG   ID:samtools.4   PN:samtools     PP:samtools.3   VN:1.12 CL:samtools view -h 2000.bam
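
The diffs above suggest that each SAM <-> BAM round trip only appends @PG header lines, which would also explain why 1000.bam and 2000.bam differ as binary files. A small sketch to check this on text input: compare two SAM files after dropping every @PG line. The helper names `without_pg` and `same_except_pg` are hypothetical.

```python
def without_pg(sam_text: str) -> list:
    """Return the lines of a SAM file, minus the @PG program-record headers."""
    return [line for line in sam_text.splitlines() if not line.startswith("@PG")]

def same_except_pg(sam_a: str, sam_b: str) -> bool:
    """True if two SAM texts are identical once @PG lines are ignored."""
    return without_pg(sam_a) == without_pg(sam_b)
```

Usage would be along the lines of `same_except_pg(open("1000.sam").read(), open("3000.sam").read())`; a `True` result means the alignments and the remaining header lines survived the round trips unchanged.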
twang15 commented 2 years ago

07/16/2021

How to split a bam file based on sample barcode: https://www.biostars.org/p/9462889/

samtools view -H mapping.bam > header
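
As a rough illustration of the splitting idea, the sketch below groups SAM text lines by a barcode tag and prepends the shared header to each group. It makes two loud assumptions that the memo does not confirm: the barcode is stored in a `CB:Z:` optional tag, and the input is SAM text (for example from `samtools view -h`). The function name `split_by_tag` is hypothetical.

```python
from collections import defaultdict

def split_by_tag(sam_lines: list, tag: str = "CB") -> dict:
    """Group alignment lines by the value of a string tag (assumed: CB:Z:...).

    Returns {barcode: header lines + alignment lines}. Reads without the
    tag are skipped.
    """
    header = []
    groups = defaultdict(list)
    prefix = tag + ":Z:"
    for line in sam_lines:
        if line.startswith("@"):        # header line, shared by every output
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        value = next((f[len(prefix):] for f in fields[11:]
                      if f.startswith(prefix)), None)
        if value is not None:
            groups[value].append(line)
    return {barcode: header + records for barcode, records in groups.items()}
```

Each value in the returned dictionary is a complete SAM text that could be written to its own file and converted back to BAM.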