trichelab / spiky

2 stars 5 forks source link

add_frag_info does not identify unmethylated fragments for 80b controls #27

Open prisnir-zz opened 3 years ago

prisnir-zz commented 3 years ago

For the 80b spike-in control, I was anticipating 9 methylated fragments (6 are mod-meth) and 9 unmethylated fragments. The results from add_frag_info() shows 0 for methylated flag column.

spike.count <- scan_spike_counts(sbs, spike=spike)
annotated.spike.counts <- add_frag_info(spike.count, spike=spike)
unique(annotated.spike.counts[annotated.spike.counts$fraglen == 80,c("frag_grp", "methylated")])
             frag_grp methylated
80b_1C_35G-2  80_1_35          0
80b_1C_50G-2  80_1_50          0
80b_1C_65G-2  80_1_65          0
80b_2C_35G-2  80_2_35          0
80b_2C_50G-2  80_2_50          0
80b_2C_65G-2  80_2_65          0
80b_4C_35G-2  80_4_35          0
80b_4C_50G-2  80_4_50          0
80b_4C_65G-2  80_4_65          0

I have checked via samtools view for the presence of the methylated fragments.

Please enable add_frag_info() to identify methylated 80b fragments too.

prisnir-zz commented 3 years ago

Observing similar trends with the other controls too:

> unique(annotated.spike.counts[annotated.spike.counts$fraglen == 160,c("frag_grp", "methylated")])
              frag_grp methylated
160b_2C_35G-2 160_2_35          0
160b_2C_50G-2 160_2_50          0
160b_2C_65G-2 160_2_65          0
160b_4C_35G-2 160_4_35          0
160b_4C_50G-1 160_4_50          1
160b_4C_65G-1 160_4_65          1
160b_8C_50G-2 160_8_50          0
160b_8C_65G-2 160_8_65          0
> unique(annotated.spike.counts[annotated.spike.counts$fraglen == 320,c("frag_grp", "methylated")])
                frag_grp methylated
320b_16C_35G-2 320_16_35          0
320b_16C_50G-2 320_16_50          0
320b_16C_65G-1 320_16_65          1
320b_4C_35G-2   320_4_35          0
320b_4C_50G-2   320_4_50          0
320b_4C_65G-1   320_4_65          1
320b_8C_35G-1   320_8_35          1
320b_8C_50G-2   320_8_50          0
320b_8C_65G-1   320_8_65          1

I am not sure if I am interpreting this flag correctly. My understanding is that this flag is indicative of the meth or unmeth control scanned from the bam file... and the number of meth and unmeth controls for 160b frags and 320b frags are 8 and 9 each respectively. Can you perhaps elaborate about the methylated flag?

ttriche commented 3 years ago

methylated is for the spike contig annotations themselves, which for scan_spike_counts (newer faster version), are propagated into the resutlts:

R> with(spike_counts(sb, spike=spike), table(width, methylated))
     methylated
width 0 1
  80  9 9
  160 8 8
  320 9 9

It's possible that using the index (scan_spike_counts) vs. counting fragments (scan_spiked_bam) is introducing a nonlinear transformation that we should revisit (regarding fragments vs. raw reads). The former is much faster due to only scanning the BAM/CRAM index, but it is also going to be much more sensitive to assumptions -- it can be finessed by using (only) the first read in properly paired uniquely mapped fragments on the spikes, or alternatively we can suggest that users prefer scan_spiked_bam (which needs some optimization to cache the coverage bins and run a smooth over the various sequence features anyways).

Probably worth setting aside 15-30 minutes to discuss on a call in terms of where we should focus efforts going forwards. The package is already general enough to handle arbitrary spikes at arbitrary concentrations, using arbitrary models fit to arbitrary protocols. We might as well settle on a good way for users to specify how to count fragments and whether speed or precision is more important (which is less obvious than it might seem).

Timothy J. Triche, Jr., PhD Assistant Professor of Bioinformatics, Van Andel Institute 330 Bostwick Avenue NE, Grand Rapids, MI, 49503 (616) 234-5316 (office) (626) 375-9663 (mobile) (616) 234-5252 (administrative, courtesy of Lauren Dunkelberg) https://trichelab.vai.org/


From: Prisni R @.***> Sent: Monday, April 5, 2021 6:14 PM To: trichelab/spiky Cc: Subscribed Subject: [External] Re: [trichelab/spiky] add_frag_info does not identify unmethylated fragments for 80b controls (#27)

Observing similar trends with the other controls too:

unique(annotated.spike.counts[annotated.spike.counts$fraglen == 160,c("frag_grp", "methylated")]) frag_grp methylated 160b_2C_35G-2 160_2_35 0 160b_2C_50G-2 160_2_50 0 160b_2C_65G-2 160_2_65 0 160b_4C_35G-2 160_4_35 0 160b_4C_50G-1 160_4_50 1 160b_4C_65G-1 160_4_65 1 160b_8C_50G-2 160_8_50 0 160b_8C_65G-2 160_8_65 0 unique(annotated.spike.counts[annotated.spike.counts$fraglen == 320,c("frag_grp", "methylated")]) frag_grp methylated 320b_16C_35G-2 320_16_35 0 320b_16C_50G-2 320_16_50 0 320b_16C_65G-1 320_16_65 1 320b_4C_35G-2 320_4_35 0 320b_4C_50G-2 320_4_50 0 320b_4C_65G-1 320_4_65 1 320b_8C_35G-1 320_8_35 1 320b_8C_50G-2 320_8_50 0 320b_8C_65G-1 320_8_65 1

I am not sure if I am interpreting this flag correctly. My understanding is that this flag is indicative of the meth or unmeth control scanned from the bam file... and the number of meth and unmeth controls for 160b frags and 320b frags are 8 and 9 each respectively. Can you perhaps elaborate about the methylated flag?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/trichelab/spiky/issues/27#issuecomment-813680742, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAABCITYD54OKR4Y6L3PB3DTHIY4BANCNFSM42NRGAXA.

CAUTION: This email was sent from outside of the organization @.). Do not click links or open attachments unless you recognize the sender and know the content is safe. If you have any questions, please contact @*.**@*.***>.