s-andrews / SeqMonk

SeqMonk NGS visualisation and analysis tool
GNU General Public License v2.0

Filtering data based on statistical test #281

Open lahyusof opened 1 day ago

lahyusof commented 1 day ago

Hello there,

I’m currently analyzing chloroplast methylation in five different rice samples and am scratching my head over filtering. I’ve merged the three replicates into one representative track for each rice sample. I’ve tried running ‘Filter by Statistical Test’ for replicate data (i.e. t-test/ANOVA and Logistic Regression) and am not able to run the test at p<0.05 or even 0.1. That leaves me with analyzing unreplicated data, and I’m unsure what to do next, so I would like to ask for recommendations. Is it better to filter statistics based on continuous data or on proportions? Chi-square, Windowed Replicates? Are there other things I should consider during analysis?

I would really appreciate a reply for guidance. I’m so new to SeqMonk and methylation analysis in general that I don’t know where to start: what to analyze, what parameters to set (e.g. window size), when and on what data to construct my graphs and, more generally, what the best way to go about this is. Thank you.
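For readers unsure what a proportion-based test on unreplicated data actually computes, here is a minimal sketch of a chi-square test on a 2x2 table of methylated and unmethylated call counts from two samples. It is only an illustration of the general technique, not SeqMonk's implementation: the function name `chi_square_2x2` and the counts are hypothetical, and no continuity or multiple-testing correction is applied.

```python
import math


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Compares the methylated/unmethylated call counts of two samples
    within one region. Returns (chi-square statistic, p-value).
    """
    observed = [[meth_a, unmeth_a], [meth_b, unmeth_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [meth_a + meth_b, unmeth_a + unmeth_b]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one call")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for one window in two samples.
chi2, p = chi_square_2x2(meth_a=30, unmeth_a=70, meth_b=10, unmeth_b=90)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

Because the test works on raw counts, windows with few calls have little power, which is one reason the choice of window size matters.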

jonathandmoore commented 1 day ago

One thing I thought I would mention if you are new to methylation: most land plants don't have methylation in the chloroplast. We use any measured methylation in the chloroplast as an estimate of error, since we expect it to be zero.
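As a rough sketch of that error estimate, the fraction of chloroplast cytosine calls reported as methylated can be taken as the apparent error rate. The function name and the counts below are hypothetical, chosen only for illustration.

```python
def chloroplast_error_rate(meth_calls, unmeth_calls):
    """Fraction of chloroplast cytosine calls reported as methylated.

    If the chloroplast is assumed to be unmethylated, this fraction
    estimates the false-positive methylation rate of the experiment.
    """
    total = meth_calls + unmeth_calls
    if total == 0:
        raise ValueError("no chloroplast calls to estimate from")
    return meth_calls / total


# Hypothetical counts: 150 methylated calls out of 30,000 in total.
rate = chloroplast_error_rate(meth_calls=150, unmeth_calls=29850)
print(f"Estimated error rate: {rate:.2%}")
```

The estimate only holds under the assumption that the true chloroplast methylation level is zero.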

Hope this is helpful.

(Sent by email in reply to lahyusof's post of Fri, 11 Oct 2024, 17:49.)

lahyusof commented 1 day ago

It's true that the topic of chloroplast methylation is controversial. However, my senior published a paper reporting that chloroplasts in rice become progressively more methylated as the plant ages. My project is trying to expand on their work.

Link: https://link.springer.com/article/10.1007/s11103-019-00841-x