mfumagalli / ngsTools

Programs to analyse NGS data for population genetics purposes
GNU General Public License v3.0

How to interpret the output from plotQC.R #20

Closed albinap closed 5 years ago

albinap commented 5 years ago

I want to begin by thanking you for the very helpful angsd tutorial and scripts; they have been so useful.

I have run your plotQC.R script successfully on my samples, but I am struggling to interpret the PDF output and to use it to choose settings for the -setMinDepth and -setMaxDepth flags, as that is not really explained in the tutorial.

I have looked far and wide but have not found good explanations of how people choose values for these flags, and the angsd documentation is not very helpful. There it says:

-setMinDepth [int]: Discard site if total sequencing depth (all individuals added together) is below [int]. Requires -doCounts

-setMaxDepth [int]: Discard site if total sequencing depth (all individuals added together) is above [int] -doCounts

But it gives no info on how to figure out what settings might make sense.

Choi et al 2019 say to set -setMinDepth to one-third of the average genome-wide coverage, and -setMaxDepth to 2.5 times the average genome-wide coverage.
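The arithmetic behind that rule of thumb can be sketched as follows. This is only an illustration, not something taken from Choi et al or the angsd documentation: the function name `depth_thresholds` is hypothetical, and it assumes the rule is applied to the expected total depth (per-sample mean coverage multiplied by the number of individuals), because the angsd flags act on depth summed over all individuals. The numbers in the example call (5x, 28 samples) are illustrative only.

```python
import math


def depth_thresholds(mean_coverage, n_individuals=1,
                     min_factor=1 / 3, max_factor=2.5):
    """Return (min_depth, max_depth) as integers.

    Assumption: the rule of thumb is scaled to total depth across all
    individuals, since -setMinDepth / -setMaxDepth filter on total depth.
    """
    expected_total = mean_coverage * n_individuals
    # Round the lower bound down and the upper bound up, so that rounding
    # never makes the filter stricter than the rule itself.
    min_depth = math.floor(expected_total * min_factor)
    max_depth = math.ceil(expected_total * max_factor)
    return min_depth, max_depth


# Illustrative values only: 28 samples at an assumed mean of 5x each.
min_depth, max_depth = depth_thresholds(mean_coverage=5, n_individuals=28)
print(f"-setMinDepth {min_depth} -setMaxDepth {max_depth}")
```

Whether the multiplication by the number of individuals matches what Choi et al intended is an assumption that should be checked against the paper before using the resulting values.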

I am working with WGS data from ancient and modern samples of a mammal that has a good reference genome. My per-sample coverage is 3-7x and I am running angsd only on the chromosomes, excluding X, Y and unplaced scaffolds.

I realize this may not strictly be categorized as an issue but thought I would try. Below is an example of the plot output for chr 3 with 28 ancient samples.

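[image] https://user-images.githubusercontent.com/25434560/58106929-7524a900-7bd8-11e9-8e6f-dfc984457c04.png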

mfumagalli commented 5 years ago

Dear Albina,

Setting those thresholds depends on your data. The suggestions in Choi et al 2019 are reasonable for data with decent coverage. In your case, you may want to be less strict and remove only sites that are clear outliers in the distribution of global depth. You can infer those values from the plot you generated (the one in the middle).

Best

Matteo
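One way to turn "clear outliers in the distribution of global depth" into numbers is to take low and high percentiles of that distribution. The sketch below is not part of ngsTools or angsd. It assumes you already have a histogram where `site_counts[d]` is the number of sites with total depth `d` (the kind of distribution shown in the middle panel). The function name `depth_cutoffs`, the toy histogram, and the 2%/98% percentiles are illustrative assumptions rather than recommended values.

```python
def depth_cutoffs(site_counts, lower_q=0.02, upper_q=0.98):
    """Return (lower, upper) depths bounding the central bulk of sites.

    site_counts[d] is assumed to be the number of sites whose total depth
    (summed over all individuals) equals d.
    """
    total = sum(site_counts)
    if total == 0:
        raise ValueError("histogram contains no sites")

    lower = None
    upper = None
    cumulative = 0
    for depth, n_sites in enumerate(site_counts):
        cumulative += n_sites
        fraction = cumulative / total
        # First depth at which the cumulative fraction reaches lower_q.
        if lower is None and fraction >= lower_q:
            lower = depth
        # First depth at which the cumulative fraction reaches upper_q.
        if fraction >= upper_q:
            upper = depth
            break
    return lower, upper


# Toy histogram (made-up counts): most sites at depth 2-4, one outlier at 9.
toy_counts = [0, 5, 10, 50, 30, 4, 0, 0, 0, 1]
low, high = depth_cutoffs(toy_counts)
print(f"candidate -setMinDepth {low}, candidate -setMaxDepth {high}")
```

The percentiles are a starting point only. The cutoffs are still worth checking by eye against the plot, since the choice of what counts as an outlier depends on the data.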

