lh3 / psmc

Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model
Other
152 stars 60 forks source link

Sudden increases in Ne #37

Open plubbe opened 3 years ago

plubbe commented 3 years ago

Hello again team. I am generating PSMC plots for multiple species, and I keep seeing, across most of them, sudden increases in Ne that shoot into >1e06. Occasionally the plots will also decrease afterwards to zero, or a far smaller Ne value. Any ideas what is causing this? It must be an artefact of my process since it happens pretty consistently across multiple species mapped to various references. See for example: on the left, the plot as generated by psmc.plot; on the right, a re-scaled plot. They are generated with the same inputs, the only difference is the "zoom" to smaller values of Ne. Any advice on this is greatly appreciated; it's difficult to interpret any plots I generate with these trends (seems to be most of them at this point!)

Screen Shot 2021-10-05 at 11 55 09 AM
Giov12 commented 2 years ago

@plubbe were you able to figure out the cause of your issue? I am seeing similar trends in the data I am working on. Thank you.

plubbe commented 2 years ago

Hi @Giov12 ! I figured out this artifact was mostly due to the sequencing set-up I used; I had different insert sizes for one individual sample that I'd sequenced and mapped, and I'd combined those .bams with Samtools merge. Because the sequences were split both over multiple lanes and multiple insert sizes though, there was a lot of different read group information in the resulting merged .bam. As a result, during the variant-calling step, I was inadvertently creating multi-individual vcfs; PSMC is designed to work on seqs from a single individual, and my reads were flagged as belonging to multiple individuals due to those pesky read groups. I ended up using picard AddOrReplaceReadGroups to modify them - and voila, the problem sorted itself. I do have one or two files that still have the Ne spikes, but they're further to the left of the graph (more recent), and instead of going to (essentially) infinity, they just spike to (relatively) higher values (e.g., 50).

Hopefully you've got a similarly simple fix to your issue!

Giov12 commented 2 years ago

@plubbe Thank you for your quick response! I will see if this fixes my issues. It is much appreciated.