lczech / grenedalf

Toolkit for Population Genetic Statistics from Pool-Sequenced Samples, e.g., in Evolve and Resequence experiments
GNU General Public License v3.0

grenedalf not outputting when sites are missing in the mpileup file #26

Closed sholtz1 closed 1 month ago

sholtz1 commented 2 months ago

I am currently using grenedalf v0.5.0 with pooled sequencing data, and noticed that the total.missing column in my output is always 0, even when positions are missing from my mpileup files. I have attached screenshots of my mpileup file and of the output of grenedalf diversity to illustrate this.

lczech commented 2 months ago

Hi Spencer @sholtz1,

thanks for your patience, was a bit swamped... Also thanks for posting this here again - it is better than email, as now, others can benefit from this here as well :-)

The missing column only counts data that has been explicitly marked as missing, in file formats that support that. In file formats such as pileup, absent positions are simply ignored by grenedalf and do not show up in the statistics. This is faster, and as absent positions do not change the statistics in any way, I figured that makes sense. I will change the description, which is indeed a bit confusing; thanks for the hint!

I have now also added a new option --make-gapless that will be available soon (it is already on the dev, and will be in the next release). This fills in all gaps in the input (positions where data is absent) with dummy entries for the internal processing. These entries are marked as "missing", and hence are counted towards the statistics. This can also be used to produce output for all those positions when using single windows.
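To make the idea of gap filling concrete, here is a minimal Python sketch of the concept. It is not grenedalf's implementation: the record layout, the function name `make_gapless`, and the choice to fill only between positions that were actually observed on a chromosome are assumptions made for illustration. Filling leading or trailing gaps would additionally require the chromosome lengths, which this sketch does not have.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical record layout for this sketch:
# (chromosome, 1-based position, payload), where payload None means "missing".
Record = Tuple[str, int, Optional[str]]


def make_gapless(records: Iterable[Record]) -> Iterator[Record]:
    """Yield all input records, inserting a dummy "missing" record for every
    position skipped between consecutive records on the same chromosome.

    Assumes the input is sorted by position within each chromosome,
    as is the case for pileup files.
    """
    prev_chrom: Optional[str] = None
    prev_pos = 0
    for chrom, pos, payload in records:
        if chrom != prev_chrom:
            # New chromosome: start at its first observed position (no leading fill).
            prev_chrom, prev_pos = chrom, pos - 1
        # Emit one dummy entry per absent position in the gap.
        for gap_pos in range(prev_pos + 1, pos):
            yield (chrom, gap_pos, None)
        yield (chrom, pos, payload)
        prev_pos = pos


# Toy input: positions 3 and 4 on chr1 are absent, chr2 has no gaps.
records = [
    ("chr1", 1, "A"),
    ("chr1", 2, "C"),
    ("chr1", 5, "G"),
    ("chr2", 3, "T"),
    ("chr2", 4, "T"),
]

filled = list(make_gapless(records))
missing_count = sum(1 for _, _, payload in filled if payload is None)
print(missing_count)  # → 2
```

Without the fill step, the two absent positions never reach the counting code at all, which is why a "missing" counter stays at 0 for pileup input. After the fill step, they appear as explicit entries and can be counted.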

As for the window averaging (a question from your previous email): it shouldn't matter. Data that is just absent and data that is explicitly marked as missing should be treated the same. I noticed, however, that there was a slight exception to this for the available-loci window average policy, which counted loci that were marked as missing in one or more of the samples as still being a position "with data". I've changed this now, so that it doesn't matter whether there was no data at a position or whether it was marked as missing. Thanks for bringing this to my attention!

Cheers and so long Lucas

lczech commented 1 month ago

Hi Spencer @sholtz1,

I just released grenedalf v0.6.0 which implements all of the above features and fixes. I think that concludes your issue then? If not, feel free to re-open or open a new one!

Cheers Lucas