popgenmethods / SINGER

Sampling and inference of genealogies with recombination
MIT License

Interpreting parallel SINGER Output: Time Units and Mapping Nodes to VCF IDs #7

Closed santiago1234 closed 5 months ago

santiago1234 commented 5 months ago

Hi @YunDeng98 ,

I have successfully executed parallel SINGER and have a few questions regarding the interpretation and further processing of the output files.

I used the following command:

parallel_singer -vcf {chr_22.vcf} \
           -Ne 1e4 \
           -m 1.2e-8 \
           -output results/trees/mex_chr22 \
           -n 1000 \
           -thin 20

Question:

Interpreting Tree Sequence Summary:

Inspecting one of the output tree sequence files (e.g., chr22_36.trees) with tskit gives the following summary:

import tskit
ts = tskit.load('results/trees/chr22_36.trees')
ts
Tree Sequence Summary
---------------------
Trees:           197,329
Sequence Length: 51,000,000.0
Time Units:      Unknown
Sample Nodes:    474
Total Size:      44.3 MiB

Table Details
-------------
Table       | Rows   | Size     | Has Metadata
------------|--------|----------|-------------
Edges       | 782,759| 23.9 MiB | No
Individuals | 0      | 24 Bytes | No
Migrations  | 0      | 8 Bytes  | No
Mutations   | 154,161| 5.4 MiB  | No
Nodes       | 211,505| 5.6 MiB  | No
Populations | 0      | 8 Bytes  | No
Provenances | 0      | 16 Bytes | No
Sites       | 141,527| 3.4 MiB  | No

Thanks again for your help :)

YunDeng98 commented 5 months ago

Hi @santiago1234, to answer your questions: (1) time is measured in generations; (2) the leaf nodes appear in the same order as the individuals in the VCF file, that is, leaf nodes 0 and 1 belong to the first individual in the VCF, leaf nodes 2 and 3 belong to the second individual, and so on.
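
The ordering described above can be sketched in plain Python. This is only an illustration, assuming diploid samples and a standard tab-separated VCF `#CHROM` header line. The helper names `vcf_sample_ids` and `node_to_sample` and the sample IDs `ind_A`, `ind_B`, `ind_C` are made up for the example and are not part of SINGER or tskit.

```python
def vcf_sample_ids(header_line):
    """Return the sample IDs from a VCF '#CHROM' header line."""
    fields = header_line.rstrip("\n").split("\t")
    # The first 9 columns (CHROM ... FORMAT) are fixed; samples follow.
    return fields[9:]


def node_to_sample(node_id, sample_ids, ploidy=2):
    """Map a leaf node ID to (VCF sample ID, haplotype index).

    Assumes leaf nodes are numbered in VCF sample order, with
    `ploidy` consecutive nodes per individual (diploid by default).
    """
    individual, haplotype = divmod(node_id, ploidy)
    return sample_ids[individual], haplotype


# Hypothetical header line with three made-up sample IDs.
header = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tind_A\tind_B\tind_C"
)
ids = vcf_sample_ids(header)

# Build the full node -> (sample ID, haplotype) lookup table.
mapping = {node: node_to_sample(node, ids) for node in range(2 * len(ids))}

print(mapping[0], mapping[1], mapping[2])
# ('ind_A', 0) ('ind_A', 1) ('ind_B', 0)
```

Under the same diploid assumption, the 474 sample nodes in the summary above would correspond to 237 individuals. With a loaded tree sequence, the same lookup could be applied to each node ID returned by `ts.samples()`.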

santiago1234 commented 5 months ago

thanks!