grenaud / ROHan

Joint inference of heterozygosity rates and runs of homozygosity
http://grenaud.github.io/ROHan/
GNU General Public License v3.0
11 stars 4 forks source link

ROHan output summary #5

Closed leovallini closed 4 years ago

leovallini commented 4 years ago

Dear ROHan developers,

I am trying your software but I am having some issues with the interpretation of the summary.txt output file of ROHan: everything seems fine for the first value of each row(obtained using the point estimate for heterozygosity), but the second and third value are not always inserted in the same order. For the first row (gl. het. rate) the second value is the min and the third the max, but as you move to the lines below this order does not seem to be always respected: for instance for "Segments in ROH" the second value is obtained using max estimate for het and the third using the min one (I checked this looking at the *.hmmrohl files). This swapping in the results order makes extremely tedious extracting the data for further analysis/plotting and for other statistics (segments unclassified for instance) I was not able to understand looking at other output files which is the value obtained from the min and max estimates of het. Any chance the values can be ordered consistently?

On top of that I noticed that the percentages do not sum up to 100 (once again, for the values obtained using the mid estimate of het. everything seems fine and the problem is limited to the other two).

(Below is one example of what I obtained: ROHan was run on chr1 of an aDNA sample using the options --rohmu 2e-5 and --size 100000, everything else was default)

Github version: 581b6787fafbc8201605f3c0eae2b7874b74aa31 Global heterozygosity rate: 0.000833908 0.000653848 0.000950505 Heterozygosity in ROH/nonROH: 0.000693905 0.000496544 0.000914446 Segments unclassified : 35300000 (24000000,35300000) Segments unclassified (%): 14.1596 (9.62696,14.1596) Segments in ROH : 32100000 (3800000,53100000) Segments in ROH(%) : 15 (1.75358,31.7204) Segments in non-ROH : 181900000 (163600000,221500000) Segments in non-ROH (%) : 85 (75.4961,100) Avg. length of ROH : 327551 (272308,760000)

Thanks

Leo

grenaud commented 4 years ago

Dear Leo, Thank you for your interest in our work! I will swap the values to make sure the min is always the min etc. Apologies about the tedious parsing! while I am meddling with the code, I was informed by one of my colleagues that "Global heterozygosity rate:" and "Heterozygosity in ROH/nonROH:" is not really meaningful, would you agree?

The % of segments should not necessarily sum up to one, segments can be: 1) in ROH 2) not in ROH 3) not sure

I use a 90% confidence on the posterior decoding, should I comment this in the README?

Again, thank you for the feedback :-)

leovallini commented 4 years ago

Thanks for the quick reply!

I think the "global heterozygosity rate" can be of some use: for instance if comparing the value among different chromosomes (even if you can always get it from the *.hEst.gz file, having it in the summary is way faster). With regard to the "Heterozygosity in ROH/nonROH" I agree with your colleagues.

The answer to percentages not summing up to 100 can be helpful for people(like me) who didn't read the paper carefully enough so mentioning it in the FAQ section of the README might be a good idea, either that or leave this issue when you close it so that people having the same doubt can look it up.

grenaud commented 4 years ago

sounds good, I am trying to wrap up a paper this week, do you mind if I address this next Monday?

On Tue, Dec 10, 2019 at 6:33 PM leovallini notifications@github.com wrote:

Thanks for the quick reply!

I think the "global heterozygosity rate" can be of some use: for instance if comparing the value among different chromosomes (even if you can always get it from the *.hEst.gz file, having it in the summary is way faster). With regard to the "Heterozygosity in ROH/nonROH" I agree with your colleagues.

The answer to percentages not summing up to 100 can be helpful for people(like me) who didn't read the paper carefully enough so mentioning it in the FAQ section of the README might be a good idea, either that or leave this issue when you close it so that people having the same doubt can look it up.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/ROHan/issues/5?email_source=notifications&email_token=AAQRNI3W6N5VEB7E6GRIJL3QX7HGDA5CNFSM4JY5Z7M2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGQCXFA#issuecomment-564145044, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNI76DZIDKLIUULYII6TQX7HGDANCNFSM4JY5Z7MQ .

leovallini commented 4 years ago

perfect, good luck with your paper.

grenaud commented 4 years ago

Dear Leo, Sorry again for the inability to address this in time, I will try later this week.

Gabriel

On Wed, Dec 11, 2019 at 2:27 PM leovallini notifications@github.com wrote:

perfect, good luck with your paper.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/ROHan/issues/5?email_source=notifications&email_token=AAQRNI7H6775ZESZXUDO5YLQYDTE3A5CNFSM4JY5Z7M2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGTC3AQ#issuecomment-564538754, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNI4C3XJROIGWR4AB2MLQYDTE3ANCNFSM4JY5Z7MQ .

grenaud commented 4 years ago

First of all my sincere apologies for the delay in replying for this. I had a class to teach this week and spend the entire Christmas holidays making slides. I have since:

If you find these changes acceptable please feel free to close this issue.

leovallini commented 4 years ago

Dear Gabriel, Thanks for the update, it seems exactly what I was hoping for. Leonardo