Open nvaulin opened 3 months ago
Thank you for reporting it Nikita! This is an important issue, at the very least regions should not be silently skipped for no clear reason... It would be more useful if you could provide some reproducible example even with some random public data... But we'll try to look into it when we find the time.
Hi, I bumped into some unclear behavior of coolpup.py. Finally, I managed to overcome the problem, but maybe this issue might be helpful for other users and might help to improve this awesome tool.
TLDR:
AssertionError: domain_score not in dict2
frompu.pileupsWithControl
means there are regions you want to pile-up may be intersected or close to bad bins.Story
I tried to get rescaled pileup of TADs with calculation of TADs scores (strengths). I've done everything according to the Distribution of TAD strength scores part of the tutorial. For your example HFF_MicroC data everything worked fine, but for my data the behavior was strange:
pu.pileupsWithControl
raisesAssertionError: domain_score not in dict2
form theaccumulate_values
functionI found out that there were too many NaNs in the expected df. (Yes, I should pay more attention to my data, but back to the coolpup.py). I recalculated expected without respect to the centromeres and the
AssertionError
disappeared.But then, after pileup ran without any errors, I found that length of
pup.loc[0].domain_score
is less thelen(tads)
. I tried to record all the TADs processed by theadd_domain_score
and found that some of the TADs are missed:So, these unprocessed TADs were skipped silently, without any warning, When I tried to run
pu.pileupsWithControl
specifically on the unprocessed TADs, it gave meAssertionError: domain_score not in dict2
again.These unprocessed TADs were first or last TADs on their chromosomes and, apparently, are intersected with Hi-C map black stripes ("bad bins"). For their snippets data was containing only NaNs. After I performed more strict TADs filtering - pileup ended up fine for my data.
So the conclusions for the users:
Conclusions for the coolpuppy team
AssertionError: domain_score not in dict2
error, which is very far away from the real problempostprocess_func
).Again, in my case it just solved by more strict filtering, however probably there may be cases when such TADs skippings may spoil the results.
I don't know weather this issue might be useful, especially I am not able to provide my data right now. You can close it if you want. If you will be discovering this case, then I can assist you.
System
Sincerely, Nikita