open2c / open2c_examples

Add examples for showing effect of low coverage and finding TADs #30

Closed Yaoyx closed 7 months ago

review-notebook-app[bot] commented 8 months ago

View / edit / reply to this conversation on ReviewNB

gfudenberg commented on 2024-02-14T00:29:45Z ----------------------------------------------------------------

Using adjacent boundaries to create a table of TADs


gfudenberg commented on 2024-02-14T00:29:45Z ----------------------------------------------------------------

Calling TADs from Hi-C data poses a challenge, in part because domain structures vary greatly in size and intensity, and can be nested. The number of called TADs varies substantially from tool to tool, and can depend on tool-specific parameters (Forcato et al., 2017). Below, we show an example of how adjacent boundaries calculated with cooltools can specify a set of intervals that can be analyzed as TADs.


gfudenberg commented on 2024-02-14T00:29:46Z ----------------------------------------------------------------

Line #10.        chrom += strong_boundaries.chrom.to_list()

could we do this in a pandas-native fashion rather than converting to list? e.g. via concat
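
A minimal sketch of what a concat-based version could look like. The toy `strong_boundaries` values and the helper name `boundaries_to_intervals` are assumptions for illustration, not code from the notebook, and taking each interval from one boundary's `start` to the next boundary's `start` is likewise an assumed convention.

```python
import pandas as pd

# Toy stand-in for the notebook's strong_boundaries table (hypothetical values).
strong_boundaries = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1", "chr2", "chr2"],
    "start": [100_000, 500_000, 900_000, 200_000, 700_000],
})

def boundaries_to_intervals(chrom_boundaries):
    """Pair each boundary with the next one on the same chromosome."""
    b = chrom_boundaries.sort_values("start").reset_index(drop=True)
    return pd.DataFrame({
        "chrom": b["chrom"].iloc[:-1].to_numpy(),
        "start": b["start"].iloc[:-1].to_numpy(),  # interval opens at one boundary
        "end": b["start"].iloc[1:].to_numpy(),     # and closes at the next boundary
    })

# Build one frame per chromosome and concatenate, with no intermediate Python lists.
tads = pd.concat(
    [boundaries_to_intervals(g) for _, g in strong_boundaries.groupby("chrom", sort=False)],
    ignore_index=True,
)
```

Chromosomes with a single boundary contribute an empty frame, so they drop out of the concatenated table without special handling.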


gfudenberg commented on 2024-02-14T00:29:47Z ----------------------------------------------------------------

Should show Hi-C data, either just below the shadow TADs or, instead, just highlighting positions of TADs as light black lines over the Hi-C data.


gfudenberg commented on 2024-02-14T00:29:48Z ----------------------------------------------------------------

Line #5.        contact_matrix[int(row['start']/resolution):int(row['end']/resolution), int(row['start']/resolution):int(row['end']/resolution)] = 1

# can add a comment about visualizing the first 10 inter-boundary intervals vs. the data
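
For reference, a self-contained sketch of the commented loop. The `resolution`, `region_start`, and interval values are assumptions, and the array is named `tad_mask` here rather than `contact_matrix` to make clear it holds a 0/1 mask rather than Hi-C counts. Integer division replaces `int(x / resolution)`; the two agree for non-negative coordinates.

```python
import numpy as np
import pandas as pd

resolution = 10_000  # bin size in bp (assumed)
region_start = 0     # assumed: the plotted window starts at coordinate 0

# Hypothetical inter-boundary intervals within the window.
tads = pd.DataFrame({
    "start": [0, 50_000, 120_000],
    "end": [50_000, 120_000, 200_000],
})

n_bins = int(np.ceil((tads["end"].max() - region_start) / resolution))
tad_mask = np.zeros((n_bins, n_bins), dtype=int)

# Visualize the first 10 inter-boundary intervals vs. the data:
# each interval becomes a filled square on the diagonal.
for row in tads.head(10).itertuples(index=False):
    lo = (row.start - region_start) // resolution
    hi = (row.end - region_start) // resolution
    tad_mask[lo:hi, lo:hi] = 1
```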


gfudenberg commented 8 months ago

c/o @Phlya : for insulation_and_boundaries.ipynb

We should include a cutoff on TAD size & rename process_chromosome to something like convert_boundaries_to_TADs(). He also has an alternate implementation in quaich that he can link here.
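
A rough sketch of how the renamed function with a size cutoff might look. This is not the quaich implementation; the `max_tad_size` parameter name and its default are placeholders, and the input is assumed to have `chrom` and `start` columns.

```python
import pandas as pd

def convert_boundaries_to_TADs(boundaries, max_tad_size=3_000_000):
    """Turn adjacent boundaries into intervals, dropping those above max_tad_size.

    The default cutoff is an arbitrary placeholder, not a recommended value.
    """
    b = boundaries.sort_values(["chrom", "start"]).reset_index(drop=True)
    tads = pd.DataFrame({
        "chrom": b["chrom"],
        "start": b["start"],
        # Start of the next boundary on the same chromosome (NaN for the last one).
        "end": b.groupby("chrom")["start"].shift(-1),
    })
    tads = tads.dropna(subset=["end"]).astype({"end": "int64"})
    too_large = (tads["end"] - tads["start"]) > max_tad_size
    return tads[~too_large].reset_index(drop=True)

# Hypothetical boundaries: the 5 Mb gap between 1 Mb and 6 Mb exceeds the cutoff.
example_boundaries = pd.DataFrame({
    "chrom": ["chr1"] * 4,
    "start": [0, 1_000_000, 6_000_000, 7_000_000],
})
example_tads = convert_boundaries_to_TADs(example_boundaries)
```

The cutoff matters because a long stretch without a called boundary (for example, across a poorly mapped region) would otherwise be reported as a single very large TAD.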

gfudenberg commented 8 months ago

also c/o @Phlya : for contacts_vs_distance.ipynb.

The number of counts reported should match those in the curve (e.g. calculate just the number of contacts in the arm), and we should make sure the aggregation is over the same number of regions in both raw & smoothed.
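
To illustrate the first point, a toy sketch of counting only the contacts inside one arm. The matrix, the arm's bin range, and the variable names are all assumptions; if the curve also excludes the first few diagonals, the reported count would need the same filter applied.

```python
import numpy as np

# Toy symmetric contact matrix with hypothetical counts (6 bins).
rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 10, size=(6, 6)))
matrix = upper + np.triu(upper, k=1).T

arm_bins = slice(0, 4)  # assumed: the arm spans bins 0-3

# Count each contact once: upper triangle (including the diagonal) of the arm block.
arm_block = matrix[arm_bins, arm_bins]
n_contacts_arm = int(np.triu(arm_block).sum())

# Whole-matrix total, for comparison with the arm-only number.
n_contacts_total = int(np.triu(matrix).sum())
```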