Closed Brahmandam-Gayatri closed 3 months ago
Please provide more details - what have you tried, what were the results?
I tried using cooltools, but not sure how to proceed with coolpuppy. Here is my code:
clr1_10kb = cooler.Cooler('/path/to/file/sample1.mcool::/resolutions/10000')
sacCer3_chromsizes = bioframe.fetch_chromsizes('sacCer3')
sacCer3_cens = pd.read_csv('/path/to/file/centromere.txt', sep='\t')
Read telomere coordinates:
tel_yeast = pd.read_table('/path/to/file/telomere_left_yeast.bed', sep='\t').query(f'chrom in {clr1_10kb.chromnames}')
tel_yeast['mid'] = (tel_yeast.end+tel_yeast.start)//2
tel_yeast.head()
chrom start end mid
0 chrIX 1 7784 3892
1 chrV 1 6473 3237
2 chrXIII 1 6344 3172
3 chrXVI 1 7223 3612
4 chrIII 1 1098 549
expected_yeast = cooltools.expected_cis(clr1_10kb, nproc=2, chunksize=1_000)
expected_yeast
region1 region2 dist n_valid count.sum balanced.sum count.avg balanced.avg balanced.avg.smoothed balanced.avg.smoothed.agg
0 chrM chrM 0 9 NaN NaN NaN NaN NaN NaN
1 chrM chrM 1 8 NaN NaN NaN NaN 0.000497 0.000639
2 chrM chrM 2 7 48293.0 0.372054 6899.000000 0.053151 0.052541 0.056308
3 chrM chrM 3 6 36136.0 0.321482 6022.666667 0.053580 0.048051 0.037760
4 chrM chrM 4 5 22451.0 0.167468 4490.200000 0.033494 0.037567 0.026177
... ... ... ... ... ... ... ... ... ... ...
1220 chrIV chrIV 149 4 19.0 0.003011 4.750000 0.000753 0.000521 0.000535
1221 chrIV chrIV 150 3 3.0 0.000386 1.000000 0.000129 0.000522 0.000536
1222 chrIV chrIV 151 2 4.0 0.000706 2.000000 0.000353 0.000524 0.000537
1223 chrIV chrIV 152 1 121.0 0.019644 121.000000 0.019644 0.000525 0.000538
1224 chrIV chrIV 153 0 0.0 0.000000 NaN NaN 0.000526 0.000538
1225 rows × 10 columns
paired_sites = bioframe.pair_by_distance(tel_yeast, min_sep=200, max_sep=1000, suffixes=('1', '2'))
paired_sites.loc[:, 'mid1'] = (paired_sites['start1'] + paired_sites['end1'])//2
paired_sites
chrom1 start1 end1 mid1 chrom2 start2 end2 mid2 **#The paired_sites dataframe is empty**
I tried doing this to generate aggregate plot for off-diagonal pileup, as documented here: https://cooltools.readthedocs.io/en/latest/notebooks/pileup_CTCF.html
But as I could not get the plot, I went through issues of cooltools so as to raise a new issue (https://github.com/open2c/cooltools/issues/491) and was redirected to coolpuppy from https://github.com/open2c/cooltools/issues/491#issuecomment-1920075175
Hence, I went through the docs of https://coolpuppy.readthedocs.io/en/latest/Examples/Walkthrough_API.html#trans-inter-chromosomal-pileups, but I am still not sure how to plot for telomeric regions.
Well, since your regions are on the edge of the chromosome, you need to basically just pileup all chromosome starts. So You can create a bed file with the first e.g. 100 kbp regions of each chromosome chrom start end chrIX 1 100000 ... etc.
Then use coolpup.py pileup with --trans and without rescaling.
Apologies if I sound naive, but how to create the bed file?
It's a simple tab-delimited file with three columns (chrom, start, end). You can use pandas, a text editor, or a spreadsheet editor.
Or you can keep using the python API, then you can modify the tel_yeast dataframe
So I need to just save the telomere coordinates as a bed file and input that along with my mcool file to coolpup.py pileup to get the plot using command line?
This is how the sacCer3_cens data looks like:
chrom start end mid
0 chrI 151465 151582 151524
1 chrII 238207 238323 238265
2 chrIII 114385 114501 114443
3 chrIV 449711 449821 449766
4 chrV 151987 152104 152046
5 chrVI 148510 148627 148569
6 chrVII 496920 497038 496979
7 chrVIII 105586 105703 105645
8 chrIX 355629 355745 355687
9 chrX 436307 436425 436366
10 chrXI 440129 440246 440188
11 chrXII 150828 150947 150888
12 chrXIII 268031 268149 268090
13 chrXIV 628758 628875 628817
14 chrXV 326584 326702 326643
15 chrXVI 555957 556073 556015
Sort of - actually there is a trick... you have to provide a file that is shifted by a certain distance from the end of the chromosome (e.g. 100 kb), and use the same distance as the --flank argument. So, small correction: It should look like this: chrom start end chrIX 90000 110000 ...
And use coolpup.py {cool_file} {shifted_tels.bed} --flank 100000 --trans --expected {expected_trans_file} -o {pileup.clp}
And then plotpup.py should plot the result
Should I input the same coordinates for all chromosomes or change according to the chromosome? For eg., chrII and chrIII have different coordinates, so would it be fine if I specify as 90000 and 110000 for both?
chrII 238207 238323
chrIII 114385 114501
Before you showed coordinates that were all very small and close to 0, why are they much bigger now?
These are centromeric coordinates.
chrII 238207 238323
chrIII 114385 114501
I showed the left arm telomeric coordinates at the start of my post
chrIII 1 1098
chrVI 1 5530
chrXII 1 12085
chrVIII 1 5505
chrXV 1 847
chrII 1 6608
But you want to pileup telomeres?
Yes. I want the pileup for telomeres of both left and right arms of the chromosomes. But to plot, I am only doing with left arm first.
Yeah then just pileup around a fixed distance shifted from 0, e.g. 100 kb or smth... I'm not familiar with yeast data, not sure what would be the best distance. Then you can pileup the right ones the same way, shift from the end of each chromosome by a fixed distance
I am getting the following error:
coolpup.py /path/to/file/sample1.mcool::/resolutions/10000 shifted_tels.bed --flank 100000 --trans --expected expected_trans.tsv -o pileup.clp
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 298, in _is_compatible_trans_expected
_ = _is_expected(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 152, in _is_expected
raise e
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 125, in _is_expected
raise ValueError(f"Values in {grouping_columns} columns must be unique")
ValueError: Values in ['region1', 'region2'] columns must be unique
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/io.py", line 327, in read_expected_from_file
_ = is_valid_expected(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 391, in is_valid_expected
return _is_compatible_trans_expected(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 340, in _is_compatible_trans_expected
raise e
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 305, in _is_compatible_trans_expected
raise ValueError("expected_df does not look like trans-expected") from e
ValueError: expected_df does not look like trans-expected
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/catg1/.local/bin/coolpup.py", line 8, in <module>
sys.exit(main())
File "/home/catg1/.local/lib/python3.10/site-packages/coolpuppy/CLI.py", line 493, in main
expected = io.read_expected_from_file(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/io.py", line 336, in read_expected_from_file
raise ValueError(
ValueError: Input expected file does not match the schema
Expected must be tab-separated file with a header
The file expected_trans.tsv contains the following:
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta name="generator" content="LibreOffice 7.3.7.2 (Linux)"/>
<style type="text/css">
body,div,table,thead,tbody,tfoot,tr,th,td,p { font-family:"Liberation Sans"; font-size:x-small }
a.comment-indicator:hover + comment { background:#ffd; position:absolute; display:block; border:1px solid black; padding:0.5em; }
a.comment-indicator { background:red; display:inline-block; border:1px solid black; width:0.5em; height:0.5em; }
comment { display:none; }
</style>
</head>
<body>
region1 | region2 | dist | n_valid | count.sum | balanced.sum | count.avg | balanced.avg | balanced.avg.smoothed | balanced.avg.smoothed.agg
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
chrM | chrM | 0 | 9 | | | | | |
chrM | chrM | 1 | 8 | | | | | 0.000496669227937875 | 0.00063942972823703
chrM | chrM | 2 | 7 | 48293 | 0.372053637479411 | 6899 | 0.0531505196399159 | 0.0525410493153696 | 0.0563079144204046
chrM | chrM | 3 | 6 | 36136 | 0.321481934581842 | 6022.66666666667 | 0.053580322430307 | 0.0480513547631135 | 0.0377598171583264
chrM | chrM | 4 | 5 | 22451 | 0.167467893069907 | 4490.2 | 0.0334935786139813 | 0.0375673781338824 | 0.0261766749716914
chrM | chrM | 5 | 4 | 14185 | 0.11236548714211 | 3546.25 | 0.0280913717855276 | 0.0360963830657584 | 0.018969640466925
chrM | chrM | 6 | 3 | 11203 | 0.0887893943518971 | 3734.33333333333 | 0.0295964647839657 | 0.0491267465142774 | 0.0142832725033798
chrM | chrM | 7 | 2 | 10225 | 0.0805744713985754 | 5112.5 | 0.0402872356992877 | 0.0752928403921177 | 0.0110845475380622
chrM | chrM | 8 | 1 | 44000 | 0.377630196055131 | 44000 | 0.377630196055131 | 0.10876732008745 | 0.00880629387117672
chrIX | chrIX | 0 | 42 | | | | | |
chrIX | chrIX | 1 | 41 | | | | | 0.000587566452703396 | 0.00063942972823703
chrIX | chrIX | 2 | 40 | 31083 | 2.25873209324364 | 777.075 | 0.056468302331091 | 0.0522900364541279 | 0.0563079144204046
chrIX | chrIX | 3 | 39 | 19380 | 1.41740182842212 | 496.923076923077 | 0.0363436366262083 | 0.0346470550700121 | 0.0377598171583264
chrIX | chrIX | 4 | 38 | 12546 | 0.909496302874006 | 330.157894736842 | 0.0239341132335265 | 0.0234776546783782 | 0.0261766749716914
chrIX | chrIX | 5 | 37 | 8769 | 0.636014010291257 | 237 | 0.0171895678457097 | 0.0167191559864385 | 0.018969640466925
chrIX | chrIX | 6 | 36 | 6230 | 0.453913668018458 | 173.055555555556 | 0.0126087130005127 | 0.0123988360898993 | 0.0142832725033798
chrIX | chrIX | 7 | 35 | 4577 | 0.330308018504265 | 130.771428571429 | 0.00943737195726473 | 0.00947597931387518 | 0.0110845475380622
chrIX | chrIX | 8 | 34 | 3458 | 0.250111089468336 | 101.705882352941 | 0.00735620851377459 | 0.00743528836833192 | 0.00880629387117672
chrIX | chrIX | 9 | 33 | 2686 | 0.196342671245793 | 81.3939393939394 | 0.00594977791653917 | 0.00597241658428364 | 0.00712865424620211
chrIX | chrIX | 10 | 32 | 2079 | 0.15089710731848 | 64.96875 | 0.0047155346037025 | 0.00489941668906075 | 0.00586938932489326
chrIX | chrIX | 11 | 31 | 1683 | 0.119554717638327 | 54.2903225806452 | 0.00385660379478475 | 0.00409746125172262 | 0.00491551082671522
chrIX | chrIX | 12 | 30 | 1428 | 0.101074284498914 | 47.6 | 0.00336914281663046 | 0.00348811011483907 | 0.00418855820128917
chrIX | chrIX | 13 | 29 | 1142 | 0.0810218250968101 | 39.3793103448276 | 0.00279385603782104 | 0.00301770972112611 | 0.00363010272398591
chrIX | chrIX | 14 | 28 | 1021 | 0.0725704580713719 | 36.4642857142857 | 0.00259180207397757 | 0.00264889881396591 | 0.00319570812585238
chrIX | chrIX | 15 | 27 | 825 | 0.0586249749229456 | 30.5555555555556 | 0.0021712953675165 | 0.00235538487639591 | 0.00285180404455236
chrIX | chrIX | 16 | 26 | 687 | 0.0502465934841103 | 26.4230769230769 | 0.0019325612878504 | 0.00211854523421107 | 0.00257367476271558
chrIX | chrIX | 17 | 25 | 623 | 0.045339101246096 | 24.92 | 0.00181356404984384 | 0.00192505750135856 | 0.00234370630074062
chrIX | chrIX | 18 | 24 | 526 | 0.0381323243496206 | 21.9166666666667 | 0.00158884684790086 | 0.00176522448628715 | 0.0021497042831649
chrIX | chrIX | 19 | 23 | 507 | 0.0371118989933197 | 22.0434782608696 | 0.00161356082579651 | 0.00163192587023838 | 0.00198345183312927
chrIX | chrIX | 20 | 22 | 425 | 0.0310489700985334 | 19.3181818181818 | 0.00141131682266061 | 0.00151987608905784 | 0.0018394528675716
chrIX | chrIX | 21 | 21 | 382 | 0.0283030714115839 | 18.1904761904762 | 0.00134776530531352 | 0.00142505626400995 | 0.00171391190010117
chrIX | chrIX | 22 | 20 | 303 | 0.0215487693655638 | 15.15 | 0.00107743846827819 | 0.00134439350944602 | 0.00160408664679366
chrIX | chrIX | 23 | 19 | 294 | 0.0215262241712543 | 15.4736842105263 | 0.00113295916690812 | 0.00127548704655054 | 0.00150783719102629
chrIX | chrIX | 24 | 18 | 271 | 0.0198055954959348 | 15.0555555555556 | 0.00110031086088527 | 0.00121643777837355 | 0.00142338597723856
chrIX | chrIX | 25 | 17 | 240 | 0.0181570700365712 | 14.1176470588235 | 0.00106806294332772 | 0.00116570557816471 | 0.00134918031338564
</body>
</html>
And the shifted_tels.bed has the following coordinates:
chrom start end
chrIX 9000 110000
chrV 9000 110000
chrXIII 9000 110000
chrXVI 9000 110000
chrIII 9000 110000
chrVI 9000 110000
chrXII 9000 110000
chrVIII 9000 110000
chrXV 9000 110000
chrII 9000 110000
chrXIV 9000 110000
chrI 9000 110000
chrVII 9000 110000
chrX 9000 110000
chrXI 9000 110000
chrIV 9000 110000
I computed the trans expected and reran the command. But still the same error:
This is the trans file output:
region1 region2 n_valid count.sum balanced.sum count.avg balanced.avg
0 chrM chrIX 378 9299.0 0.206373 24.600529 0.000546
1 chrM chrV 504 11856.0 0.281875 23.523810 0.000559
2 chrM chrXIII 783 19297.0 0.440791 24.644955 0.000563
3 chrM chrXVI 801 18673.0 0.444724 23.312110 0.000555
4 chrM chrIII 288 6642.0 0.147612 23.062500 0.000513
... ... ... ... ... ... ... ...
131 chrVII chrXI 7102 45664.0 3.478307 6.429738 0.000490
132 chrVII chrIV 15688 97194.0 7.622758 6.195436 0.000486
133 chrX chrXI 4757 33967.0 2.578139 7.140425 0.000542
134 chrX chrIV 10508 62620.0 5.130233 5.959269 0.000488
135 chrXI chrIV 9916 57778.0 4.469080 5.826745 0.000451
136 rows × 7 columns
Error:
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 298, in _is_compatible_trans_expected
_ = _is_expected(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 152, in _is_expected
raise e
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 125, in _is_expected
raise ValueError(f"Values in {grouping_columns} columns must be unique")
ValueError: Values in ['region1', 'region2'] columns must be unique
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/io.py", line 327, in read_expected_from_file
_ = is_valid_expected(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 391, in is_valid_expected
return _is_compatible_trans_expected(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 340, in _is_compatible_trans_expected
raise e
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/checks.py", line 305, in _is_compatible_trans_expected
raise ValueError("expected_df does not look like trans-expected") from e
ValueError: expected_df does not look like trans-expected
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/catg1/.local/bin/coolpup.py", line 8, in <module>
sys.exit(main())
File "/home/catg1/.local/lib/python3.10/site-packages/coolpuppy/CLI.py", line 493, in main
expected = io.read_expected_from_file(
File "/home/catg1/.local/lib/python3.10/site-packages/cooltools/lib/io.py", line 336, in read_expected_from_file
raise ValueError(
ValueError: Input expected file does not match the schema
Expected must be tab-separated file with a header
My mistake, I reran the command with proper file and it worked, but when I ran plotpup.py, it says there are nans or zero values present:
coolpup.py /path/to/file/sample1.mcool::/resolutions/10000 shifted_tels.bed --flank 100000 --trans --expected expected_trans.txt -o pileup.clp
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/home/catg1/.local/lib/python3.10/site-packages/coolpuppy/coolpup.py:2156: UserWarning: Ignoring maxdist when using trans
CC = CoordCreator(
**INFO:coolpuppy:Total number of piled up windows: 0
INFO:coolpuppy:Saved output to pileup.clp**
plotpup.py --input_pups pileup.clp
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/home/catg1/.local/lib/python3.10/site-packages/coolpuppy/lib/numutils.py:79: RuntimeWarning: Mean of empty slice
return np.nanmean(amap[c - n // 2 : c + n // 2 + 1, c - n // 2 : c + n // 2 + 1])
Traceback (most recent call last):
File "/home/catg1/.local/bin/plotpup.py", line 8, in <module>
sys.exit(main())
File "/home/catg1/.local/lib/python3.10/site-packages/coolpuppy/plotpuppy_CLI.py", line 369, in main
fg = plot(
File "/home/catg1/.local/lib/python3.10/site-packages/coolpuppy/plotpup.py", line 735, in plot
vmin, vmax = get_min_max(pupsdf["data"].values, vmin, vmax, sym=sym, scale=scale)
File "/home/catg1/.local/lib/python3.10/site-packages/coolpuppy/plotpup.py", line 83, in get_min_max
raise ValueError("Data only contains NaNs or zeros")
ValueError: Data only contains NaNs or zeros
/usr/lib/python3/dist-packages/scipy/init.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4 warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
I wonder whether it can cause issues...
Can you share the shifted_tels.bed file?
I will try and upgrade numpy version. Sure, here is the file: Since bed file format is not supported, I am sharing the google drive link: https://drive.google.com/file/d/1CHV_MUifsFywEsFsJjeB1XqGjKnDUyp5/view?usp=drive_link
I downgraded numpy version but still no windows were generated, and thank you so much taking time out in helping me solve my query Phlya!!
You have starts set to 9000, but it should be 90000. The idea is that the midpoint of each interval is at 100,000 - then with 100,000 flank you pileup starting from 0 and catch the telomere.
As an alternative, you can set start to 0, end to e.g. 99,000, and then use --rescale --rescale-size 99 --rescale-flank 0 (you can do the arithmetic so it works correctly with the resolution that you want to use, considering rescale size has to be odd)
Okay, I tried by changing the 9000 to 90000 and I did get two plots (attached herewith), I will try the second method and let you know, thanks again!! pup_sample2.pdf pup_sample1.pdf
Perfect! No need to try the other method if this worked IMO, the first one should be faster and maybe a little more precise.
Okay, thanks a lot, I will need to do a bit of trial and error for understanding if I am doing the analysis correctly, but I understood the approach. Have a great day ahead, I am closing this issue for now, if I have any further queries I will reopen this, hope that is fine with you!!
@Brahmandam-Gayatri Hi Gayatri, did you figured this out. I am also having the same issue with the telomeric coordinates of yeast, instead of signal being aggregated at the centre, its aggregated at the corner like this:
If you solved this issue, can you please share it. I am not sure how to set the flank
and how much to shift the coordinates from the beginning.
Thanks!
Hello,
Thank you for developing coolpuppy for pileup analysis. I want to plot an aggregate heatmap for yeast data using telomeric coordinates and I am unable to do so.
chrom start end chrIX 1 7784 chrV 1 6473 chrXIII 1 6344 chrXVI 1 7223 chrIII 1 1098 chrVI 1 5530 chrXII 1 12085 chrVIII 1 5505 chrXV 1 847 chrII 1 6608 chrXIV 1 7428 chrI 1 801 chrVII 1 781 chrX 1 7767 chrXI 1 807 chrIV 1 904
In general, the plot looks similar to the aggregate plot of centromeres, but here the aggregate signal is observed at the corners in place of center. Any suggestions are highly appreciated!!