open2c / coolpuppy

A versatile tool to perform pile-up analysis on Hi-C data in .cool format.
MIT License

providing an array as argument for by_distance #129

Status: closed. hrahmanin closed this issue 1 year ago.

hrahmanin commented 1 year ago

Thanks for the great tool!

I am trying to increase the number of distance bins for the by_distance pileup.

If I understand the docstrings and how distance_edges is created (https://github.com/open2c/coolpuppy/blob/06a4706d531c16581cd74e9c944cd766e8e113f4/coolpuppy/coolpup.py#L2009), the edges should be passed to by_distance as an array. However, if I provide np.append([0], 5000 * 2 ** np.arange(30)), I receive the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[40], line 2
      1 sites = dots_filtered.copy()
----> 2 pup = coolpup.pileup(clr, sites, features_format='bedpe', view_df=view_df[1:2],
      3                         by_distance=np.append([0], 5000 * 2 ** np.arange(30)),
      4                      mindist=100_000,
      5                         flank=100_000, min_diag=2,
      6                         nproc=16
      7                         )

File ~/.conda/envs/open2c/lib/python3.9/site-packages/coolpuppy/coolpup.py:2006, in pileup(clr, features, features_format, view_df, expected_df, expected_value_col, clr_weight_name, flank, minshift, maxshift, nshifts, ooe, mindist, maxdist, min_diag, subset, by_window, by_strand, by_distance, groupby, flip_negative_strand, local, coverage_norm, trans, rescale, rescale_flank, rescale_size, store_stripes, nproc, seed)
   1847 def pileup(
   1848     clr,
   1849     features,
   (...)
   1877     seed=None,
   1878 ):
   1879     """Create pileups
   1880 
   1881     Parameters
   (...)
   2004     if any, all possible annotations from the arguments of this function.
   2005     """
-> 2006     if by_distance:
   2007         if by_distance is True or by_distance == "default":
   2008             distance_edges = "default"

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
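The error is raised by `if by_distance:` at line 2006 of coolpup.py: Python asks the NumPy array for a single truth value, which is ambiguous when the array has more than one element. The following is a minimal sketch of an argument check that avoids this. It assumes the three accepted forms hinted at by the traceback (True, the string "default", or an array of edges). The helper name resolve_distance_edges is hypothetical and is not the fix that was committed to coolpuppy.

```python
import numpy as np


def resolve_distance_edges(by_distance):
    """Hypothetical helper (not coolpuppy's code): normalise a by_distance
    argument without ever evaluating the truth value of an array."""
    # Identity checks never call ndarray.__bool__, so they cannot raise.
    if by_distance is None or by_distance is False:
        return None
    # isinstance guard: comparing an array to a string would return an array,
    # and `or` on that array would raise the same ValueError again.
    if by_distance is True or (
        isinstance(by_distance, str) and by_distance == "default"
    ):
        return "default"
    # Anything else is treated as a sequence of bin edges.
    edges = np.asarray(by_distance)
    if edges.ndim != 1 or edges.size < 2:
        raise ValueError("by_distance edges must be a 1D array with at least 2 values")
    if np.any(np.diff(edges) <= 0):
        raise ValueError("by_distance edges must be strictly increasing")
    return edges


edges = np.append([0], 5000 * 2 ** np.arange(30))

try:
    if edges:  # this is what `if by_distance:` does with an array
        pass
except ValueError as err:
    print("plain truth test fails:", err)

print(resolve_distance_edges(True))  # the string "default"
print(resolve_distance_edges(edges)[:4])  # first four edges: 0, 5000, 10000, 20000
```

Checking `is True` and `isinstance(..., str)` before falling back to `np.asarray` keeps each accepted form on its own branch, so an array never reaches a boolean context.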

It could also be a little confusing to have the mindist/maxdist arguments alongside by_distance, but it is not clear what the best way to simplify the arguments would be.
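To illustrate how the two sets of arguments overlap, here is a generic sketch that first applies a mindist/maxdist-style filter and then assigns the remaining feature-pair distances to by_distance-style edges with np.digitize. It is not coolpuppy's internal binning. The helper bin_distances and the example distances are hypothetical, and the sketch assumes each bin covers the half-open interval [edges[i], edges[i + 1]).

```python
import numpy as np


def bin_distances(distances, edges, mindist=0, maxdist=np.inf):
    """Hypothetical helper (not coolpuppy's implementation): count
    feature-pair distances per distance bin after a mindist/maxdist filter."""
    distances = np.asarray(distances)
    edges = np.asarray(edges)
    # Step 1: the mindist/maxdist filter.
    kept = distances[(distances >= mindist) & (distances <= maxdist)]
    # Step 2: bin i covers [edges[i], edges[i + 1]); values outside all edges are dropped.
    idx = np.digitize(kept, edges) - 1
    inside = (idx >= 0) & (idx < len(edges) - 1)
    return np.bincount(idx[inside], minlength=len(edges) - 1)


edges = np.append([0], 5000 * 2 ** np.arange(30))
distances = np.array([20_000, 60_000, 150_000, 300_000, 1_500_000])  # made-up example
counts = bin_distances(distances, edges, mindist=100_000)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    if n:
        print(f"{lo}-{hi}: {n}")
```

With mindist=100_000, the first five bins (0 to 80 kb) can never receive any pairs, and the 80 to 160 kb bin only receives pairs of 100 kb or more. Under this assumption, edges below mindist are redundant, which is one way the two argument sets interact.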

Phlya commented 1 year ago

Right, thanks for the report! I think I have fixed it. Could you try the latest commit from master?

hrahmanin commented 1 year ago

The issue is now resolved; thanks!