scikit-tda / tadasets

Synthetic data sets apt for Topological Data Analysis
http://tadasets.scikit-tda.org
MIT License

Add Swiss cheese #1

Open sauln opened 5 years ago

sauln commented 5 years ago

Allow options for spotty 2-manifolds, like Swiss cheese.

This Swiss cheese could then be used as a base for constructing all of the 2-manifolds: spheres, Swiss rolls, etc.

We could also extend this to higher dimensions, i.e. a dense 3-d point cloud with random balls excluded.
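A rough sketch of how that could look, assuming the holes are cut in a flat parameter square first and the result is then pushed through a parametrization. The names `flat_cheese` and `to_swiss_roll` and the particular swiss roll formula are made up for illustration; they are not part of tadasets.

```python
import numpy as np

def flat_cheese(n_points, centers, radii, rng):
    """Sample points uniformly in [-1, 1]^d and drop those inside any ball."""
    d = centers.shape[1]
    points = rng.uniform(-1, 1, size=(n_points, d))
    # Distance from every point to every hole center, shape (n_points, n_holes)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    # Keep a point only if it is outside all holes
    return points[(dists > radii).all(axis=1)]

def to_swiss_roll(uv):
    """Map parameters (u, v) in [-1, 1]^2 onto a swiss roll in 3-d (illustrative formula)."""
    # Angle along the spiral, from 1.5*pi (u = -1) to 4.5*pi (u = 1)
    t = 1.5 * np.pi * (2 + uv[:, 0])
    return np.column_stack((t * np.cos(t), 10 * uv[:, 1], t * np.sin(t)))

rng = np.random.default_rng(0)

# 2-d case: a punctured square, then rolled up
centers = np.array([[0.0, 0.0], [0.5, -0.5]])
radii = np.array([0.3, 0.2])
flat = flat_cheese(5000, centers, radii, rng)
roll = to_swiss_roll(flat)

# 3-d case: a dense cube with one ball excluded
solid = flat_cheese(5000, np.zeros((1, 3)), np.array([0.5]), rng)
```

The same `flat_cheese` helper covers both cases because the dimension is read off the shape of `centers`.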

Filco306 commented 3 years ago

I think this is relatively easy to do. I can add it.

sauln commented 3 years ago

That would be great, thank you!

Filco306 commented 3 years ago

So, this is what I have so far, but it is not very efficient and the holes tend to overlap.

import numpy as np
import matplotlib.pyplot as plt

def generate_swiss_holes(n_holes, d):
    """ Generates Swiss Cheese Holes

    Parameters
    ============
    n_holes: int
        number of holes in return data set.
    d: int
        number of dimensions
    """
    # Sample radiuses from a log-uniform distribution on [0.1, 0.2]
    # Log uniform to ensure sizes vary reasonably
    radiuses = np.exp(np.random.uniform(np.log(0.1), np.log(0.2), size=n_holes))
    centers = np.random.uniform(-1, 1, size=(n_holes, d))
    return centers, radiuses

def in_a_hole(row, centers, radiuses):
    """Helper function: True if the point `row` lies inside any hole.
    """
    return bool(np.any(np.linalg.norm(row - centers, axis=1) <= radiuses))

def d_swiss_cheese(n_points=10000, n_holes=4, d=2, noise=None, seed=None):
    """ Creates a square-formed swiss cheese manifold in d dimensions.

    Parameters
    ============

    n_points: int
        number of points sampled before the holes are cut out,
        so the returned data set is usually smaller.
    n_holes: int
        number of holes in return data set.
    d: int
        number of dimensions
    noise: float
        currently unused.
    seed: int
        seed for NumPy's global random number generator.
    """
    if seed is not None:
        np.random.seed(seed)

    points = np.random.uniform(-1, 1, size=(n_points, d))
    centers, radiuses = generate_swiss_holes(n_holes, d)

    # Keep only the points that fall outside every hole
    inside = np.apply_along_axis(in_a_hole, 1, points, centers, radiuses)
    points = points[~inside]
    return points

points = d_swiss_cheese(n_holes=20)
plt.scatter(points[:, 0], points[:, 1])
plt.show()

Do you think we should ensure holes are not overlapping?

sauln commented 3 years ago

Looks like it works :D

I think real Swiss cheese has overlapping holes, right? I could see non-overlapping holes being helpful if you were trying to check the correspondence between the number of holes and the computed homology.

How hard do you think it would be to have non-overlapping holes? I'm not too worried about it if it's difficult.

ctralie commented 3 years ago

You could accomplish that with furthest point sampling, but I don't think it's necessary.
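For reference, a minimal sketch of what furthest point sampling could look like here, assuming hole centers are picked greedily from a pool of random candidates. The function name `furthest_point_sampling` and the choice of a single shared radius are illustrative assumptions, not something from tadasets. Note that the sampling only spreads the centers out; the holes are guaranteed disjoint only if the radius is then chosen below half the smallest gap between centers.

```python
import numpy as np

def furthest_point_sampling(candidates, k, rng):
    """Greedily pick k candidates, each as far as possible from those already picked."""
    chosen = [int(rng.integers(len(candidates)))]
    # Distance from every candidate to its nearest chosen point so far
    min_dist = np.linalg.norm(candidates - candidates[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(candidates - candidates[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return candidates[chosen]

rng = np.random.default_rng(0)
candidates = rng.uniform(-1, 1, size=(2000, 2))
centers = furthest_point_sampling(candidates, 4, rng)

# Smallest distance between any two chosen centers
gaps = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
min_gap = gaps[np.triu_indices(len(centers), k=1)].min()

# A shared radius below min_gap / 2 means no two holes can overlap
radius = 0.45 * min_gap
```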

Filco306 commented 3 years ago

Nah, non-overlapping holes are not difficult. We can just do that by discarding bigger holes that collide with smaller ones. This will introduce a bias towards smaller holes, but I think that is better than a bias towards bigger holes.
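Roughly, the idea could be sketched like this: go through the holes from smallest to largest and keep a hole only if it does not touch any hole already kept. The helper name `drop_colliding_holes` is hypothetical, and this is just one reading of the approach, so the actual PR may do it differently.

```python
import numpy as np

def drop_colliding_holes(centers, radii):
    """Keep holes smallest-first; skip any hole that overlaps one already kept."""
    kept = []
    for i in np.argsort(radii):  # smallest radius first
        overlaps = any(
            # Two balls overlap when their centers are closer than the sum of radii
            np.linalg.norm(centers[i] - centers[j]) < radii[i] + radii[j]
            for j in kept
        )
        if not overlaps:
            kept.append(i)
    kept = np.array(kept, dtype=int)
    return centers[kept], radii[kept]

rng = np.random.default_rng(0)
centers = rng.uniform(-1, 1, size=(20, 2))
radii = np.exp(rng.uniform(np.log(0.1), np.log(0.2), size=20))
centers_kept, radii_kept = drop_colliding_holes(centers, radii)
```

Because small holes are considered first, a large hole is the one that gets dropped in any collision, which is where the bias towards smaller holes comes from. The number of holes returned can also be lower than the number requested.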

Filco306 commented 3 years ago

I have now opened a PR. Check out #11 :)