navis-org / navis

Python library for analysis of neuroanatomical data.
https://navis.readthedocs.io
GNU General Public License v3.0
80 stars 29 forks source link

Make nbl.smat.LookupNdBuilder more generic #97

Open clbarnes opened 2 years ago

clbarnes commented 2 years ago

At present it's annotated to take Dotprops and functions which operate on Dotprops. Synblast doesn't use Dotprops but would benefit from a LookupNdBuilder, and I don't think the builder actually uses any dotprops functionality as it just uses the passed-in function to operate on them. Any object with length N and any function which takes 2 of those objects to produce an array of length N should work, so this may just be a case of adding a generic type annotation.

Requested by @swilson27

schlegelp commented 1 year ago

Happy to pick this up if you don't have the bandwidth at the moment, @clbarnes.

At first glance, the only problem appears to be that LookupNdBuilder uses the length of individual neurons to calculate the n_matching_qual_vals variable. See the len(self.dotprops[q_idx]) in this method, for example:

    def _get_pairs(self):
        matching_pairs = list(set(self._yield_matching_pairs()))
        # need to know the eventual distdot count
        # so we know how many non-matching pairs to draw
        q_idx_count = Counter(p[0] for p in matching_pairs)
        n_matching_qual_vals = sum(
            len(self.dotprops[q_idx]) * n_reps for q_idx, n_reps in q_idx_count.items()
        )

        nonmatching_pairs = self._pick_nonmatching_pairs(n_matching_qual_vals)
        return matching_pairs, nonmatching_pairs

If I understand this correctly, you're using it to match the number of points (i.e. size) rather than the number of neurons between matching and non-matching pairs? Currently only Dotprops implements a __len__ method though. In addition, the similarity metric might not be using whatever __len__ measures (synBLAST being an example for that).

I can see three solutions:

  1. Implement __len__ methods for all neuron types.
  2. Use len(object) if possible and, failing that, fall back to the total number
  3. Let the user define a function for that. Default would be len_func=len but could also be e.g. len_func=lambda x: len(x.connectors) for synBLAST or len_func=lambda x: 1 for when we don't care about size, only counts.

I'd favour option 3. Thoughts?

schlegelp commented 1 year ago

@swilson27 I pushed some changed to the generic_lookupnd branch. I haven't tried this with syNBLAST yet but perhaps you want to take it for a spin?