scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.9k stars 596 forks source link

deduplication of louvain and leiden doc #570

Open fbnrst opened 5 years ago

fbnrst commented 5 years ago

louvain and leiden have a lot of redundant documentation. After having learned in #557, I could file a PR to deduplicate this. Would it be valid to shuffle the arguments in such a way that the shared documentation is grouped together? Otherwise, one would have to introduce many short strings and puzzle them together.

flying-sheep commented 5 years ago

Sounds like a great idea. generally the order should be the same as in the signature, but I don’t see a problem in reshuffling the lovain args to match the leiden ones.

We have to be careful with details though: e.g. partition_type needs to be slightly different for both:

Type of partition to use. Defaults to :class:`~louvain.RBConfigurationVertexPartition`.
For the available options, consult the documentation for :func:`~louvain.find_partition`.
Type of partition to use. Defaults to :class:`~leidenalg.RBConfigurationVertexPartition`.
For the available options, consult the documentation for :func:`~leidenalg.find_partition`.

@falexwolf do you think we should go ahead with https://pypi.org/project/legacy-api-wrap (and introduce * in louvain’s signatureor do you think we can slightly reshuffle the last few arguments oflouvain` without considering it a backwards compat break?

gokceneraslan commented 5 years ago

Wouldn't it actually make sense to merge louvain and leiden into one function and add an algorithm option or so?

flying-sheep commented 5 years ago

I think we talked about that at one point. But yes, makes a lot of sense. detect_communities would be accurate, but long. Maybe just partition or partitioning?

LuckyMD commented 5 years ago

maybe sc.tl.modularity_clustering() so that you keep the option open of adding other clustering functions in the future. Not exactly shorter though ;).

ivirshup commented 5 years ago

@gokceneraslan since they are largely the same thing (just a different optimization strategy), do we even need to keep both?

Otherwise, I think I'd prefer them to be separate functions, so you don't get argument interactions. For example, the partition_type argument has to be a type from the same package as the method, otherwise there are segfaults.