pyinat / pyinaturalist

Python client for iNaturalist
https://pyinaturalist.readthedocs.io
MIT License
133 stars 16 forks source link

Improvements for taxonomy trees and formatting #502

Closed JWCook closed 1 year ago

JWCook commented 1 year ago

Updates #499

Formatting:

Ranks:

Trees:

codecov[bot] commented 1 year ago

Codecov Report

Merging #502 (e8d3bf2) into main (aeebf5a) will increase coverage by 0.19%. The diff coverage is 97.46%.

@@            Coverage Diff             @@
##             main     #502      +/-   ##
==========================================
+ Coverage   95.81%   96.01%   +0.19%     
==========================================
  Files          58       58              
  Lines        3179     3212      +33     
  Branches      609      622      +13     
==========================================
+ Hits         3046     3084      +38     
+ Misses         94       91       -3     
+ Partials       39       37       -2     
Impacted Files Coverage Δ
pyinaturalist/models/identification.py 100.00% <ø> (ø)
pyinaturalist/models/observation.py 98.41% <ø> (ø)
pyinaturalist/models/taxon.py 98.54% <97.36%> (+3.11%) :arrow_up:
pyinaturalist/constants.py 100.00% <100.00%> (ø)
pyinaturalist/formatters.py 91.00% <100.00%> (ø)
synrg commented 1 year ago

See the snippet from dronefly-core below which works when ranks_to_count is COMMON_RANKS, RANK_LEVELS.keys(), "leaf", or "species". However, I have a few unresolved issues with my code so far that could be helped by additional changes in this PR.

  1. Maybe support filtering other than just by a list of included ranks? In particular, I had to do a special case for leaves. Users might have other particular criteria as well.

  2. I had to add my own filter for the "species" case even though I passed it to make_tree() as one of the ranks to include because of the auto-inclusion of the Life root. I don't want Life included in my output, and in any case, the generated node is created as a Taxon not a TaxonCount, making it unusable in my output without further adjustments. Because it isn't a TaxonCount, it lacks descendant_obs_count - that would throw an exception if I did include it in my display. I'm not exactly sure what to do here - maybe this is a case where having the filter in the flatten as well makes sense?

  3. Also, maybe the default sort order could be improved? With the following, the per species and per any cases produce nicely sorted output. However, the per main and per leaf cases produce out-of-order lists of species due to filtered ranks collapsing higher rank level groupings.

def taxa_per_rank(life_list: LifeList, ranks_to_count: Union(list[str], str)):
    options = {}
    if isinstance(ranks_to_count, list):
        options["include_ranks"] = ranks_to_count
        include = lambda _t: True
    else:
        if ranks_to_count == "leaf":
            include = lambda t: t.count == t.descendant_obs_count
        else:
            options["include_ranks"] = [ranks_to_count]
            include = lambda t: t.rank == ranks_to_count
    for taxon_count in make_tree(life_list.data, **options).flatten():
        if include(taxon_count):
            yield taxon_count
synrg commented 1 year ago

Good. Apart from this slight glitch in how my TaxonFormatter deals with the autocreated TaxonCount for life, everything looks fine now:

image

According to this, is_active should be True for Life:

image

JWCook commented 1 year ago

Thanks for testing this out!

I don't want Life included in my output

In the case where this needs to be added due to multiple root nodes, maybe I could add a flag indicating it can be ignored elsewhere (including in flatten())?

the generated node is created as a Taxon not a TaxonCount

Fixed.

out-of-order lists of species due to filtered ranks collapsing higher rank level groupings

Ah, I thought I accounted for that by applying sorting before skipping any ranks, but it must not be working as intended. To help with debugging, do you have an example that shows this?

JWCook commented 1 year ago

is_active should be True for Life:

Fixed

synrg commented 1 year ago

Thanks. I haven't fully caught up to all the changes (e.g. not yet making use of ancestors you've built) but at least our branches are now compatible. If the flattened tree has Life at the root I just skip over that one now and decrease the indent level at the same time. I imagine the code I just added will be even simpler once I fully take advantage of everything you've done. This is coming together very nicely!

JWCook commented 1 year ago

Okay, if this is working for you now, I'll go ahead and merge what I've got so far.

I also added a hide_root option to Taxon.flatten(), but we could replace that with a better solution later on.