pyinat / pyinaturalist

Python client for iNaturalist
https://pyinaturalist.readthedocs.io
MIT License

Feature request: Exact match on taxon name #595

Open jhuus opened 1 month ago

jhuus commented 1 month ago

Feature description

I'm not sure if this is a bug or a feature, but currently get_observations(taxon_name='X') returns observations of every taxon whose name contains X. For example, taxon_name='Bushtit' returns Eurasian Bushtit, Long-tailed Bushtit, etc. It seems to me it should default to an exact match, or at least provide an option for one, so that taxon_name='Bushtit' returns matches for 'Psaltriparus minimus' only.


JWCook commented 3 weeks ago

Good question. This is partly a limitation of the iNaturalist API: when searching observations by common name, it always does a substring match rather than an exact match, likely because common names are not standardized. I don't believe there are any options for an exact string match, but there are a few ways around this:
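Until an exact-match option exists, one workaround is to keep the substring search and filter the results on the client side. This is a minimal sketch, assuming each observation is a dict with a nested `taxon` dict that may contain `preferred_common_name` (as in the `results` list returned by `get_observations()`). `filter_exact_common_name` is a hypothetical helper, not part of pyinaturalist, and the sample records are made up.

```python
def filter_exact_common_name(observations, common_name):
    """Keep only observations whose taxon's preferred common name equals
    common_name exactly, ignoring case and surrounding whitespace."""
    target = common_name.strip().casefold()
    matches = []
    for obs in observations:
        # Some observations have no taxon, and some taxa have no common name
        taxon = obs.get('taxon') or {}
        name = taxon.get('preferred_common_name') or ''
        if name.strip().casefold() == target:
            matches.append(obs)
    return matches


# Made-up records shaped like the 'results' list of an observation search
sample = [
    {'id': 1, 'taxon': {'id': 7266, 'name': 'Psaltriparus minimus', 'preferred_common_name': 'Bushtit'}},
    {'id': 2, 'taxon': {'preferred_common_name': 'Long-tailed Bushtit'}},
    {'id': 3, 'taxon': {'preferred_common_name': 'Eurasian Bushtit'}},
    {'id': 4, 'taxon': None},
]
print([obs['id'] for obs in filter_exact_common_name(sample, 'bushtit')])  # [1]
```

With real data this would look like `filter_exact_common_name(get_observations(taxon_name='Bushtit')['results'], 'Bushtit')`. Note that the API still returns (and you still page through) every substring match; the filter only discards the unwanted ones afterward, so the options below are usually more efficient.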

1: Use the scientific name

This option is probably the easiest, if you already have the scientific name. You can use the same taxon_name argument:

```python
from pyinaturalist import get_observations

obs = get_observations(taxon_name='Psaltriparus minimus')
```

2: Use the taxon ID

In the rare case where even the scientific name is ambiguous, the most precise way to query by taxon is by ID, which you can find on the taxon's page on inaturalist.org:

```python
obs = get_observations(taxon_id=7266)
```
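If you are copying IDs from the browser, a small helper can pull the ID out of the page URL. This sketch assumes taxon page URLs have the form `https://www.inaturalist.org/taxa/<id>`, optionally followed by `-<Name>`; `taxon_id_from_url` is a hypothetical helper, not part of pyinaturalist.

```python
import re
from urllib.parse import urlparse


def taxon_id_from_url(url):
    """Extract the numeric taxon ID from an iNaturalist taxon page URL.

    Assumes a path like /taxa/<id> or /taxa/<id>-<Name> (an assumption
    about the site's URL format, not a documented API).
    """
    path = urlparse(url).path
    match = re.match(r'^/taxa/(\d+)(?:-[^/]*)?/?$', path)
    if match is None:
        raise ValueError(f'Not a recognized taxon URL: {url}')
    return int(match.group(1))


print(taxon_id_from_url('https://www.inaturalist.org/taxa/7266-Psaltriparus-minimus'))  # 7266
```

The result can be passed straight to `get_observations(taxon_id=...)`.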

3: Use taxon text search

If you want to get the taxon ID without opening a web browser, you can use the taxon autocomplete search, which is basically the same function used in the search bar on iNaturalist.org. Results are ranked by relevance, so the first result is usually (but not guaranteed to be) the one you're looking for. Then you can use that result's taxon ID to search observations:

```python
from pyinaturalist import get_observations, get_taxa_autocomplete

taxa = get_taxa_autocomplete('Bushtit')
first_match = taxa['results'][0]
print(first_match['id'], first_match['rank'], first_match['name'])
# 7266 species Psaltriparus minimus

obs_count = get_observations(taxon_id=7266, count_only=True)
print(f'Total results: {obs_count["total_results"]}')
# Total results: 35625

obs = get_observations(taxon_id=7266)
```
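Since the first autocomplete result isn't guaranteed to be the right one, you could prefer a result whose name matches the query exactly and only fall back to the top-ranked result otherwise. This is a sketch assuming each entry in `taxa['results']` is a dict with `name` and (optionally) `preferred_common_name` keys; `pick_taxon` is a hypothetical helper, and the sample results are made up.

```python
def pick_taxon(results, query):
    """Return the first result whose scientific name or preferred common
    name equals the query (case-insensitive); otherwise fall back to the
    top-ranked result. Returns None if there are no results."""
    if not results:
        return None
    target = query.strip().casefold()
    for taxon in results:
        names = (taxon.get('name'), taxon.get('preferred_common_name'))
        if any(n and n.strip().casefold() == target for n in names):
            return taxon
    # No exact match: trust the API's relevance ranking
    return results[0]


# Made-up results, in relevance order, shaped like taxa['results']
results = [
    {'id': 1, 'name': 'Example taxon', 'preferred_common_name': 'Example Bushtits'},
    {'id': 7266, 'name': 'Psaltriparus minimus', 'preferred_common_name': 'Bushtit'},
]
print(pick_taxon(results, 'Bushtit')['id'])  # 7266
```

With real data: `pick_taxon(get_taxa_autocomplete('Bushtit')['results'], 'Bushtit')`.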

3b: Taxon text search with iNatClient

If you want to try out the newer pyinaturalist interface (which has a few more features and is easier to use in some cases), here's an equivalent example:

```python
from pyinaturalist import iNatClient

client = iNatClient()
first_match = client.taxa.autocomplete('Bushtit').one()
print(first_match)
# Taxon(id=7266, full_name=Psaltriparus minimus (Bushtit))

query = client.observations.search(taxon_id=first_match.id)
print(f'Total results: {query.count()}')
# Total results: 35625

obs = query.all()
```

Note: that's a very large query, so you'll probably want to narrow that down with other search criteria.
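To put "very large" in numbers, you can estimate how many paged requests a full download takes from the total count. This sketch assumes the API's maximum of 200 results per page; `pages_needed` is a hypothetical helper. For 35,625 results that's 179 requests, or about three minutes if you stay around iNaturalist's requested limit of roughly 60 requests per minute.

```python
def pages_needed(total_results, per_page=200):
    """Number of paged requests needed to fetch every result
    (ceiling division, done with integers to avoid float rounding)."""
    if total_results <= 0:
        return 0
    return (total_results + per_page - 1) // per_page


print(pages_needed(35625))  # 179
```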


Hope that helps. Let me know if you have any other questions!

jhuus commented 3 weeks ago

Thanks - that's very helpful! Do you think it's worth asking iNaturalist to add an exact-match option to their API? It seems to me that would be an improvement.