grantmakers / profiles

Grantmakers.io Profiles - Summary profiles for all US-based foundations who have filed electronic IRS Form 990-PF
MIT License

RefinementList widget shows dups in rare cases #75

Open chadokruse opened 5 years ago

chadokruse commented 5 years ago

UPDATE 3: Full discussion: https://github.com/algolia/instantsearch.js/issues/3999


Some facet values appear in Title Case in the Algolia dashboard but are stored in UPPERCASE in the indexed records.

This results in errors when refining on certain facet values.

To reproduce:

  1. Visit Wiregrass Foundation
  2. Click on CITY OF DOTHAN in hits table
  3. The RefinementList shows both the Title Case version and the UPPERCASE version

This can also be reproduced in the Algolia dashboard.

So far, I have noticed this for the following facet values:

Facet: grantee_name
  - CITY OF DOTHAN
  - WIREGRASS MUSEUM OF ART

Curiously, there is only one affected value in grant_purpose: CORE PROGRAM.
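One way to look for other affected values is to group a field's values case-insensitively and flag any group that contains more than one spelling. The sketch below assumes the records are available as plain Python dicts (for example, exported from the MongoDB grants collection); `find_case_variants` is a hypothetical helper for illustration, not part of this repo or of the Algolia API.

```python
from collections import defaultdict


def find_case_variants(records, field):
    """Return {casefolded value: sorted distinct spellings} for every value of
    `field` that occurs with more than one casing (hypothetical helper)."""
    spellings = defaultdict(set)
    for record in records:
        value = record.get(field)
        if isinstance(value, str):
            # Group by a case-insensitive key while keeping each exact spelling.
            spellings[value.casefold()].add(value)
    return {
        key: sorted(variants)
        for key, variants in spellings.items()
        if len(variants) > 1
    }


# Hypothetical sample records, for illustration only.
records = [
    {"grantee_name": "City of Dothan"},
    {"grantee_name": "CITY OF DOTHAN"},
    {"grantee_name": "LIGHTHOUSE FAMILY RETREAT"},
]
print(find_case_variants(records, "grantee_name"))
# -> {'city of dothan': ['CITY OF DOTHAN', 'City of Dothan']}
```

Running this over the full export, rather than a single foundation's records, is what would surface collisions between different foundations' grants.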

The MongoDB document stores the values in UPPERCASE, as expected:

{
    "_id" : "200897153_2018_48",
    "objectID" : "200897153_2018_48",
    "ein" : "200897153",
    "organization_name" : "WIREGRASS FOUNDATION",
    "city" : "Dothan",
    "state" : "AL",
    "tax_year" : 2018.0,
    "aws_index_year" : "2019",
    "last_updated_grantmakers" : "2019-06-20T19:46:38.155Z",
    "last_updated_irs" : "2019-06-19T01:50:06.8779991Z",
    "grant_amount" : 1000.0,
    "grant_purpose" : "CORE PROGRAM",
    "grantee_name" : "LIGHTHOUSE FAMILY RETREAT",
    "grantee_city" : "Atlanta",
    "grantee_state" : "GA",
    "grantee_state_displayed" : "GA",
    "grantee_country" : "US",
    "grantee_is_foreign" : false,
    "grant_number" : 49.0
}

UPDATE 2: It appears the root cause is that the Algolia engine sets a facet value's displayed casing from the first record indexed with that value. Subsequent records with the same value (ignoring case) appear to be normalized to that casing.

Thus, UPPERCASE "CITY OF DOTHAN" appears as Title Case "City of Dothan" in the facets for the Wiregrass Foundation profile because the facet value was first created from another foundation's grant to a Title Case "City of Dothan".
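To make the hypothesis concrete, here is a toy model of the behavior described in UPDATE 2: the displayed facet label is taken from the first record indexed with a given value, and later records whose value differs only in case are counted under that label. This is an assumption-based simulation written for illustration (`build_facet_counts` is hypothetical), not Algolia's actual implementation.

```python
def build_facet_counts(values):
    """Toy model of the UPDATE 2 hypothesis: the first-seen spelling of each
    case-insensitive value becomes the displayed label, and later variants
    are counted under it. Not Algolia's real algorithm."""
    labels = {}  # casefolded value -> first-seen spelling
    counts = {}  # displayed label -> number of records
    for value in values:
        label = labels.setdefault(value.casefold(), value)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Another foundation's Title Case grant is indexed before Wiregrass's UPPERCASE one.
print(build_facet_counts(["City of Dothan", "CITY OF DOTHAN"]))
# -> {'City of Dothan': 2}
```

If the hypothesis holds, indexing order alone decides which spelling users see: reversing the two inputs above would display "CITY OF DOTHAN" instead.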


UPDATE: Possibly related to Algolia's use of UCS-2 encoding.

Further research:

  1. MongoDB defaults to UTF-8
  2. Confirmed that only UPPERCASE appears in the source collection in MongoDB (e.g., the grants collection)
  3. Title Case appears only in Algolia facets; the records themselves are fine
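Check 2 above can be expressed as a small filter over exported records. The sketch assumes the relevant records have already been loaded as dicts and filtered to one foundation by `ein`; `non_uppercase_values` is a hypothetical helper, and the sample records are illustrative only.

```python
def non_uppercase_values(records, field):
    """Return the sorted distinct values of `field` that are not already
    UPPERCASE (hypothetical helper; non-string values are ignored)."""
    return sorted(
        {
            record[field]
            for record in records
            if isinstance(record.get(field), str)
            and record[field] != record[field].upper()
        }
    )


# Hypothetical sample records, shaped like the MongoDB document above.
records = [
    {"ein": "200897153", "grantee_name": "LIGHTHOUSE FAMILY RETREAT"},
    {"ein": "200897153", "grantee_name": "CITY OF DOTHAN"},
]
wiregrass = [r for r in records if r["ein"] == "200897153"]
print(non_uppercase_values(wiregrass, "grantee_name"))
# -> [] (an empty list means every value is already UPPERCASE)
```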