vnoelifant / charity-finder

3 stars 0 forks source link

27 org query #37

Closed vnoelifant closed 1 year ago

vnoelifant commented 1 year ago

Hi Bob, I worked on issues #25 and #27. In this PR, I added code that responded to your comment on Slack a while ago that said:

"One thought that just occurred to me, we do org lookups in loop which is 1000s of queries, maybe you can preload a dict of org_id : org_object outside of the loop to do a single query and use this dict (cache) to get the org objects"

But with the latest code I still feel like loading the data takes so long. Am I missing something else? I also figured to do the same performance improvement with getting the themes and countries. But first wanted to check if getting the matching organizations for projects is is on the right track.

vnoelifant commented 1 year ago

Hi Bob, I worked on issues #25 and #27. In this PR, I added code that responded to your comment on Slack a while ago that said:

"One thought that just occurred to me, we do org lookups in loop which is 1000s of queries, maybe you can preload a dict of org_id : org_object outside of the loop to do a single query and use this dict (cache) to get the org objects"

But with the latest code I still feel like loading the data takes so long. Am I missing something else? I also figured to do the same performance improvement with getting the themes and countries. But first wanted to check if getting the matching organizations for projects is is on the right track.

I wanted to add @bbelderbos : Instead of this code:

    # Create organization dictionary to store organization id and object 
    organizations = {}
    organization_objs = Organization.objects.values("org_id")

    for organization_obj in organization_objs:
        organizations[organization_obj["org_id"]] = Organization.objects.get(
            org_id=organization_obj["org_id"]
        )

I had the following:

    # Create organization dictionary to store organization id and object
    organizations = {}
    organization_objs = Organization.objects.values("org_id")

    for organization_obj in organization_objs:
        organizations[organization_obj["org_id"]] = organization_obj
        )

, but Django complained with the following:

  File "/home/ronnienm08/charity-finder/charity_finder/management/commands/import_organizations.py", line 168, in insert_active_projects
    project.org = matching_org
  File "/home/ronnienm08/.local/share/virtualenvs/charity-finder-eKohszMN/lib/python3.9/site-packages/django/db/models/fields/related_descriptors.py", line 235, in __set__
    raise ValueError(
ValueError: Cannot assign "{'org_id': 14176}": "Project.org" must be a "Organization" instance.

Which is why I changed Query to Organization.objects.get( org_id=organization_obj["org_id"]. I didn't know another way to set the object instance in the dictionary. But I am suspecting this piece of logic may be slowing things down even more. Any way to get around this?

vnoelifant commented 1 year ago

Hmm didn't know about this.

Screenshot (398)

That would look something like:

    # Create organization dictionary to store organization id and object
    organizations = {}
    organization_objs = Organization.objects.only("org_id")

    for organization_obj in organization_objs:
        organizations[organization_obj.org_id] = organization_obj
bbelderbos commented 1 year ago

@vnoelifant I have not used only yet, so the goal is to build up a org dict to quickly look up project orgs and do just one query right?

Then should be fast (and it can be shortened using a dict comprehension):

{org.id: org for org in Organization.objects.all()}

Slowness is probably somewhere else so please time the functions using the shared timeit decorator. You could also profile the code, we have a video about that might help: https://www.youtube.com/watch?v=nnkY3dcH74o