Closed vnoelifant closed 1 year ago
Hi Bob, I worked on issues #25 and #27. In this PR, I added code that responded to your comment on Slack a while ago that said:
"One thought that just occurred to me, we do org lookups in loop which is 1000s of queries, maybe you can preload a dict of org_id : org_object outside of the loop to do a single query and use this dict (cache) to get the org objects"
But with the latest code I still feel like loading the data takes so long. Am I missing something else? I also figured to do the same performance improvement with getting the themes and countries. But first wanted to check if getting the matching organizations for projects is is on the right track.
I wanted to add @bbelderbos : Instead of this code:
# Create organization dictionary to store organization id and object
organizations = {}
organization_objs = Organization.objects.values("org_id")
for organization_obj in organization_objs:
organizations[organization_obj["org_id"]] = Organization.objects.get(
org_id=organization_obj["org_id"]
)
I had the following:
# Create organization dictionary to store organization id and object
organizations = {}
organization_objs = Organization.objects.values("org_id")
for organization_obj in organization_objs:
organizations[organization_obj["org_id"]] = organization_obj
)
, but Django complained with the following:
File "/home/ronnienm08/charity-finder/charity_finder/management/commands/import_organizations.py", line 168, in insert_active_projects
project.org = matching_org
File "/home/ronnienm08/.local/share/virtualenvs/charity-finder-eKohszMN/lib/python3.9/site-packages/django/db/models/fields/related_descriptors.py", line 235, in __set__
raise ValueError(
ValueError: Cannot assign "{'org_id': 14176}": "Project.org" must be a "Organization" instance.
Which is why I changed Query to Organization.objects.get( org_id=organization_obj["org_id"]. I didn't know another way to set the object instance in the dictionary. But I am suspecting this piece of logic may be slowing things down even more. Any way to get around this?
Hmm didn't know about this.
That would look something like:
# Create organization dictionary to store organization id and object
organizations = {}
organization_objs = Organization.objects.only("org_id")
for organization_obj in organization_objs:
organizations[organization_obj.org_id] = organization_obj
@vnoelifant I have not used only
yet, so the goal is to build up a org dict to quickly look up project orgs and do just one query right?
Then should be fast (and it can be shortened using a dict comprehension):
{org.id: org for org in Organization.objects.all()}
Slowness is probably somewhere else so please time the functions using the shared timeit decorator. You could also profile the code, we have a video about that might help: https://www.youtube.com/watch?v=nnkY3dcH74o
Hi Bob, I worked on issues #25 and #27. In this PR, I added code that responded to your comment on Slack a while ago that said:
"One thought that just occurred to me, we do org lookups in loop which is 1000s of queries, maybe you can preload a dict of org_id : org_object outside of the loop to do a single query and use this dict (cache) to get the org objects"
But with the latest code I still feel like loading the data takes so long. Am I missing something else? I also figured to do the same performance improvement with getting the themes and countries. But first wanted to check if getting the matching organizations for projects is is on the right track.