Open geoffkilpin opened 10 years ago
So - there are some instances of two committees existing. I think this could be because both the scraper output and the original parliamentary info were entered, and the latter is (I believe) being manually managed by PMG. I think we need to decide which is the canonical version. My assumption is that the ones with "portfolio" in the committee name are scraped; however, they aren't associated with an organisation (e.g. National Assembly), so this would need to be tweaked. Also, their names are not as user friendly as the other set.
The source of committee membership data should be the PMG website (e.g. http://www.pmg.org.za/committees/Communications) and ideally the PMG scraper should be ensuring that changes on the PMG website are mirrored on the PA site (if I recall correctly this currently isn't the case?).
Based on the list at http://www.pmg.org.za/committees I think that at least in this case 'Communications' is the correct committee (although I think the full name is preferable).
This is related to/ a duplicate of #878
So to clarify - is http://za-pombola.staging.mysociety.org/organisation/portfolio-committee-on-communications/ scraper-generated or manually created?
If the former then we need it to associate the committee to an organisation
Further update:
http://za-pombola.staging.mysociety.org/organisation/social-development/ http://za-pombola.staging.mysociety.org/organisation/portfolio-committee-on-social-development/
I think that in the interests of pragmatism, we should simply delete the dupe committees that have the least comprehensive membership information, and post launch look into the options for automated scraper-driven updating - if no-one objects massively then I will get on and do this.
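To make the "delete the dupe with the least comprehensive membership" idea concrete, here is a minimal sketch in plain Python. It is not Pombola code: the `find_duplicates` and `normalise` helpers are hypothetical, the only prefix it strips is the "Portfolio Committee on" one seen in the URLs above, and the membership counts in the sample are invented purely for illustration.

```python
from collections import defaultdict

# Assumption: duplicates differ only by this prefix, as in the two
# social-development URLs above. Other prefixes may exist in the real data.
PREFIXES = ("portfolio committee on ",)


def normalise(name):
    """Lower-case a committee name, collapse whitespace, strip a known prefix."""
    key = " ".join(name.lower().split())
    for prefix in PREFIXES:
        if key.startswith(prefix):
            key = key[len(prefix):]
    return key


def find_duplicates(committees):
    """Return (keep, drop) name pairs from (name, membership_count) tuples.

    Within each group of names that normalise to the same key, the entry
    with the most memberships is kept and the others are flagged.
    """
    groups = defaultdict(list)
    for name, count in committees:
        groups[normalise(name)].append((name, count))

    pairs = []
    for members in groups.values():
        if len(members) < 2:
            continue
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        keep = ranked[0][0]
        for name, _ in ranked[1:]:
            pairs.append((keep, name))
    return pairs


# Invented counts, for illustration only.
sample = [
    ("Communications", 12),
    ("Portfolio Committee on Communications", 3),
    ("Social Development", 2),
    ("Portfolio Committee on Social Development", 9),
    ("Finance", 5),
]
duplicates = find_duplicates(sample)
```

The output would only be a list of candidates to review by hand; membership count is a crude proxy for data quality.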
@geoffkilpin is this still a live issue from your perspective?
Just a ping to @geoffkilpin to see if this is still an ongoing concern
I've just been looking into this as PMG is looking to manually update the site to reflect changes to the committee structure. It seems that there are 4 committee organisation kinds:
The duplication seems to come between the first organisation kind and the other three. The suggestion that I have made to address this is to:
This can all be done manually. Is there anything which I might have missed?
Hi @geoffkilpin - thanks for looking into this - indeed, it's very confusing, and anything you can do to resolve that would be helpful. As well as the 4 committee OrganisationKinds that you mentioned, there are also:
I'm not sure if there's any duplication between those types and the others. I printed out all organisations of OrganisationKinds that match "committee", grouped by that kind, and any identifiers associated with them:
Those of kind "Committee" don't have any org.mysociety.za schema identifiers, which I think means that they were added after the initial data import. (Everything in the initial data import from the Popolo JSON that was based on the CSV files and scraping PMG had one of those identifiers, I believe.)
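For anyone wanting to repeat that kind of check, a rough sketch of the grouping follows. It works on plain dictionaries rather than the real Pombola models, so the field names, the `group_by_kind` and `lacking_scheme` helpers, the second kind name and the identifier value are all assumptions made up for the example; only the "Committee" kind and the org.mysociety.za scheme come from this thread.

```python
from collections import defaultdict


def group_by_kind(organisations, needle="committee"):
    """Group organisations whose kind name contains `needle` by that kind.

    Each organisation is a dict with "name", "kind" and an optional
    "identifiers" list of (scheme, value) tuples.
    """
    grouped = defaultdict(list)
    for org in organisations:
        if needle in org["kind"].lower():
            grouped[org["kind"]].append((org["name"], org.get("identifiers", [])))
    return dict(grouped)


def lacking_scheme(organisations, scheme="org.mysociety.za"):
    """Names of organisations with no identifier in the given scheme."""
    return [
        org["name"]
        for org in organisations
        if not any(s == scheme for s, _ in org.get("identifiers", []))
    ]


# Placeholder data: names, the second kind and the identifier value are invented.
orgs = [
    {"name": "Example Committee A", "kind": "Committee", "identifiers": []},
    {
        "name": "Example Committee B",
        "kind": "Hypothetical Committee Kind",
        "identifiers": [("org.mysociety.za", "placeholder-id")],
    },
    {"name": "Example Party", "kind": "Party", "identifiers": []},
]
grouped = group_by_kind(orgs)
without_identifier = lacking_scheme(
    [o for o in orgs if "committee" in o["kind"].lower()]
)
```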
So, if that matches your understanding as well, I'm basically OK with your plan with some small suggestions:
core_merge_organisations command, by analogy with the core_merge_people command to make sure that any memberships associated with the committee that's going to be removed will refer to the new one, and it sets up a SlugRedirect for the old organisation page.
started and ended dates of the Organisation, and then fix any views (e.g. the organisation kind views) that might show old committees. We should also then consider how to prevent people from accidentally using those old committees, e.g. hiding non-current organisations from organisation kind views, not autocompleting them in the admin, perhaps adding a warning at the top of the admin page for an old organisation, etc.
Does that sound sensible to you?
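As a rough illustration of what a merge along those lines would need to do, here is a self-contained sketch using an in-memory store instead of the Django models. Nothing here is the actual core_merge_organisations command (which does not exist at this point in the thread): `Store`, `Membership` and `merge_organisations` are hypothetical names, and the people are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Membership:
    person: str
    organisation_slug: str


@dataclass
class Store:
    organisations: dict = field(default_factory=dict)   # slug -> name
    memberships: list = field(default_factory=list)
    slug_redirects: dict = field(default_factory=dict)  # old slug -> new slug


def merge_organisations(store, keep_slug, delete_slug):
    """Move memberships from delete_slug to keep_slug, then remove delete_slug.

    A membership is dropped rather than moved if the person already belongs
    to the kept organisation, so nobody ends up listed twice.
    """
    if keep_slug == delete_slug:
        raise ValueError("cannot merge an organisation into itself")
    for slug in (keep_slug, delete_slug):
        if slug not in store.organisations:
            raise KeyError(slug)

    already_members = {
        m.person for m in store.memberships if m.organisation_slug == keep_slug
    }
    kept = []
    for m in store.memberships:
        if m.organisation_slug == delete_slug:
            if m.person in already_members:
                continue  # would duplicate an existing membership
            m.organisation_slug = keep_slug
            already_members.add(m.person)
        kept.append(m)
    store.memberships = kept

    # Repoint older redirects that targeted the removed slug, then add a new one.
    for old, new in list(store.slug_redirects.items()):
        if new == delete_slug:
            store.slug_redirects[old] = keep_slug
    store.slug_redirects[delete_slug] = keep_slug
    del store.organisations[delete_slug]


store = Store(
    organisations={
        "communications": "Communications",
        "portfolio-committee-on-communications": "Portfolio Committee on Communications",
    },
    memberships=[
        Membership("person-a", "communications"),
        Membership("person-a", "portfolio-committee-on-communications"),
        Membership("person-b", "portfolio-committee-on-communications"),
    ],
)
merge_organisations(
    store,
    keep_slug="communications",
    delete_slug="portfolio-committee-on-communications",
)
```

A real command would also have to decide what to do with differing start and end dates on the two memberships, which this sketch ignores.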
Hi @mhl - many thanks for taking a look at this and for spotting the extra committees. I will discuss with PMG what to do about the provincial committees.
To respond specifically to your suggested changes to my plan:
ended field suggestion - that seems far more appropriate.
On a slightly related note - as far as I can tell SlugRedirects are not created when a slug is edited (so I won't modify slugs when correcting names to be their official names), but might this perhaps be something worth adding at some point?
Hi @geoffkilpin - sure, I'm happy to go with whatever you think's best with regard to merging or not, based on data quality.
Yes, SlugRedirects really should be created on editing slugs. The support for slug redirection was initially very basic - I improved it quite a bit in this recent pull request but didn't get to doing that... I'll create a ticket for it now.
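For the ticket, the behaviour being asked for could be sketched roughly as below. This is a plain-Python model with a set of live slugs and a dictionary of redirects, not the real SlugRedirect model; `change_slug` and the third slug are made up for the example.

```python
def change_slug(live_slugs, redirects, old_slug, new_slug):
    """Rename a slug and leave a redirect behind from the old one.

    live_slugs: set of slugs currently in use.
    redirects:  dict mapping a retired slug to the live slug replacing it.
    """
    if old_slug not in live_slugs:
        raise KeyError(old_slug)
    if new_slug in live_slugs:
        raise ValueError("slug already in use: %s" % new_slug)

    live_slugs.remove(old_slug)
    live_slugs.add(new_slug)

    # If the new slug was itself retired earlier, it is live again.
    redirects.pop(new_slug, None)

    # Keep older redirects pointing at the live slug rather than chaining.
    for source, target in list(redirects.items()):
        if target == old_slug:
            redirects[source] = new_slug
    redirects[old_slug] = new_slug


live = {"communications"}
redirects = {}
change_slug(live, redirects, "communications", "portfolio-committee-on-communications")
# Hypothetical second rename, to show that earlier redirects are repointed.
change_slug(live, redirects, "portfolio-committee-on-communications", "pc-communications")
```

Repointing on each rename means a lookup never has to follow more than one redirect.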
Incidentally, to correct my earlier comment and the potentially confusing gist, @dracos pointed out to me that the organisations of kind Committee did have org.mysociety.za identifiers in the original JSON - they're still in the database, but pointing to now deleted objects. I can't remember off-hand why this might have happened, but I don't think it's important for your proposed changes.
Thanks @mhl. I think the 'Committee' kind organisations were scraped from Parliament's website, whereas the others are from the PMG site. I seem to recall Parliament's list was quite out of date, which is why we went with PMG's, but I'll check all that when working out whether to merge or delete.
I believe this has been resolved by consuming the API
I'm reopening this, because I don't think it ever was resolved in the way that Paul suggests - we're using the PMG API to find committee appearances, but memberships of those committees are still being maintained in the Pombola admin and there is still confusion over which committees to use due to these duplicates. @chrismytton is looking into this.
From looking at the data in the database and re-reading this thread it seems that the following actions are needed before we can close this ticket:
Organisations which use the OrganisationKind "Committee" over to the duplicate committee (possibly using a core_merge_organisations as @mhl mentioned above)
Organisations have no memberships pointing to them delete them
OrganisationKind with the name "Committee"
I think most of this can be done from the admin, so it might be worth explaining the situation to PMG and seeing if they can do some of the work needed.
We might also need to make some changes in the admin to make it more obvious which committees should be used as @mhl mentions above.
e.g. hiding non-current organisations from organisation kind views, not autocompleting them in the admin, perhaps adding a warning at the top of the admin page for an old organisation, etc.
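As a sketch of the "hide non-current organisations" part, the check could look something like this. It assumes `started` and `ended` are plain `datetime.date` values or `None`, which may not match how Pombola actually stores them; the function names, organisation names and dates are all invented for the example.

```python
from datetime import date


def is_current(org, today):
    """True if `today` falls within the organisation's started/ended range.

    A missing (None) date is treated as open-ended on that side.
    """
    started = org.get("started")
    ended = org.get("ended")
    if started is not None and started > today:
        return False
    if ended is not None and ended < today:
        return False
    return True


def current_organisations(organisations, today):
    """Filter a list of organisation dicts down to the current ones."""
    return [org for org in organisations if is_current(org, today)]


# Invented organisations and dates, for illustration only.
committees = [
    {"name": "Old Committee", "started": date(2009, 5, 1), "ended": date(2014, 5, 6)},
    {"name": "Ongoing Committee", "started": date(2014, 6, 1), "ended": None},
    {"name": "Undated Committee", "started": None, "ended": None},
]
visible = [org["name"] for org in current_organisations(committees, date(2015, 1, 1))]
```

The same predicate could back the kind views, the admin autocomplete and the admin warning, so that all three agree on what counts as an old committee.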
Some (all?) committees exist twice in the database, e.g:
This means that on some profile pages a committee is listed more than once - e.g. http://za-pombola.staging.mysociety.org/person/charles-danny-kekana/
I am not sure of the source - was a scraper of Parliament's website ever added? The source of committee membership should be the PMG scraper.
I originally picked up on this on my local installation - so somewhere during the import of data the duplication is arising.