wikipathways / wikipathways.github.io

GitHub pages for GPML-Repo
https://www.wikipathways.org/

Author order incorrect on WP5121 #110

Closed khanspers closed 11 months ago

khanspers commented 11 months ago

The order of authors appears to be incorrect on WP5121. As a result, the list of first-authored pathways is also wrong for the user who is incorrectly listed first. Example:

https://www.wikipathways.org/pathways/WP5121.html

The pathway was created by user Tadeldowu according to the pathway history on the classic site, but Eweitz is listed as first author on the new site.

The most recent edit was by Eweitz, but this is true for other pathways too, and they don't have the same problem (WP272, WP5088, WP5356, etc.).

AlexanderPico commented 11 months ago

Even worse, it's missing a bunch of authors. But this is not a website issue. The source of an issue can be verified by checking the upstream files:

https://github.com/wikipathways/wikipathways-database/tree/main/pathways/WP5121

The GPML file here looks correct, but the info.json file is not. This JSON file is generated by the meta-data-action script. I also noticed here that while the GPML was updated 10 months ago, the JSON and certain other files have not been updated in a long time.

So, the first thing I'll try is simply deleting the old, incorrect files and letting the current code regenerate them. If fixed, then we can re-run other cases like this. If not fixed, then we have a bug in the current meta-data-action code.

AlexanderPico commented 11 months ago

Good news! Simply re-running the GPML through our current code fixed the issue. So, this means we had a bug (over a year ago) and some number of pathways may have been incorrectly processed. We need to identify these, delete the old metadata files (JSON, TSV and MD) and rerun them.
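A minimal sketch of the "delete the old metadata files" step, for illustration only. It assumes a local checkout where each pathway has its own directory, and that every `.json`, `.tsv` and `.md` file in that directory is generated metadata; the function names and the dry-run default are hypothetical, not part of meta-data-action.

```python
from pathlib import Path

# Assumption: these extensions cover the generated metadata files;
# the .gpml source file must never be touched.
METADATA_SUFFIXES = {".json", ".tsv", ".md"}


def metadata_files(pathway_dir):
    """List the generated metadata files in one pathway directory."""
    return sorted(
        p for p in Path(pathway_dir).iterdir()
        if p.is_file() and p.suffix.lower() in METADATA_SUFFIXES
    )


def delete_metadata(pathway_dir, dry_run=True):
    """Return the metadata files; delete them only when dry_run is False."""
    targets = metadata_files(pathway_dir)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `delete_metadata("pathways/WP5121")` with the default `dry_run=True` only lists what would be removed, which makes it easy to review the targets before regenerating anything.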

@khanspers ideas on how to find other cases like this?

khanspers commented 11 months ago

The only programmatic approach I can think of would be to compare the author list returned by rWikiPathways (getPathwayHistory) with the author list stored in the GPMLs for download (https://data.wikipathways.org/current/gpml/). rWikiPathways uses web services, which use classic site content, while the downloads are created from content on the new site, right?
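A rough sketch of that comparison, assuming both author lists have already been collected into plain dictionaries keyed by pathway ID (fetching them from the web services or parsing the GPMLs is left out). The function name, the dictionary layout and the sample data are all hypothetical.

```python
def compare_author_lists(classic, current):
    """Report pathways whose ordered author lists differ between sources.

    Both arguments map a pathway ID to an ordered list of usernames.
    Only IDs present in both mappings are compared.
    """
    problems = {}
    for wpid in sorted(set(classic) & set(current)):
        old, new = classic[wpid], current[wpid]
        if old == new:
            continue
        missing = [a for a in old if a not in new]
        if missing:
            problems[wpid] = "missing authors: " + ", ".join(missing)
        elif old[:1] != new[:1]:
            problems[wpid] = "first author differs"
        else:
            problems[wpid] = "author lists differ"
    return problems


# Made-up sample data, for illustration only.
classic = {"WP5121": ["Tadeldowu", "Eweitz"], "WP272": ["A", "B"]}
current = {"WP5121": ["Eweitz", "Tadeldowu"], "WP272": ["A", "B"]}
print(compare_author_lists(classic, current))
```

Missing authors are reported before ordering problems, since a dropped author is the more serious defect and also changes the order.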

AlexanderPico commented 11 months ago

That's a lot...

I'll check a couple of other pathways edited a day before and after the buggy one to estimate whether there is a real problem or not.

khanspers commented 11 months ago

If you let me know the date in question, I can check as well.

AlexanderPico commented 11 months ago

Ok. The bug appeared in the big Jan 31 update, in which I pushed 73 updated GPMLs (before we had daily sync working). Using GitHub I could easily review these 73 diffs (I love GitHub!) and found 3 more buggy cases.

I will rerun these and check to see they are fixed. Then, I think we're done!

AlexanderPico commented 11 months ago

All fixed, but...

I found another bug. #111