Closed ValWood closed 7 months ago
I mentioned in today's group meeting that this had gone up to 43.6% last week... I think it's statistically significant because a) it's a complete dataset, not a sample, in which case you don't need statistics to explain the increase, and b) the number is just the ratio of curated vs. non-curated sessions out of those sent out, and it is continually increasing...
If we plotted the response rate I'm sure it is a continuously upward trajectory... which is basically what we are interested in...I want to get to 50% this year...
@kimrutherford is it easy to include this as a graph in the stats? It would be much nicer than the number. It's not urgent but it might be a nice quick task if you want something "alternative" to the big browser elephant....
Does that all make sense?
is it easy to include this as a graph in the stats?
All the data is available so it wouldn't be too hard.
There are some edge cases to think about. Like this session which was sent out twice, in different years: https://curation.pombase.org/pombe/view/object/curs/4315?model=track should that count towards 2016 or 2017?
I envisaged that we would just use the ratio of the ones which are sent out vs. the one sent back.
So, the numbers
To date 1361 publications have been assigned to community members for curation. 597 are finished and are either in the main PomBase database or are currently being checked by the PomBase curators. That's a response rate of 43.8%.
so it's always the first date sent out (things which are sent out multiple times are just reminders).
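As a minimal sketch of that rule (not Canto's actual code), a session that was sent out more than once counts towards the year of its earliest send date. The function name and the two dates are hypothetical, chosen only to mirror the 2016/2017 case above:

```python
from datetime import date


def first_sent_year(sent_dates):
    """Return the year a session counts towards: the year of its earliest send date."""
    return min(sent_dates).year


# Hypothetical session: first sent in 2016, reminder sent in 2017.
sent = [date(2017, 3, 1), date(2016, 11, 20)]
print(first_sent_year(sent))  # 2016
```

Later send dates are treated purely as reminders, so they never move a session into a different year.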
I envisage that the graph will look like this:
i.e. it goes up continually but very slowly.
I keep it going up by sending out enough reminders to sustain an increase. I don't send out too many at once as we would be swamped...
Eventually it will plateau when we are just left with the people who will never do any. We are a long way from that yet.... I'm still getting lots of "sorry I will do it" and a good uptake when I send reminders, even for old sessions...
y axis is %
I might be wrong because I don't know what the graph would look like at the start, when the number of sessions was low! Actually I think it may begin at about 30%. Certainly for the past few years it has been going up slowly (this is partly because the uptake on new papers is usually more immediate; it's the old ones that are stagnating...).
44.1%. .....we will get to 50% by the end of the year I'm sure.....
44.3%.....
It was 32% when I did this presentation: https://www.slideshare.net/ValerieWood/community-curation-at-pombase (I can't remember when, I think it was about 18 months ago)
44.4%. I do wish I hadn't sent out so many reminders at once... I want it to stop... No more until these dry up...
Hi @kimrutherford what's your question here. I should be able to describe better.
Hi @kimrutherford what's your question here. I should be able to describe better.
I think this answers my question:
so it's always the first date sent out (things which are sent out multiple times are just reminders).
I think I mis-read it and then added "discuss".
Will keep this open, it would be nice to see the cumulative increase on the stats page: https://curation.pombase.org/pombe/stats/annotation
It would be nice to have a cumulative graph showing the growth over time eventually (the only way is up)
Is that true? If you sent out a bunch of sessions won't the response rate (temporarily) drop?
The drop is usually only a fraction of a percentage point, so it won't show in the plot.
if it ever dropped I would send out more reminders ;)
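To put a rough number on that temporary dip, here is a small sketch using the 597 finished out of 1361 assigned quoted earlier in this thread. The batch size of 10 newly sent sessions is an assumption for illustration only:

```python
def response_rate(finished, assigned):
    """Percentage of assigned sessions that have been finished."""
    return 100.0 * finished / assigned


before = response_rate(597, 1361)
# Assumption: a batch of 10 new sessions goes out and none is finished yet.
after = response_rate(597, 1361 + 10)
drop = before - after

print(f"before={before:.2f}% after={after:.2f}% drop={drop:.2f} points")
```

With these figures the drop is roughly a third of a percentage point, so it would be hard to see on a yearly plot.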
Actually, that isn't the response rate graph, it's the other one (2B); they look similar.
I would upload it but I need to swap laptops and mail it to myself because I can't upload to GitHub on the other laptop. I really need to sort my environment!
I've done some querying in Chado. I think the numbers don't match up with the 50% response rate shown in Canto because not all of the publications in Canto are exported to Chado. There are community sessions triaged as "Erratum" and "Wrong organism" for example which aren't exported.
I've made a new report "uncuratable publications with a community session" to help work this out: https://curation.pombase.org/pombe/view/list/uncuratable_publications_with_a_community_session?model=track
If a session is approved, the Canto details are exported to Chado regardless of the triage status.
This publication is an Erratum, but has an approved session: https://curation.pombase.org/pombe/view/object/pub/11918?model=track
Here are the numbers from Chado:
year | submitted | sent_sessions | response_rate
------+-----------+---------------+---------------
2013 | 91 | 927 | 9.8
2014 | 174 | 1055 | 16.4
2015 | 260 | 1171 | 22.2
2016 | 403 | 1280 | 31.4
2017 | 502 | 1378 | 36.4
2018 | 641 | 1475 | 43.4
2019 | 771 | 1579 | 48.8
2020 | 800 | 1593 | 50.2
Note to self, query with:
WITH counts AS (
  SELECT year,
         -- community sessions submitted up to the end of each year
         -- NB: '-12-30' leaves out sessions submitted on 31 December
         (SELECT COUNT(*)
            FROM pombase_publication_curation_summary
           WHERE canto_curator_role = 'community'
             AND canto_annotation_status IN ('NEEDS_APPROVAL', 'APPROVAL_IN_PROGRESS', 'APPROVED')
             AND canto_session_submitted_date IS NOT NULL
             AND canto_session_submitted_date <= (year || '-12-30')::date) AS submitted,
         -- community sessions sent out up to each year
         -- NB: AND binds tighter than OR, so every approved session is counted in every year
         (SELECT COUNT(*)
            FROM pombase_publication_curation_summary
           WHERE canto_curator_role = 'community'
             AND (canto_approved_date IS NOT NULL
                  OR canto_first_sent_to_curator_year IS NOT NULL
                     AND canto_first_sent_to_curator_year <= year)) AS sent_sessions
    FROM generate_series(2013, (SELECT extract(YEAR FROM CURRENT_DATE))::integer) AS year
)
SELECT year, submitted, sent_sessions,
       trunc(100.0 * submitted / sent_sessions, 1) AS response_rate
  FROM counts;
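One detail of the query that affects the reported figures: `trunc(..., 1)` cuts the percentage to one decimal place instead of rounding it. The following is a minimal Python sketch of that step, not part of the PomBase code; the function name `truncated_rate` is made up for illustration, and the inputs are counts quoted in this thread:

```python
from decimal import Decimal, ROUND_DOWN


def truncated_rate(submitted, sent, places=1):
    """Percentage submitted/sent, truncated (not rounded) to `places` decimals."""
    rate = Decimal(100) * Decimal(submitted) / Decimal(sent)
    quantum = Decimal(1).scaleb(-places)  # 0.1 when places == 1
    return rate.quantize(quantum, rounding=ROUND_DOWN)


print(truncated_rate(91, 927))           # 9.8  (the 2013 row above)
print(truncated_rate(597, 1361))         # 43.8 (truncated)
print(round(100.0 * 597 / 1361, 1))      # 43.9 (rounded, for comparison)
```

So a truncated figure can be a tenth of a point lower than a rounded figure computed from the same counts.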
Ah OK.
PMID:31579888 is the one which had 2 PMIDs. This ID will be deleted.
Some are methods papers. Occasionally people get annotations from methods papers. We want to class these as "methods" & "curated"
One day we need to sort out the classification so that the "publication type" and "curation status" are separate.
I removed the sessions. I'm guessing we don't include any session that no longer exists? The numbers should not be affected much. There were similar numbers of "IN PROGRESS" and "APPROVED"
Phew, I promise I did not "fix" this:
To date 1578 publications have been assigned to community members for curation. 789 are finished and are either in the main PomBase database or are currently being checked by the PomBase curators. That's a response rate of 50%.
It's still 50%!
I removed the sessions.
Thanks.
I'm guessing we don't include any session that no longer exists?
Yep, they will disappear from Chado in tonight's load. I'll run that response rate query again tomorrow.
The query seemed to update itself anyway a short while after I deleted the sessions?
The response rate on the Canto stats page is queried straight from Canto's database. There is a delay of up to 10 minutes before changes show up because the page contents are cached for speed.
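The delay described here is the usual behaviour of a time-limited cache. The sketch below is a generic illustration and not Canto's implementation; the class name, the `compute` callback and the injectable clock are all assumptions made for the example:

```python
import time


class TTLCache:
    """Cache a single computed value and recompute it once it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the behaviour can be tested
        self._value = None
        self._expires_at = None

    def get(self, compute):
        now = self.clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: recompute and restart the timer.
            self._value = compute()
            self._expires_at = now + self.ttl
        return self._value


# Usage with a fake clock (seconds) and a hypothetical 10 minute lifetime.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=600, clock=lambda: fake_now[0])
first = cache.get(lambda: "1578 assigned")
fake_now[0] = 300.0
still_old = cache.get(lambda: "1577 assigned")   # within 10 minutes: old value
fake_now[0] = 600.0
refreshed = cache.get(lambda: "1577 assigned")   # after 10 minutes: recomputed
```

With a scheme like this, a change made just after the page was cached can take the full lifetime to appear, while a change made just before expiry appears almost at once.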
good, so we are still at 50%
The numbers almost match now:
year | submitted | sent_sessions | response_rate
------+-----------+---------------+---------------
2013 | 90 | 917 | 9.8
2014 | 172 | 1042 | 16.5
2015 | 258 | 1157 | 22.2
2016 | 400 | 1265 | 31.6
2017 | 497 | 1363 | 36.4
2018 | 633 | 1460 | 43.3
2019 | 760 | 1563 | 48.6
2020 | 789 | 1577 | 50.0
Removed next. It would be nice to add this visual to the stats page, but there's no urgency.
All papers are triaged and assigned out up to yesterday so the response rate has dropped a little to 50.5% (it was 51% yesterday)
Anyway, this item is very non urgent (it predated the CC paper and we included such a graph) I'm putting as future. Should it be on the website tracker instead?
53.9%, still increasing. It seems that this is largely done, so a graph could be added to this page: https://curation.pombase.org/pombe/stats/annotation
Latest query result:
year | submitted | sent_sessions | response_rate
------+-----------+---------------+---------------
2013 | 88 | 1233 | 7.1
2014 | 169 | 1330 | 12.7
2015 | 253 | 1430 | 17.6
2016 | 392 | 1513 | 25.9
2017 | 483 | 1593 | 30.3
2018 | 615 | 1673 | 36.7
2019 | 740 | 1748 | 42.3
2020 | 862 | 1828 | 47.1
2021 | 982 | 1929 | 50.9
2022 | 1050 | 1990 | 52.7
2023 | 1132 | 2072 | 54.6
2024 | 1136 | 2083 | 54.5
I had the query wrong and it was making a mess of the older sessions.
year | submitted | sent_sessions | response_rate
------+-----------+---------------+---------------
2013 | 88 | 284 | 30.9
2014 | 169 | 481 | 35.1
2015 | 253 | 693 | 36.5
2016 | 392 | 896 | 43.7
2017 | 483 | 1088 | 44.3
2018 | 615 | 1272 | 48.3
2019 | 740 | 1448 | 51.1
2020 | 862 | 1624 | 53
2021 | 982 | 1817 | 54
2022 | 1050 | 1930 | 54.4
2023 | 1132 | 2068 | 54.7
2024 | 1136 | 2081 | 54.5
I've added a curation response rate graph. Hopefully it will be on the main site in the morning but I've just had to restart the load so we'll see.
In the meantime it is available on my desktop version: https://desktop.kmr.nz/curation_stats
Hopefully it will be on the main site in the morning but I've just had to restart the load so we'll see.
The load finished after a few false starts. GitHub was returning errors when the load script was trying to check for the latest Mondo.
https://pombase.org/curation_stats
I had the query wrong and it was making a mess of the older sessions.
I'm still not 100% sure I have it right so I plan to check it again tomorrow after a good sleep. :-)
Great! We are really flatlining. I'll get this going again when Pascal starts.
Can we make the graph start earlier? (2012)
Also, the early years in the graph don't match this one (30% is high for 2013). Is this definitely 1st submission, or 1st approval data?
Can we make the graph start earlier ? (2012)
Unfortunately the date stamps needed from Canto only go back to mid 2013.
is this definitely 1st submission, or 1st approval data?
It's calculated using the submitted date. It does that so that it matches the Canto stats page which uses the number of submitted sessions.
I'm going to look at this again in the morning because I've just spotted another problem. Currently it counts submitted sessions up to a given year and then divides by sessions sent out up to the same year. But it's going to get this wrong for sessions that were submitted in a different year to the year they were sent out. There are quite a few of those. Whoops.
Should the years in the graph be the year sent out or the year submitted? Or year approved?
submitted I think (the gap between submission and 1st approval should be less than a week 90% of the time so these numbers should be very similar)
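To make the year question concrete, here is one possible reading of the cumulative calculation, sketched in Python rather than taken from the real query. Each session is a pair of (year first sent, year submitted or `None`); the four sessions are invented for illustration:

```python
def cumulative_response_rates(sessions, years):
    """For each year, count sessions sent and submitted up to and including that year."""
    rows = []
    for year in years:
        sent = sum(1 for sent_year, _ in sessions if sent_year <= year)
        submitted = sum(
            1 for _, sub_year in sessions if sub_year is not None and sub_year <= year
        )
        rate = 100.0 * submitted / sent if sent else None
        rows.append((year, submitted, sent, rate))
    return rows


# Hypothetical sessions: (first_sent_year, submitted_year or None)
sessions = [(2013, 2013), (2013, 2015), (2014, None), (2014, 2014)]
for row in cumulative_response_rates(sessions, range(2013, 2016)):
    print(row)
```

In this reading, the session sent in 2013 but submitted in 2015 counts as sent from 2013 onwards and as submitted only from 2015 onwards, so it lowers the rate for 2013 and 2014 and raises it from 2015.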
Unfortunately the date stamps needed from Canto only go back to mid 2013.
OK- the numbers are definitely different from the curation paper graph
the numbers are definitely different from the curation paper graph
I think the graph from the paper might be wrong but let's have a chat about this on the next call.
I've double checked the query that generates the current graph and I think it's correct. But it could be that it's not asking the right question. https://pombase.org/curation_stats
For Kim: find backup from December 2012 to add response rate for that year
find backup from December 2012 to add response rate for that year
After a bit of digging, the response rate for 2012 was 91.6%
There were 12 community sessions sent out and 11 were submitted. Did you send them to people you knew would respond?
year | submitted_for_approval_count | sent_or_accepted_count | response_rate
------+------------------------------+------------------------+---------------
2012 | 11 | 12 | 91.6
2013 | 90 | 280 | 32.1
2014 | 171 | 480 | 35.6
2015 | 255 | 695 | 36.6
2016 | 392 | 899 | 43.6
2017 | 483 | 1092 | 44.2
2018 | 616 | 1276 | 48.2
2019 | 745 | 1452 | 51.3
2020 | 869 | 1628 | 53.3
2021 | 990 | 1821 | 54.3
2022 | 1058 | 1934 | 54.7
2023 | 1141 | 2072 | 55
2024 | 1144 | 2085 | 54.8
Yes, I think that was probably the pilot project sessions. I put them all through later as community curated (or we changed them to community curated), I don't quite remember. Maybe we begin with 2013 when we started properly
I'll close this as it's getting long and I think it's done.
In the stats, we report the response rate as a percentage (currently around 42%). It goes up, but very slowly. It would be nice to have a cumulative graph showing the growth over time eventually (the only way is up)