gsautter / goldengate-qualitycontrol

Data Quality Control and Data Quality Assurance related tools for the GoldenGATE markup system.

Stats needed #70

Open myrmoteras opened 4 years ago

myrmoteras commented 4 years ago

@gsautter and @brokentool

can you please provide us with the following stats:

  • How many errors do we get? Number and kind of errors by article.
  • What is the distribution of errors by severity level?
  • How much time does it take to fix the errors, by kind of error and overall?
  • How much time does it take on average to QC a document?
  • How many documents have been QCed in total, and per month?
  • How many documents do not show any critical or blocking errors?
  • What is the change in the percentage of errors per document? Is there a trend in specific journals?

Thanks. The figures are needed for Arcadia and the BiCIKLE project.

gsautter commented 4 years ago

Here are the cumulative numbers on QC up to this day:

As to the working times, @brokentool should be able to provide the figures.

brokentool commented 4 years ago

How much time does it take to fix the errors, by kind of error and overall?

How much time does it take on average to QC a document?

How many documents have been QCed in total, and per month?

How many documents do not show any critical or blocking errors?

What is the change in the percentage of errors per document? Is there a trend in specific journals?

myrmoteras commented 4 years ago

@brokentool might it be possible to capture the time it takes to do the QC? I would like to suggest keeping an XLS where you write down the time for:

  1. overall time
  2. handling blockers
  3. critical errors
  4. majors
  5. minors

We need these statistics.

@gsautter and all: We need to discuss how we deal with the error types. Clearly, we need to remove blockers first, then criticals.

myrmoteras commented 4 years ago

@gsautter might it be possible to get the stats

Here are the cumulative numbers on QC up to this day:

  • QCed documents: 1,644 (out of 21,608)
  • resolved errors: 111,324 (out of 159,604)
  • remaining errors: 322 blockers, 15,072 criticals, 37,069 majors, 5,858 minors
  • resolved errors: 1,799 blockers, 46,136 criticals, 60,839 majors, 2,550 minors

As to the working times, @brokentool should be able to provide the figures.
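The cumulative figures quoted above can be turned into per-severity resolution rates with a few lines of Python. This is only a rough sketch: the counts are copied from the list, while the names `resolved`, `remaining` and `resolution_rate` are made up for illustration and are not part of GoldenGATE.

```python
# Counts copied from the cumulative figures above; all names are illustrative.
SEVERITIES = ("blockers", "criticals", "majors", "minors")
resolved = {"blockers": 1_799, "criticals": 46_136, "majors": 60_839, "minors": 2_550}
remaining = {"blockers": 322, "criticals": 15_072, "majors": 37_069, "minors": 5_858}


def resolution_rate(resolved, remaining):
    """Return, per severity, the share of known errors that has been resolved."""
    return {s: resolved[s] / (resolved[s] + remaining[s]) for s in resolved}


rates = resolution_rate(resolved, remaining)
qc_coverage = 1_644 / 21_608  # QCed documents out of all documents

for severity in SEVERITIES:
    print(f"{severity:<10} {rates[severity]:6.1%} resolved")
print(f"{'documents':<10} {qc_coverage:6.1%} QCed")
```

The rates are computed from the per-severity counts only, so they do not depend on the overall totals in the second bullet.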

@gsautter for BiCIKLE: might it be possible to get the stats above for 2018 and 2019 separately, so we can see how much this is in terms of overall production? https://docs.google.com/spreadsheets/d/1dUAhoEfVDq-2ZFdrWWk8vreU60cWqbrW1UCtRlYsa5o/edit#gid=909456627

brokentool commented 4 years ago

@myrmoteras we do not deal with majors and minors at all. As for the rest, I'll see what I can do, but it will take time.

gsautter commented 4 years ago

There are no QC numbers for 2018 ... we only started this in 2019. Also, it's kind of hard at this point to aggregate by year, as the stats are extracted directly from the IMF repo ... I have yet to build a dedicated statistics component.

What we can do is diff the above numbers with the ones from before the last sprint, which should give us at least a pretty good idea about the last 4 months.
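A minimal sketch of such a diff, assuming each snapshot is a plain mapping from severity to a cumulative count. The function name `diff_snapshots` is hypothetical, and the earlier snapshot below holds placeholder values only, since the pre-sprint numbers are not given in this thread.

```python
def diff_snapshots(before, after):
    """Subtract an earlier cumulative snapshot from a later one, key by key.

    Keys missing from the earlier snapshot are treated as zero.
    """
    return {key: after[key] - before.get(key, 0) for key in after}


# Later snapshot: the cumulative resolved-error counts reported in this thread.
current = {"blockers": 1_799, "criticals": 46_136, "majors": 60_839, "minors": 2_550}

# Earlier snapshot: PLACEHOLDER values for illustration, not real figures.
before_sprint = {"blockers": 1_000, "criticals": 30_000, "majors": 40_000, "minors": 2_000}

for severity, delta in diff_snapshots(before_sprint, current).items():
    print(f"{severity}: {delta} resolved since the earlier snapshot")
```

Because both snapshots are cumulative, the difference is the work done between the two dates, without needing per-year records.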