hortongn opened 1 year ago
## Goal

## Corpus
- Is the corpus large enough? Is the training set large enough?
- What are the start and end dates for the data in the corpus? Does this matter?
- Who chose the corpus, when was it chosen, and for what purpose? Details of the corpus, like the data behind a research article, should be publicly stated and accessible.
- What is the corpus bias?
- Is the tool likely to raise diversity, equality and/or inclusion issues?
- Is personal data captured and reused?
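To make the size and date-range questions above answerable, the corpus stats should be computed and documented up front. A minimal sketch, assuming a hypothetical corpus of `(document_id, publication_date, text)` records (the field names and values are made up for illustration, not any real dataset):

```python
from datetime import date

# Hypothetical corpus records: (document_id, publication_date, text).
# These values are stand-ins for illustration only.
corpus = [
    ("doc-001", date(2015, 3, 1), "…"),
    ("doc-002", date(2019, 7, 12), "…"),
    ("doc-003", date(2021, 11, 30), "…"),
]

def summarize(records):
    """Report corpus size and date range so the checklist answers can be documented."""
    dates = [pub_date for _, pub_date, _ in records]
    return {
        "n_documents": len(records),
        "start_date": min(dates).isoformat(),
        "end_date": max(dates).isoformat(),
    }

print(summarize(corpus))
```

Publishing a summary like this alongside the tool would cover the "publicly stated and accessible" point for the basic facts, even when the full corpus cannot be shared.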
## Algorithm

## Evaluation and Metrics
- Have I measured the current process before introducing any change, for example the time taken and the number of errors?
- Who should evaluate the tool: end users, subject-matter experts, or both? Internal or external evaluators?
- What metrics will be used to evaluate the tool? The F1 score, if used, must be interpreted in context.
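As a reminder of how F1 is computed, here is a short sketch with made-up counts (not project data): F1 is the harmonic mean of precision and recall.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp)   # of everything flagged, how much was right
    recall = tp / (tp + fn)      # of everything that should be flagged, how much was found
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: 80 true positives, 20 false positives, 40 false negatives.
# precision = 0.8, recall ≈ 0.667, so F1 ≈ 0.727
print(round(f1_score(80, 20, 40), 3))
```

The same F1 value can mask very different precision/recall trade-offs, which is one reason the score has to be read in context rather than quoted on its own.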
## Sanity check

- Common sense: have the developers built in 'common-sense' limitations to prevent the algorithm from being applied too widely? Am I asking a meaningful question? Is this a feasible exercise?
- Does the tool provide feedback when a question is out of scope?
- Does the tool provide feedback when a question is out of scope?
- Based on the checks above, is the tool fit for purpose?
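One common way to implement the out-of-scope feedback asked about above is a confidence threshold: if no candidate answer scores highly enough, the tool says so instead of answering. A minimal sketch (the threshold, labels, and scores are assumptions, not any specific library's API):

```python
# Hypothetical guardrail: decline to answer when the best confidence score
# from some upstream model falls below a threshold. Values are illustrative.
OUT_OF_SCOPE_THRESHOLD = 0.5

def answer(question, scored_labels):
    """scored_labels: list of (label, confidence) pairs from an assumed model."""
    label, confidence = max(scored_labels, key=lambda pair: pair[1])
    if confidence < OUT_OF_SCOPE_THRESHOLD:
        return "Sorry, this question appears to be outside the tool's scope."
    return f"{label} (confidence {confidence:.2f})"

# Low-confidence case: every score is below the threshold, so the tool declines.
print(answer("What is the boiling point of feldspar?",
             [("geology", 0.31), ("cooking", 0.12)]))
```

A tool that silently returns its best guess regardless of confidence would fail this checklist item.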
## Dissemination

- Is there easy-to-read documentation and guidance for new users that explains, in simple terms, how to use the tool and how it improves on current processes?
## Feedback

- Does the tool provide a feedback loop so it can be improved over time?
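A feedback loop can be as simple as capturing user corrections in a structured log that later feeds review or retraining. A minimal sketch with an assumed schema (not any particular tool's API):

```python
import json
from datetime import datetime, timezone

# Assumed feedback record schema: timestamp, the query, what the tool said,
# and what the user says it should have said.
feedback_log = []

def record_feedback(query, tool_output, user_correction):
    """Append one user correction so it can be reviewed or used for retraining."""
    feedback_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "tool_output": tool_output,
        "user_correction": user_correction,
    })

record_feedback("classify: 'maple leaf'", "animal", "plant")
print(json.dumps(feedback_log[-1], indent=2))
```

Whether the tool exposes anything like this, and who reviews the log, is exactly what the checklist question is probing.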
Now that we have some potential tools and solutions, try them out to get a feel for how they work and whether they might suit our project.
Tools we should try (add more):
Post your findings in the comments.