mlopscommunity / open-questions-ai-quality


Evaluating the need for better data #37

Open adamboazbecker opened 1 month ago

adamboazbecker commented 1 month ago

How should we evaluate the need for better data? When does it make sense to focus on data quality?

y27choi commented 1 month ago

I believe we should always focus on improving data quality. Your model is only as good as your data, so there is a ceiling on what you can expect from a model when your data quality is low.

vishakha041 commented 1 month ago

I had someone describe what happens with bad data: the model deployed in production didn't meet the accuracy expectations, which meant losing an entire quarter of 2-3 people's time (data scientist, data engineer, platform engineer). But it was hard for them to quantify this for their management and to pin the problem down to a lack of tools that let the team easily query and look at their dataset images. I think tying it to downstream revenue and cost problems can help build the case that data quality needs to be addressed from the very beginning, and that it should be tied to monitoring model accuracy on real deployments to catch missing cases (I think Andrej Karpathy gave a talk on this a long time ago).
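
As a rough illustration of the monitoring idea above, here is a minimal sketch that computes accuracy per data slice on labeled production samples and flags slices that fall below a threshold. The slice names, sample records, threshold, and function names are all hypothetical, made up for illustration rather than taken from any particular team's setup or tool.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Return accuracy per slice for (slice_name, prediction, label) records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for slice_name, prediction, label in records:
        totals[slice_name] += 1
        correct[slice_name] += int(prediction == label)
    return {name: correct[name] / totals[name] for name in totals}


def flag_weak_slices(records, threshold):
    """Return the sorted names of slices whose accuracy is below threshold."""
    scores = accuracy_by_slice(records)
    return sorted(name for name, acc in scores.items() if acc < threshold)


# Hypothetical labeled samples collected from a deployed image classifier.
samples = [
    ("daylight", "cat", "cat"),
    ("daylight", "dog", "dog"),
    ("daylight", "cat", "cat"),
    ("night", "cat", "dog"),
    ("night", "dog", "dog"),
    ("night", "cat", "dog"),
]

# The threshold is an assumed value; pick one that matches your own targets.
weak = flag_weak_slices(samples, threshold=0.8)
print(weak)
```

A slice that is consistently flagged is a candidate for collecting or relabeling more data, which gives a concrete place to start when making the cost argument to management.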