mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge)
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

Add checksum checks for all MLPerf inference dataset and model downloads #167

Open arjunsuresh opened 3 weeks ago

arjunsuresh commented 3 weeks ago

We need to add checksum checks for all models and datasets used in MLPerf inference. For folder downloads, we can use a checksum file, as done here.

gfursin commented 3 weeks ago

Hi Arjun. Just a quick question: does the recent MD5SUM check mechanism in CM support macOS and Windows? Do we have any tests for Windows and macOS in GitHub Actions (I think I saw somewhere that GitHub now supports Windows Server in workflows)? Thanks!

arjunsuresh commented 3 weeks ago

Hi @gfursin, the code that does the check is in Python and should work on Windows too, but we currently don't have any Windows-specific tests for it. We do have Windows and macOS GitHub Actions for CM installation and ABTF inference, which cover the CHECKSUM check for individual files, but nothing yet for folders.
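
For illustration only, here is a minimal sketch of what a platform-independent folder check could look like. It is not the actual CM implementation: the function names `md5_of_file` and `verify_folder` are hypothetical, and it assumes the checksum file uses the common `md5sum` layout (one `<hash>  <relative/path>` entry per line, with forward slashes in paths).

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_folder(folder, checksum_file):
    """Check every entry of an md5sum-style listing against files under `folder`.

    Hypothetical helper (not part of CM). Returns the list of relative paths
    that are missing or whose digest differs; an empty list means success.
    """
    folder = Path(folder)
    failures = []
    for line in Path(checksum_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # md5sum writes "<hash>  <name>" (text mode) or "<hash> *<name>" (binary mode).
        expected, _, name = line.partition(" ")
        name = name.strip().lstrip("*")
        # pathlib accepts forward slashes on Windows as well as on Linux and macOS.
        target = folder / Path(name)
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(name)
    return failures
```

Because the sketch relies only on `hashlib` and `pathlib`, the same code path would run unchanged on Linux, macOS, and Windows runners, which is the kind of behavior a folder-level test in GitHub Actions would need to confirm.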

gfursin commented 3 weeks ago

Cool! Thank you! That's already a very good starting point!

anandhu-eng commented 3 weeks ago

Hi @arjunsuresh, do the env variables CM_EXTRACT_EXTRACTED_CHECKSUM_FILE here and CM_DOWNLOAD_CHECKSUM_FILE here have the same goal of being the path variable to the checksum file for extracted files?

arjunsuresh commented 3 weeks ago

Yes