mlcommons / cm4mlperf-inference

http://docs.mlcommons.org/cm4mlperf-inference/
Apache License 2.0

Syncing GATEOverflow repos with MLCommons ones for SCC'24 #19

Closed: gfursin closed this issue 1 month ago

gfursin commented 2 months ago

Hi @arjunsuresh, I am trying to test some of the SDXL workflows for SCC'24 but I see many hardwired dependencies on GATEOverflow repositories that are external to MLCommons (inference, cm4mlops). Since the workflows are stable, is there a reason why you are not using official MLCommons repositories? It would be good to push all the changes back to MLCommons and use MLCommons repositories ASAP ... If students want to make changes, they should fork official MLCommons repositories and send us PRs ... Thanks a lot!
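
For reference, one way to spot such hard-wired dependencies locally is to scan the pulled CM repositories for references to external GitHub organizations. This is only an illustrative Python sketch, not part of any MLCommons tooling; it assumes the default CM repository location of `$HOME/CM/repos`, which may differ on a given setup.

```python
#!/usr/bin/env python3
"""Illustrative sketch: scan locally pulled CM repositories for references
to repositories outside the MLCommons GitHub organization.
Assumes the default CM repo location ($HOME/CM/repos); adjust if needed."""

import os
import re

# Default location where CM pulls repositories (assumption; may differ per setup)
CM_REPOS = os.path.expanduser(os.path.join('~', 'CM', 'repos'))

# External organizations mentioned in this discussion
EXTERNAL_ORGS = re.compile(r'(gateoverflow|ctuning)', re.IGNORECASE)

hits = []
for root, _dirs, files in os.walk(CM_REPOS):
    for name in files:
        if not name.endswith(('.json', '.yaml', '.yml', '.py', '.md')):
            continue
        path = os.path.join(root, name)
        try:
            with open(path, encoding='utf-8', errors='ignore') as f:
                for lineno, line in enumerate(f, 1):
                    if EXTERNAL_ORGS.search(line):
                        hits.append(f'{path}:{lineno}: {line.strip()}')
        except OSError:
            pass  # skip unreadable files

print(f'Found {len(hits)} reference(s) to external organizations:')
print('\n'.join(hits))
```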

gfursin commented 2 months ago

Some core functionality got broken in CM4MLOps recently, and I am getting extremely concerned about relying on multiple unsynchronized CM4MLOps and inference forks, particularly when they are outside MLCommons...

arjunsuresh commented 2 months ago

Hi @gfursin, we typically use the inference, inference_results and cm4mlops repositories for an MLPerf workflow. Currently, the cm4mlops repository used in all the workflows should be the mlcommons one only - otherwise it is a bug.

For the inference_results repository, we fork the official results repository because there are numerous changes we need to make to keep it compatible with CM, and these won't be accepted in the official repository since they are not from the submitters.

For the inference repository - when we use it as the implementation source - we always use a fork so that we can fix issues as and when they arise. Merges to the official repository can take weeks, as approvals happen only weekly.
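
As a side note, the difference between the official repository and a fork shows up when registering repos with CM. Below is a minimal sketch using CM's Python API, assuming that `cmind.access({'action': 'pull', 'automation': 'repo', ...})` mirrors the `cm pull repo <org>@<name>` CLI command; this is an illustration, not the project's prescribed setup.

```python
"""Minimal sketch of registering a CM automation repository.
Assumes the cmind package is installed (pip install cmind) and that the
dict below mirrors the `cm pull repo <org>@<name>` CLI command."""

import cmind

# Official MLCommons automation recipes (the intended default)
r = cmind.access({'action': 'pull',
                  'automation': 'repo',
                  'artifact': 'mlcommons@cm4mlops'})
if r['return'] > 0:
    cmind.error(r)

# A fork would be registered the same way, e.g.:
#   cmind.access({'action': 'pull', 'automation': 'repo',
#                 'artifact': 'gateoverflow@cm4mlops'})
# which is why workflows hard-wired to a fork are easy to miss.
```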

And these are not new - the tutorial you prepared for SCC23 used similar forks.

gfursin commented 2 months ago

Hi Arjun,

We did have external forks for cm4mlops and inference during past development, when we didn't have full control of the MLCommons repos.

However, we also discussed moving away from this practice, and I thought that MLCommons allowed us to create branches on the main repositories? Do you know if you can create branches on the MLPerf inference repo? If not, I guess we can ask @morphine00 to give you such access and create a branch for scc24?

That was the reason I created "dev" branches in various MLCommons repositories - to move away from ctuning and other external repositories and to use MLCommons branches in the official docs.

The reason is simple - if someone decides to rely on CM/CM4MLOps/CM4ABTF for their projects, they know that all sub-repositories are also managed by MLCommons, and they can trace issues and ask for them to be fixed ... Otherwise, when the core CM functionality for MLPerf starts breaking and various external repositories appear in the CM-MLPerf workflows, it's very difficult to trace the issues.

However, maybe it's fixed now. Let's wait until CM4MLOps is fixed for Windows; I will then check the CM workflows for MLPerf inference and SCC'24 and report whether there are still dependencies on external forks or whether they are gone ... We can then ask MLCommons to create branches in the MLCommons repos for us if needed. I am sure MLCommons will be fine with that ...

gfursin commented 2 months ago

I created an issue to check and update our past SCC tutorials: https://github.com/mlcommons/cm4mlops/issues/319

arjunsuresh commented 2 months ago

Sure, Grigori. I got access to the inference repository last week, so now I can use branches. But this was after the SCC tutorial and workflow were made.

Meanwhile, if you believe something is core functionality, please do add GitHub tests for it. We do not check MLPerf runs on Windows anywhere.
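
For illustration only (not the project's actual CI), such a test could be a small pytest-style check that runs a lightweight CM script through the Python API and is wired into a GitHub Actions matrix that includes a Windows runner. The sketch assumes cmind is installed, mlcommons@cm4mlops has been pulled, and that the `detect,os` tags resolve to the existing detect-os script.

```python
"""Illustrative pytest sketch for a cross-platform CM smoke test.
Assumes cmind is installed and mlcommons@cm4mlops has been pulled;
the 'detect,os' tags are assumed to resolve to the detect-os script."""

import cmind


def test_cm_detect_os_runs():
    # Run a lightweight CM script that should work on Linux, macOS and Windows
    r = cmind.access({'action': 'run',
                      'automation': 'script',
                      'tags': 'detect,os'})
    # CM convention: 'return' == 0 means success
    assert r['return'] == 0, r.get('error', 'unknown CM error')
```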

gfursin commented 2 months ago

Cool! Yes, I added tests for Windows and am waiting for @anandhu-eng to check them.

arjunsuresh commented 1 month ago

I believe now only the MLCommons repos and branches are used in the MLPerf inference workflows.