Open zfhxi opened 10 months ago
Hello! Thank you for sending this information! Could you send a link to your workspace so we can look at it? Only wandb employees will be able to view your project if it is private.
Also, could you verify that the launch job you created corresponds to the run id avc8q10w? Just to make sure that we are looking at the same run as the one created.
Thank you for your response. I've created a demo at https://github.com/zfhxi/test_wandb_launch_job
After hours of work, I've found this solution:
import os
import argparse
import subprocess
import sys

from git import Repo


def restart_program():
    # Re-run this script in a child process so that the freshly reset code is used.
    p = subprocess.Popen([sys.executable] + sys.argv)
    p.wait()
    print("Finished the sub program!")
    sys.exit(0)


def reset_commit(repo, commit_id, workspace):
    # Reset HEAD, the index, and the working tree to the given commit.
    commit = repo.commit(commit_id)
    repo.head.reset(commit=commit, index=True, working_tree=True)
    print(f"Workspace {workspace} has been reset to {commit_id}.")


def prerun(args):
    # Check whether the current HEAD matches the commit recorded for the job.
    if args.wandb_job_commit:
        repo = Repo(args.workspace)
        current_commit = repo.head.commit.hexsha
        # assert current_commit == args.wandb_job_commit, f"Current commit {current_commit} is not equal to the job commit {args.wandb_job_commit}!"
        if current_commit != args.wandb_job_commit:
            print(f"Current commit {current_commit} is not equal to the job commit {args.wandb_job_commit}!")  # fmt: skip
            try:
                reset_commit(repo, args.wandb_job_commit, args.workspace)
            except Exception as e:
                # The commit may be missing from a shallow clone; fetch more history and retry.
                print(e)
                print("Trying to fetch the latest 20 commits ...")
                origin = repo.remotes.origin
                repo.git.fetch(origin, "--depth=20")
                reset_commit(repo, args.wandb_job_commit, args.workspace)
            restart_program()
        else:
            print(f"Current commit {current_commit} == job commit {args.wandb_job_commit}!")  # fmt: skip


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--wandb-job-commit", type=str, default=None, help="commit hexsha to validate against")  # fmt: skip
    args = parser.parse_args()
    args.workspace = os.path.dirname(os.path.abspath(__file__))
    prerun(args)
    # main code
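The code performs the following actions:
1. Compares the current HEAD commit of the workspace with the commit passed via --wandb-job-commit.
2. If they differ, resets the workspace to the job commit; if that fails, fetches the latest 20 commits from origin and retries.
3. Restarts the script in a subprocess so that it runs the code from the job commit.
I anticipate more elegant solutions!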
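One small refinement to the comparison step, offered as a sketch rather than a fix for the underlying issue: the strict string comparison in the workaround would trigger a reset whenever the job commit is passed in abbreviated form. The helper below treats an abbreviated id as matching a full one. The name same_commit and the 7-character minimum are assumptions made for illustration; they are not part of wandb or GitPython.

```python
def same_commit(a: str, b: str, min_len: int = 7) -> bool:
    """Return True if two commit ids match, allowing one to be an abbreviated prefix.

    Hypothetical helper: min_len guards against very short, ambiguous prefixes.
    """
    a, b = a.strip().lower(), b.strip().lower()
    # Refuse to match on prefixes shorter than min_len characters.
    if min(len(a), len(b)) < min_len:
        return False
    return a.startswith(b) or b.startswith(a)
```

In the workaround, the condition current_commit != args.wandb_job_commit could then become not same_commit(current_commit, args.wandb_job_commit).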
I created a job using wandb local:
wandb local created the job in the TEST project, along with the wandb-job.json:
After that, I modified my code and synced it with the remote repository. The commits are as follows:
Then, I launched the job by pushing it to the existing queue:
After the run completed, I located the code that the wandb local server had cloned from the remote repository and reviewed the commit:
The expected commit, as specified by --git-hash, should be b7baca74dd034cb900ea0e3f48c397ea51c4c481 rather than the HEAD commit!
The above information indicates that the --git-hash option in wandb job create does not seem to be working.
Can anyone help solve this?
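For anyone trying to reproduce the check described above, here is a minimal sketch for reading the commit that a cloned directory is currently on, using only the standard library. It assumes the clone has an ordinary .git directory (not a worktree or submodule pointer file); read_head_commit is a hypothetical helper name, not a wandb function.

```python
from pathlib import Path
from typing import Optional


def read_head_commit(repo_dir: str) -> Optional[str]:
    """Return the commit hash that HEAD points to, or None if it cannot be resolved."""
    git_dir = Path(repo_dir) / ".git"
    head = (git_dir / "HEAD").read_text().strip()
    # Detached HEAD: the file holds the commit hash itself.
    if not head.startswith("ref: "):
        return head
    ref = head[len("ref: "):]
    # Loose ref: a file such as .git/refs/heads/main holding the hash.
    ref_file = git_dir / ref
    if ref_file.is_file():
        return ref_file.read_text().strip()
    # Packed ref: look for a "<hash> <refname>" line in .git/packed-refs.
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue
            parts = line.split()
            if len(parts) == 2 and parts[1] == ref:
                return parts[0]
    return None
```

Comparing the returned value with the hash passed to --git-hash shows whether the clone is on the expected commit or on the branch HEAD.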