iterative / studio-support

❓ DVC Studio Issues, Question, and Discussions
https://studio.iterative.ai

Registering new version of model hung in "registration is queued by ..." state #99

Closed: h-joshi closed this issue 2 months ago

h-joshi commented 2 months ago

This is in DVC Studio / model registry

image (It has been like this for at least 30 min now)

It's hard to tell whether the registration has crashed midway or it's truly just queued.

h-joshi commented 2 months ago

This is the commit history

The model was named 'dna' and appeared as an artefact in dvc.yaml.

Initially, I was trying to register an updated version of the model, 1.0.5

I have since removed the model altogether (using the deprecate option). I then removed all git tags from the repository to disassociate it from the model registry.

After that, I attempted to add the model again using the same model name 'dna', hoping that it would register and that I could then add each model version based on a given commit. However, the model registry now refuses to add a model with the name 'dna'.

a08d735ae52895baf426c2960f6a6dbffd0d8fa2 (HEAD -> main, tag: dna@v1.0.5#9, origin/main, origin/HEAD) Updated NIF ML model training data and model (#9)
040b59bac6dd2dd403b0b55c450c5b4baed5c223 (tag: dna@v1.0.4#8) Model refit without feature names (#8)
2a190cb672c0093fb310ed5652fb149ad11880f4 (tag: dna@v1.0.3#7) Fixed metrics folder location
9cf185cad1566ec5f7fb32a9bdf25ab151ded236 (tag: dna@v1.0.2#6) Updated model (#7)
c1723d8e98b3fd9a2b5544ca95f8c0d91ebab014 (tag: dna@v1.0.1#5) Hotfix - Pinned pygit2 version
93194915bd2b62f279d238ea96e03b2b9769ce65 Store predictions data for test, validation and synthetic datasets (#6)
4e81e3aa98fc5db2f48b0b2c5821de2b18abfbd3 Update ML model data and added Odds Ratios (#5)
8eed1d2cac0ce1756a3d3fe5be4205b5e4723fd6 S3 bucket change (#3)

#### At this point, the s3:// location changed (this was reflected in .dvc/config) ####

d6e94bd75807f5c49f357bee11929d28670fd72a Added pipeline status to README.md
cdc1eb0bfab22c42f05c08cd28c529d1f6bb006f Github workflow to verify pipeline (#2)
83791e1ab3f819c373258c9494fe96426892f536 Fixed typo in requirements.txt
8d124f7968e0719283cc89863d07eab96dffb521 (tag: dna@v1.0.0#1, tag: dna#test#4, tag: dna#prod#2, tag: dna#dev#3) Add dna model from model/model.pkl
7589387e84e87688069b3510ed9454d13d5493ab (tag: model@v1.0.0#1, tag: model@deprecated#4, tag: model#test#3, tag: model#dev#2) Feature/initial setup (#1)
e60549c086076b7f2eb9c7397de9bee2aa2cb50f Initial commit

ssachkovskaya commented 2 months ago

Thanks for the details @h-joshi! We are investigating your issue, but meanwhile could you please try going through the following steps and then check your models again?

  1. Go to the Projects page

  2. Find the project for the repository with your models and click on it (example-get-started in my case)

  3. Manually re-import the project and then wait till the import is finished


At some point, the s3 location changed

cc @amritghimire

h-joshi commented 2 months ago

Hi @ssachkovskaya, I've tried the above and it didn't work.

I've created a publicly accessible github repo and DVC project (links below) and outlined the steps to reproduce the problem. Hopefully this helps.

1. Created new repo https://github.com/geneie-org/sample_model

2. Imported into DVC studio https://studio.dvc.ai/user/hjusyd/projects/sample_model-l589vxgz16

3. Added model image

4. Registered new version image

image

5. Now deprecate the model image

6. Remove all DVC versioning tags from the git repository

Before tag removal

[main] /Users/himanshujoshi/Projects/sample_model
√ % git tag
model@deprecated#2
model@v1.0.0#1
[main] /Users/himanshujoshi/Projects/sample_model
√ % git one
0d36bc6282fb4637af2577c7414fabbce1691778 (HEAD -> main, tag: model@v1.0.0#1, tag: model@deprecated#2, origin/main, origin/HEAD) Add model model from model/model.pkl
b09dd7a273b012c490c0baf1be62bff111639e53 Added model
efde6d254c64f5923ed0f036406f8e1df7f32af0 Initialize DVC
0b7fbcab564a24a8b7ff87deaedb029764fe6160 Initial commit

Remove tags

[main] /Users/himanshujoshi/Projects/sample_model
√ % git tag | xargs -L 1 | xargs git push origin --delete
remote: This repository moved. Please use the new location:
remote:   git@github.com:geneie-org/sample_model.git
To github.com:h-joshi/sample_model.git
 - [deleted]         model@deprecated#2
 - [deleted]         model@v1.0.0#1
[main] /Users/himanshujoshi/Projects/sample_model
√ % git tag | xargs -L 1 | xargs git tag --delete
Deleted tag 'model@deprecated#2' (was 3c5fc5c)
Deleted tag 'model@v1.0.0#1' (was 86fa36f)

After tags removal

[main] /Users/himanshujoshi/Projects/sample_model
√ % git tag
[main] /Users/himanshujoshi/Projects/sample_model
√ % git one
0d36bc6282fb4637af2577c7414fabbce1691778 (HEAD -> main, origin/main, origin/HEAD) Add model model from model/model.pkl
b09dd7a273b012c490c0baf1be62bff111639e53 Added model
efde6d254c64f5923ed0f036406f8e1df7f32af0 Initialize DVC
0b7fbcab564a24a8b7ff87deaedb029764fe6160 Initial commit
[main] /Users/himanshujoshi/Projects/sample_model

7. Now add the model again using the same model name "model" image

Model registration in progress image

Failure image

ssachkovskaya commented 2 months ago

Thanks for the steps to reproduce the issue! They are very helpful 🔥

I was able to reproduce the same issue when I tried to re-add a deprecated model. Studio may have trouble updating the model state when tags are removed from the repository manually. However, I was able to restore the model by running Re-import from Git for the corresponding project (mentioned in this comment). The model re-appeared in the list with no versions, and I was able to register a new one afterwards.

I've tried the above and it didn't work.

So you are saying that the model didn't appear in your table after running re-import, right? If so, can you please confirm that your repo had no tags and your model was still a part of dvc.yaml artifacts?
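One way to confirm both conditions from the command line is sketched below. The function names are made up for illustration, the remote is assumed to be called origin, and the dvc.yaml check only looks for a top-level `artifacts:` key rather than a specific model entry.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative and not part of DVC or Studio.

# Number of tags in the local repository.
count_local_tags() {
    git tag -l | wc -l | tr -d ' '
}

# Number of tags on the remote (assumed to be "origin" unless given).
# --refs hides the peeled "^{}" entries that annotated tags produce.
count_remote_tags() {
    git ls-remote --tags --refs "${1:-origin}" | wc -l | tr -d ' '
}

# Succeeds if the given dvc.yaml has a top-level "artifacts:" section.
has_artifacts_section() {
    grep -q '^artifacts:' "${1:-dvc.yaml}"
}
```

Both counts should print 0 before the re-import, and `has_artifacts_section` should return success. Checking the remote separately matters because deleting tags locally does not remove them from GitHub.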

h-joshi commented 2 months ago

Hi @ssachkovskaya

I've just now removed all the tags and retried importing the repo and have been able to rebuild the tag history 😄 I think earlier, I may not have removed all the tags properly.

Thanks heaps for your help.

If someone has issues registering a new model version and wants to set everything up from scratch, an FAQ entry explaining how to capture and clear existing tags and then re-register the model might be handy. The steps could be along these lines:

  1. Obtain tag history

    √ % git show-ref --tags
    2742853c88501eed1ab4355e3960594ca6896d17 refs/tags/dna@v1.0.4#2
    3a740f18204b685ca7954a24c2a933e86e71df0f refs/tags/dna@v1.0.5#1
  2. Remove remote tags

    git tag | xargs -L 1 | xargs git push origin --delete
  3. Remove local tags

    git tag | xargs -L 1 | xargs git tag --delete
  4. Reimport repository <Reimport step as you've shown in the screenshot>

  5. Re-register model and register model versions (based on captured tag history in step 1)
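Steps 1 to 3 above could also be wrapped in small shell functions, roughly as follows. This is a sketch under assumptions: the function names and the default backup file name tag-history.txt are invented for illustration, and the remote is assumed to be origin. It deletes every tag in the repository, not only the model registry ones, so it only suits repositories where all tags belong to the registry.

```shell
#!/bin/sh
# Sketch only: function names and the default file name are illustrative.

# Step 1: save "<sha> <ref>" lines for all tags to a file.
save_tag_history() {
    # git show-ref exits non-zero when there are no tags; tolerate that.
    git show-ref --tags > "${1:-tag-history.txt}" || true
}

# Step 2: delete every locally known tag from the remote.
delete_remote_tags() {
    remote="${1:-origin}"
    # Tag names cannot contain whitespace or glob characters,
    # so splitting the list on whitespace is safe here.
    for tag in $(git tag -l); do
        git push "$remote" --delete "refs/tags/$tag"
    done
}

# Step 3: delete every local tag.
delete_local_tags() {
    for tag in $(git tag -l); do
        git tag --delete "$tag"
    done
}
```

A run would call `save_tag_history`, then `delete_remote_tags origin`, then `delete_local_tags`, in that order. The remote deletion has to come first because it relies on the local tag list.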

ssachkovskaya commented 2 months ago

@h-joshi I am glad the problem has been resolved for you 🔥

@shcheklein @dberenbaum please take a look at this suggestion for Model Registry docs.