iterative / studio-support

❓ DVC Studio Issues, Question, and Discussions
https://studio.iterative.ai

Get wrong GCS bucket URL in iterative studio dashboard #62

Closed by mehadi92 1 year ago

mehadi92 commented 1 year ago

Hi, I'm using separate GCS buckets to store data and models. I use these commands to upload my model file to the model bucket:

 dvc add train_outputs/final-model.pt
 dvc push train_outputs/final-model.pt -r gcs_model_store

Here gcs_model_store is the DVC remote for my model bucket, which I configured in the .dvc/config file. The model uploads successfully to that bucket (gs://dvc_models_ner_multiconer), but Iterative Studio shows the wrong bucket name: it displays the dataset bucket (gs://dvc_multiconer_dataset) instead.

Here is the screenshot from Iterative Studio: [Screenshot from 2022-10-03 12-25-16]

Here is my .dvc/config file

[core]
    remote = gcs
['remote "gcs"']
    url = gs://dvc_multiconer_dataset
['remote "gcs_model_store"']
    url = gs://dvc_models_ner_multiconer
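
To make the expected behaviour concrete, the sketch below parses the config shown above and resolves each remote name to its bucket URL. It is only an illustration using Python's standard configparser, not how DVC or Studio read the file; the helper name remote_urls is hypothetical.

```python
import configparser

# Contents of .dvc/config exactly as posted in this issue.
DVC_CONFIG = """\
[core]
    remote = gcs
['remote "gcs"']
    url = gs://dvc_multiconer_dataset
['remote "gcs_model_store"']
    url = gs://dvc_models_ner_multiconer
"""


def remote_urls(config_text):
    """Return ({remote name: url}, default remote name) from DVC config text.

    Hypothetical helper for illustration; DVC itself does not use configparser.
    """
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(config_text)
    urls = {}
    for section in parser.sections():
        # Section names look like 'remote "gcs"' (quotes included), so strip them.
        cleaned = section.strip("'")
        if cleaned.startswith('remote "') and cleaned.endswith('"'):
            name = cleaned[len('remote "'):-1]
            urls[name] = parser[section]["url"]
    default = parser.get("core", "remote", fallback=None)
    return urls, default


urls, default = remote_urls(DVC_CONFIG)
print("default remote:", default, "->", urls[default])
print("model remote: gcs_model_store ->", urls["gcs_model_store"])
```

With this config, a push using -r gcs_model_store should target gs://dvc_models_ner_multiconer, while the default remote points at the dataset bucket, which matches the bucket name Studio displayed.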

Here is my GitHub action

training:
    name: Training and Reporting
    needs:
      - deploy-runner
    runs-on: [self-hosted, gpu_runner]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - uses: iterative/setup-dvc@v1
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: 'Authenticate to Google Cloud'
        uses: 'google-github-actions/auth@v0'
        with:
          credentials_json: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS_DATA }}
      - name: 'Install requirements'
        run: |
          pip install -r requirements.txt
      - name: Training
        env:
          repo_token: ${{ secrets.REPO_ACCESS_TOKEN }}
        run: |
          # Pull dataset with DVC
          dvc pull data

          # Reproduce pipeline if any changes detected in dependencies
          dvc repro

          # Upload model to GCS bucket
          ls -al train_outputs
          git status
          dvc add train_outputs/final-model.pt
          dvc push train_outputs/final-model.pt -r gcs_model_store
      - name: Create report
        env:
          repo_token: ${{ secrets.REPO_ACCESS_TOKEN }}
        run: |
          ls train_outputs/*

          CODEBLOCK_START="\`\`\`sh"
          CODEBLOCK_END="\`\`\`"

          echo "## NER report" > report.md

          echo "### Loss" >> report.md
          echo $CODEBLOCK_START >> report.md
          cat train_outputs/loss.tsv >> report.md
          printf "\n" >> report.md
          echo $CODEBLOCK_END >> report.md
          printf "\n" >> report.md

          cml pr --skip-ci train_outputs/final-model.pt.dvc dvc.lock report.md evaluation.json

Is there anything I'm missing?

Thanks

dacbd commented 1 year ago

What you are doing looks correct to me. Can you confirm that the object does not exist in gs://dvc_multiconer_dataset and is only in gs://dvc_models_ner_multiconer? If so, we may transfer this issue to the Studio team.
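
One way to run that check is to derive the object path from the md5 recorded in train_outputs/final-model.pt.dvc and look for it in each bucket. The sketch below is a rough aid under assumptions: the two path layouts (hash split after two characters, with a files/md5 prefix in newer DVC releases) are my understanding of how DVC lays out remotes, the hash is a placeholder, and remote_object_paths is a hypothetical helper.

```python
def remote_object_paths(remote_url, md5):
    """Return candidate object paths for a file with this MD5 in a DVC remote."""
    prefix, rest = md5[:2], md5[2:]
    base = remote_url.rstrip("/")
    return [
        f"{base}/{prefix}/{rest}",            # layout assumed for DVC 2.x
        f"{base}/files/md5/{prefix}/{rest}",  # layout assumed for DVC 3.x
    ]


# Placeholder hash; use the md5 field from train_outputs/final-model.pt.dvc.
md5 = "d41d8cd98f00b204e9800998ecf8427e"

for url in ("gs://dvc_multiconer_dataset", "gs://dvc_models_ner_multiconer"):
    for path in remote_object_paths(url, md5):
        print(path)
```

Each printed path can then be checked with a listing tool such as gsutil ls; the object should be found only under gs://dvc_models_ner_multiconer.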

mehadi92 commented 1 year ago

@dacbd I have checked: the model file object does not exist in gs://dvc_multiconer_dataset, but it does exist in gs://dvc_models_ner_multiconer.

tapadipti commented 1 year ago

@mehadi92 Thanks for reporting this. We will look into it and get back to you.

mvshmakov commented 1 year ago

This has been addressed, and the fix will be deployed next week. @mehadi92 thanks for the report!