tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Investigate and fix LFS client error on SFPI submodule checkout #9561

Open tt-rkim opened 1 month ago

tt-rkim commented 1 month ago

After @pgkeller and @warthog9 moved the sfpi-rel repo to the tenstorrent org, a move that was not expected to cause any hiccups, GitHub runner jobs suddenly started failing with:

  Downloading compiler/bin/riscv32-unknown-elf-addr2line (993 KB)
  Error downloading object: compiler/bin/riscv32-unknown-elf-addr2line (999e310): Smudge error: Error downloading compiler/bin/riscv32-unknown-elf-addr2line (999e310a03add500f94fbe9dcc0369c03f61a711c8a19bf1c97af3c972bf1e11): batch response: Client error: https://github.com/tenstorrent/sfpi-rel.git/info/lfs/objects/batch

  Errors logged to '/home/runner/work/tt-metal/tt-metal/.git/modules/src/ckernels/sfpi/lfs/logs/20240619T200929.304824887.log'.
  Use `git lfs logs last` to view the log.
  Error: error: external filter 'git-lfs filter-process' failed
  Error: fatal: compiler/bin/riscv32-unknown-elf-addr2line: smudge filter lfs failed
  Error: fatal: Unable to checkout 'f050df206be4da5e898cfb7aed1c7465997d77aa' in submodule path 'tt_metal/third_party/sfpi'
  Error: The process '/usr/bin/git' failed with exit code 128

example: https://github.com/tenstorrent/tt-metal/actions/runs/9587621223/job/26438029464

Running git lfs logs last did not show anything more helpful.

Eventually @warthog9 suspected that git LFS was getting confused by the mismatch between the new sfpi-rel URL for its LFS objects and the URL specified in .gitmodules for SFPI. That was the problem. We need to update the remote in .gitmodules in the repos downstream of sfpi-rel.
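
For anyone making the same change downstream, here is a minimal sketch of the .gitmodules update, run in a scratch repository so it is safe to try. The old URL is a hypothetical placeholder (the real pre-move URL is not shown in this issue), and the submodule name src/ckernels/sfpi is an assumption inferred from the .git/modules path in the log above; the new URL and the submodule path are taken from the log.

```shell
#!/bin/sh
set -eu

# Scratch repository so the sketch runs without touching a real checkout.
workdir=$(mktemp -d)
cd "$workdir"
git init -q .

# Hypothetical pre-move state: the old URL below is a placeholder, not the real one.
# The submodule name is assumed from the .git/modules path seen in the error log.
cat > .gitmodules <<'EOF'
[submodule "src/ckernels/sfpi"]
	path = tt_metal/third_party/sfpi
	url = https://example.com/old-org/sfpi-rel.git
EOF

# Point the submodule entry at the new location of sfpi-rel.
git config --file .gitmodules \
    submodule.src/ckernels/sfpi.url \
    https://github.com/tenstorrent/sfpi-rel.git

# Read the value back to confirm what .gitmodules now records.
new_url=$(git config --file .gitmodules --get submodule.src/ckernels/sfpi.url)
echo "$new_url"
```

In a real checkout, the edited .gitmodules would then be committed, and existing clones would likely also need git submodule sync --recursive so that the URL cached in .git/config matches the new one.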

Probably the same in Buda.

We need to make sure we let GitHub know and file an issue at https://github.com/actions/checkout

cc: @TT-billteng @ttmchiou @vtangTT

tt-rkim commented 1 month ago

Leaving this open for now until we submit an issue to GitHub.