m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Question about PyPi releases #700

Open MarieRoald opened 9 months ago

MarieRoald commented 9 months ago

Hello! I'm trying to understand the releases on PyPi. PyPi lists two releases: 3.1.2 and 3.1.1 (https://pypi.org/project/whisperx/), both published on February 6th this year. But here on GitHub, the latest release was 3.1.1 (May 13th 2023). Are the versions on PyPi official releases by the WhisperX team?

gillens commented 9 months ago

I have no idea myself but I just diffed the PyPi 3.1.2 release with this repo. At first I thought that the 3.1.2 release on there was simply the latest commit on the main branch when they uploaded it, but they seem to have changed some small things. It's based off commit 06e30b2a2590bdf093a5aa40699ef71c174916e1 from Jan 1 2024. From there, they include these currently-open PRs:

And these other changes:

I assume the upload is unofficial as the PyPi maintainer does not have commits to this repository, at least under their linked GitHub account.

Used this script to find the commit with the smallest diff to the extracted PyPi tar:

Script

```
#!/usr/bin/env bash

REPO_PATH="/tmp/whisperX"
PYPI_PATH="/tmp/whisperx-3.1.2"

cd "$REPO_PATH"

BEST_MATCH=""
MIN_DIFFS=1000000 # Arbitrarily large number

# Iterate over commits
for commit in $(git rev-list --all --max-count=50); do
    # Check out the commit
    git checkout "$commit" &> /dev/null

    # Compare the commit against the PyPi package
    DIFFS=$(diff -urN --exclude=".git" --exclude="build" "$PYPI_PATH" "$REPO_PATH" | wc -l)
    echo "Commit $commit has $DIFFS differences"

    # Update the best match if this commit has fewer differences
    if [ "$DIFFS" -lt "$MIN_DIFFS" ]; then
        BEST_MATCH=$commit
        MIN_DIFFS=$DIFFS
    fi
done

echo "Best match: $BEST_MATCH with $MIN_DIFFS differences"
git checkout "$BEST_MATCH"
```
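For anyone who wants to repeat this comparison without checking out commits in their working tree, here is a minimal sketch of the same idea. It exports each commit with `git archive` into a scratch directory and counts diff lines against the extracted package. The function names `count_diff_lines` and `best_matching_commit` are hypothetical helpers made up for illustration, not part of WhisperX or git. Note that `git archive` honours `export-ignore` attributes, so the counts can differ slightly from the checkout-based approach.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative assumptions.

# count_diff_lines DIR_A DIR_B
# Prints the number of lines in the recursive unified diff of the two trees.
count_diff_lines() {
  diff -urN --exclude=".git" --exclude="build" "$1" "$2" | wc -l | tr -d ' '
}

# best_matching_commit REPO_PATH PKG_PATH [MAX_COUNT]
# Prints "<commit> <diff_lines>" for the closest of the MAX_COUNT most
# recent commits (default 50, as in the script above).
best_matching_commit() {
  local repo="$1" pkg="$2" max_count="${3:-50}"
  local best="" min="" commit tmp n
  for commit in $(git -C "$repo" rev-list --all --max-count="$max_count"); do
    tmp="$(mktemp -d)"
    # Export the commit's tree into a scratch directory; the working tree is untouched.
    git -C "$repo" archive "$commit" | tar -x -C "$tmp"
    n="$(count_diff_lines "$pkg" "$tmp")"
    rm -rf "$tmp"
    # Keep the commit with the fewest diff lines seen so far.
    if [ -z "$min" ] || [ "$n" -lt "$min" ]; then
      best="$commit"
      min="$n"
    fi
  done
  echo "$best $min"
}
```

With the paths from the script above, this would be called as `best_matching_commit /tmp/whisperX /tmp/whisperx-3.1.2`. As in the original, the number is a count of diff output lines, so it is a rough similarity measure and not an exact count of changes.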