simonw / git-history

Tools for analyzing Git history using SQLite
Apache License 2.0

Idea: `--skip-errors` option for skipping commits that raise an exception in the convert function #34

Open simonw opened 2 years ago

simonw commented 2 years ago

I could add an option that skips any commits that don't parse correctly with the provided conversion function. This would be the quickest possible way of parsing a history where all but a few commits work.

simonw commented 2 years ago

I'd like to get rid of this code, and this feature could help with that: https://github.com/simonw/git-history/blob/8857ce199011a158caba547738d0ba352f3ca69b/git_history/cli.py#L264-L266

simonw commented 2 years ago

Options for the name:

The related existing options (that this should be consistent with) are:

Maybe add a separate small section to the README about ways of skipping bad commits too.

simonw commented 2 years ago

FARA_All_Registrants.csv may be a good file to develop this against.

Here's a prototype diff for this feature (though it should also capture exceptions raised by `items = list(convert_function(content))`):

diff --git a/git_history/cli.py b/git_history/cli.py
index b119004..08a2f04 100644
--- a/git_history/cli.py
+++ b/git_history/cli.py
@@ -74,6 +74,9 @@ def cli():
 @click.option(
     "skip_hashes", "--skip", multiple=True, help="Skip specific commit hashes"
 )
+@click.option(
+    "--skip-errors", is_flag=True, help="If a version has a parse error, skip it"
+)
 @click.option(
     "--full-versions",
     is_flag=True,
@@ -134,6 +137,7 @@ def file(
     start_at,
     start_after,
     skip_hashes,
+    skip_errors,
     full_versions,
     csv_,
     dialect,
@@ -255,7 +259,13 @@ def file(
                     ).keys()
                 )
                 # Validate all items in the commit have ID columns - raises ClickException if not
-                validate_items_have_id_columns(items, ids, git_hash)
+                try:
+                    validate_items_have_id_columns(items, ids, git_hash)
+                except click.ClickException:
+                    if skip_errors:
+                        continue
+                    else:
+                        raise

                 # Use this to detect IDs that are duplicated in the same commit
                 item_ids_seen_in_this_commit = set()
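
The prototype only guards the ID validation step. A minimal sketch of extending the same skip-or-raise pattern to the conversion call itself might look like the following. It is standalone and does not use the real `git_history` internals: `iterate_items`, `convert_json`, and the sample hashes are hypothetical names made up for illustration.

```python
import json


def convert_json(content):
    # Hypothetical convert function: parse a JSON array of objects.
    # Raises json.JSONDecodeError (a ValueError) on unparseable input.
    return json.loads(content)


def iterate_items(versions, convert_function, skip_errors=False):
    """Yield (git_hash, items) pairs, optionally skipping versions that fail.

    `versions` is an iterable of (git_hash, content) pairs, standing in
    for the file versions read from the repository history.
    """
    for git_hash, content in versions:
        try:
            # list() runs any generator inside the try block, so errors
            # raised lazily during iteration are caught here as well.
            items = list(convert_function(content))
        except Exception:
            if skip_errors:
                continue
            raise
        yield git_hash, items


# Made-up hashes and contents; the second version does not parse.
versions = [
    ("aaa111", '[{"id": 1, "name": "one"}]'),
    ("bbb222", "not valid json"),
    ("ccc333", '[{"id": 1, "name": "uno"}]'),
]
kept = [h for h, _ in iterate_items(versions, convert_json, skip_errors=True)]
print(kept)  # ['aaa111', 'ccc333']
```

Catching bare `Exception` is a deliberate assumption here: a user-supplied convert function can fail in arbitrary ways, so a narrower exception list would likely miss cases.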

simonw commented 2 years ago

  File "/Users/simon/Dropbox/Development/git-history/git_history/cli.py", line 233, in file
    items = list(convert_function(content))
  File "<string>", line 3, in fn
  File "/Users/simon/.pyenv/versions/3.10.0/lib/python3.10/csv.py", line 187, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter

It would be good if this was caught and the commit hash was shown in the error message.
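
One way to do that, sketched under assumptions: wrap the conversion call and re-raise with the commit hash in the message. `ConvertError`, `convert_version`, and the hash `abc1234` are hypothetical; the real tool could raise `click.ClickException` instead, as it already does for the ID column validation.

```python
class ConvertError(Exception):
    # Hypothetical error type used to keep this sketch dependency-free.
    pass


def convert_version(convert_function, content, git_hash):
    """Run the convert function, naming the commit if it fails."""
    try:
        return list(convert_function(content))
    except Exception as ex:
        # Chain with "from ex" so the original traceback stays available.
        raise ConvertError(
            "Commit {}: convert function failed: {}: {}".format(
                git_hash, type(ex).__name__, ex
            )
        ) from ex


def always_fails(content):
    # Stand-in for a convert function that hits unparseable input.
    raise ValueError("Could not determine delimiter")


try:
    convert_version(always_fails, "a b c", "abc1234")
except ConvertError as ex:
    message = str(ex)
    print(message)
    # Commit abc1234: convert function failed: ValueError: Could not determine delimiter
```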