simonw opened this issue 2 years ago (status: Open)
I'd like to get rid of this code, and this feature could help with that: https://github.com/simonw/git-history/blob/8857ce199011a158caba547738d0ba352f3ca69b/git_history/cli.py#L264-L266
Options for the name:
--skip-error-commits
--skip-bad-commits
--skip-errors - I think I like this one best, it's succinct but memorable enough
--ignore-errors
--ignore-bad-commits
The related existing options (that this should be consistent with) are:
--ignore-duplicate-ids
--skip HASH
--start-at
--start-after
Maybe add a separate small section to the README about ways of skipping bad commits too.
FARA_All_Registrants.csv may be a good file to develop this against.
Here's a prototype diff for this feature (though it should also capture exceptions raised by items = list(convert_function(content))):
diff --git a/git_history/cli.py b/git_history/cli.py
index b119004..08a2f04 100644
--- a/git_history/cli.py
+++ b/git_history/cli.py
@@ -74,6 +74,9 @@ def cli():
@click.option(
"skip_hashes", "--skip", multiple=True, help="Skip specific commit hashes"
)
+@click.option(
+ "--skip-errors", is_flag=True, help="If a version has a parse error, skip it"
+)
@click.option(
"--full-versions",
is_flag=True,
@@ -134,6 +137,7 @@ def file(
start_at,
start_after,
skip_hashes,
+ skip_errors,
full_versions,
csv_,
dialect,
@@ -255,7 +259,13 @@ def file(
).keys()
)
# Validate all items in the commit have ID columns - raises ClickException if not
- validate_items_have_id_columns(items, ids, git_hash)
+ try:
+ validate_items_have_id_columns(items, ids, git_hash)
+ except click.ClickException:
+ if skip_errors:
+ continue
+ else:
+ raise
# Use this to detect IDs that are duplicated in the same commit
item_ids_seen_in_this_commit = set()
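The prototype above only guards the ID validation step. A rough sketch of how the same skip logic might also cover the conversion step is shown here. It is not the actual git-history code: load_items is a hypothetical helper, json.loads stands in for the user-supplied convert_function, the inline ID check stands in for validate_items_have_id_columns, and the commit hashes are made up.

```python
import json


def load_items(content, convert_function, ids, git_hash, skip_errors=False):
    """Return the parsed items for one version, or None if it should be skipped."""
    try:
        # list() forces a lazy converter to run here, inside the try block
        items = list(convert_function(content))
        # Hypothetical stand-in for validate_items_have_id_columns()
        for item in items:
            missing = [id_ for id_ in ids if id_ not in item]
            if missing:
                raise ValueError(
                    f"Commit {git_hash}: item is missing ID columns {missing}"
                )
    except Exception:
        if skip_errors:
            return None
        raise
    return items


# Made-up versions of a file: two good ones, one unparseable, one missing its ID
versions = [
    ("aaa111", b'[{"id": 1, "name": "one"}]'),
    ("bbb222", b"not json at all"),
    ("ccc333", b'[{"name": "no id"}]'),
    ("ddd444", b'[{"id": 2, "name": "two"}]'),
]

kept = []
for git_hash, content in versions:
    items = load_items(content, json.loads, ["id"], git_hash, skip_errors=True)
    if items is None:
        continue  # same effect as the `continue` in the prototype diff
    kept.append(git_hash)

print(kept)
```

Catching Exception is deliberately broad in this sketch, since a custom conversion function could raise almost anything. The prototype diff catches only click.ClickException, so the real implementation may want something narrower.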
  File "/Users/simon/Dropbox/Development/git-history/git_history/cli.py", line 233, in file
    items = list(convert_function(content))
  File "<string>", line 3, in fn
  File "/Users/simon/.pyenv/versions/3.10.0/lib/python3.10/csv.py", line 187, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter
It would be good if this error were caught and the commit hash were shown in the error message.
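One possible shape for that, as a sketch only: wrap the conversion call and re-raise with the hash attached. CommitParseError and convert_with_context are hypothetical names, json.loads stands in for the conversion function, and abc1234 is a made-up hash. In the tool itself a click.ClickException would probably be the more natural error type.

```python
import json


class CommitParseError(Exception):
    """Hypothetical error type that records which commit failed to parse."""

    def __init__(self, git_hash, original):
        self.git_hash = git_hash
        super().__init__(f"Commit {git_hash} could not be parsed: {original}")


def convert_with_context(convert_function, content, git_hash):
    try:
        return list(convert_function(content))
    except Exception as ex:
        # `from ex` keeps the original exception attached as __cause__
        raise CommitParseError(git_hash, ex) from ex


try:
    convert_with_context(json.loads, b"a;b|c", "abc1234")
except CommitParseError as err:
    message = str(err)
    print(message)
```

With the hash in the message, a failing commit can be passed straight to the existing --skip HASH option.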
I could add an option that skips any commits that don't parse correctly with the provided conversion function. This would be the quickest possible way of parsing a history where all but a few commits work.