simonw / git-history

Tools for analyzing Git history using SQLite
Apache License 2.0

Floating point numbers seem to always be recorded as changed #56

Open simonw opened 2 years ago

simonw commented 2 years ago

In this example:

image

I don't think latitude and longitude should be populated as they have not changed between records (unlike units).

This is from a demo database built against https://github.com/simonw/scrape-san-mateo-fire-dispatch with:

git-history file history.db incidents.json --id id

Relevant code:

https://github.com/simonw/git-history/blob/ce9e2f161f8037aab8f15dcffb4c7ff8f94ab3b4/git_history/cli.py#L344-L354

simonw commented 2 years ago

I ran a debugger and it looks like one value is a float and the other is a string:

(Pdb) value
37.6426283504007
(Pdb) previous_item.get(column)
'37.6426283504007'
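
A minimal sketch of the mismatch, using the two values from the debugger session above. The variable names `value` and `previous` are illustrative stand-ins, not the names used in cli.py:

```python
# The freshly parsed JSON value is a float...
value = 37.6426283504007
# ...while the value read back from the database is a str.
previous = "37.6426283504007"

# A float never compares equal to a str in Python, so the column
# is flagged as changed even though the number is the same.
print(value != previous)

# Comparing the string forms treats the two as the same value.
print(str(value) != str(previous))
```

The first comparison is what causes latitude and longitude to show up as changed on every version.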

simonw commented 2 years ago

The problem is that previous_item comes from the database:

https://github.com/simonw/git-history/blob/ce9e2f161f8037aab8f15dcffb4c7ff8f94ab3b4/git_history/cli.py#L430-L444

And the item_version table schema for this database is:

CREATE TABLE [item_version] (
   [_id] INTEGER PRIMARY KEY,
   [_item] INTEGER REFERENCES [item]([_id]),
   [_version] INTEGER,
   [_commit] INTEGER REFERENCES [commits]([id]),
   [id] TEXT,
   [date] TEXT,
   [time] TEXT,
   [summary] TEXT,
   [category] TEXT,
   [location] TEXT,
   [latitude] TEXT,
   [longitude] TEXT,
   [units] TEXT,
   [_item_full_hash] TEXT
);
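
This would explain the string: SQLite converts numeric values to text when they are stored in a column with TEXT affinity. A small sketch using Python's built-in sqlite3 module, with a throwaway `demo` table that is an assumption for illustration and not part of git-history:

```python
import sqlite3

original = 37.6426283504007

conn = sqlite3.connect(":memory:")
# Same column type as [latitude] in the item_version schema above.
conn.execute("CREATE TABLE demo (latitude TEXT)")
# Insert a Python float into the TEXT column.
conn.execute("INSERT INTO demo (latitude) VALUES (?)", (original,))

stored_value, stored_type = conn.execute(
    "SELECT latitude, typeof(latitude) FROM demo"
).fetchone()

# The value comes back as a Python str, not a float.
print(repr(stored_value), stored_type)
print(stored_value != original)
```

Reading the row back gives a str, so a direct comparison against the float from the JSON file reports a difference.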

simonw commented 2 years ago

This seems to fix it:

diff --git a/git_history/cli.py b/git_history/cli.py
index f3a4c40..b05d345 100644
--- a/git_history/cli.py
+++ b/git_history/cli.py
@@ -349,7 +349,7 @@ def file(
                                     if column in RESERVED_SET:
                                         continue
                                     value = item_flattened.get(column)
-                                    if value != previous_item.get(column):
+                                    if str(value) != str(previous_item.get(column)):
                                         updated_values[column] = value
                                         updated_columns.add(column)
                             else:

Needs a test. More importantly though, I don't understand why this database schema has TEXT for every column.
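
A rough sketch of what such a test could check. `values_differ` is a hypothetical helper that mirrors the str() comparison in the diff; it does not exist in git-history, and a real test would more likely exercise the CLI against a small Git repository:

```python
def values_differ(new_value, stored_value):
    """Hypothetical helper mirroring the str() comparison in the diff."""
    return str(new_value) != str(stored_value)


def test_unchanged_float_is_not_reported():
    # Float from JSON vs. the same number read back from a TEXT column.
    assert not values_differ(37.6426283504007, "37.6426283504007")


def test_changed_float_is_reported():
    assert values_differ(37.6426283504007, "37.7")


def test_known_limitations_of_str_comparison():
    # None and the literal string "None" are treated as equal.
    assert not values_differ(None, "None")
    # 1 and "1.0" are treated as different.
    assert values_differ(1, "1.0")
```

The last test documents two edge cases of comparing string forms. Whether they matter here depends on how values end up stored, which ties back to the open question about the TEXT columns.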