SimonHallen closed 9 months ago
There seems to be a crazy number of edits. I'm not sure why. My guess is that the file has some wrong formatting somehow. We need to fix this. @BobBorges do you see what went wrong?
Yes and the chair test also fails.
Simon added a column -- I didn't anticipate that / warn about it. I will fetch this file and try to see only the meaningful changes, then remove the extra column.
Maybe my git-comma-diff would come in handy?
git diff --word-diff-regex='[^[:space:],]+' $argv
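For anyone who wants the same idea outside git: a minimal Python sketch that compares two versions of a CSV cell by cell, so that only changed cells are reported rather than whole lines. The function name `changed_cells` is made up for illustration, and it assumes the rows are in the same order in both versions (added or removed rows are not handled).

```python
import csv
import io


def changed_cells(old_text, new_text, delimiter=","):
    """Return (row, column, old_value, new_value) for every cell that differs.

    Assumption: both versions have their rows in the same order.
    Rows present in only one version are ignored by zip().
    """
    old_rows = list(csv.reader(io.StringIO(old_text), delimiter=delimiter))
    new_rows = list(csv.reader(io.StringIO(new_text), delimiter=delimiter))
    changes = []
    for r, (old, new) in enumerate(zip(old_rows, new_rows)):
        # Walk to the longer of the two rows so an added column shows up too.
        for c in range(max(len(old), len(new))):
            a = old[c] if c < len(old) else None
            b = new[c] if c < len(new) else None
            if a != b:
                changes.append((r, c, a, b))
    return changes
```

A cell that exists in only one version is reported with `None` on the other side, which is how an extra column would appear.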
@fredrik1984 50-random-edited-rows.csv please have a look
The test still fails? Shouldn't we fix that first?
The test fails because the delimiter changed, so it doesn't find the columns. I want to make sure that we can see what Simon actually changed and that those edits are reasonable (hence the CSV and tagging Fredrik) before I start messing with the file. Right now the edits look reasonable, but I'd like a second opinion; then it's just a few minutes to fix the formatting.
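The formatting fix could look roughly like the sketch below: read the file with whatever delimiter it now uses, keep only the expected columns, and write it back with the original delimiter. The names `detect_delimiter` and `normalize_csv`, and the column names in the usage note, are hypothetical and not taken from the repo; the sketch also assumes the original delimiter was a comma.

```python
import csv
import io


def detect_delimiter(header_line, candidates=(",", ";", "\t")):
    # Pick the candidate that occurs most often in the header line.
    return max(candidates, key=header_line.count)


def normalize_csv(text, expected_columns, out_delimiter=","):
    """Rewrite CSV text with out_delimiter, keeping only expected_columns."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty file")
    delimiter = detect_delimiter(lines[0])
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in expected_columns if c not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" silently drops any column not listed in fieldnames.
    writer = csv.DictWriter(
        out,
        fieldnames=expected_columns,
        delimiter=out_delimiter,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

For example, `normalize_csv("chair_id;person_id;note\nc1;p1;x\n", ["chair_id", "person_id"])` would drop the `note` column and return comma-separated text.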
It is always better to fix the tests first. Otherwise, we might find new bugs after @fredrik1984 has done his checks.
Why not just remove the column? You could do a separate PR to see the diffs?
@fredrik1984 -- the chairs test, as it was before Simon's work, passes.
Yay!
Don't merge yet...
The tests that were already running on the chairs data are now passing, but there are three skipped tests that still fail. I think we can merge the data as it is and potentially continue fixing inaccuracies. What do you all say? @MansMeg @ninpnin
@SimonHallen I'm attaching some files from unit tests here. Do you think you will have time to look at some of these?
Also, @LaurineMir went through some of the places where the matriklar didn't line up with the Swerik metadata. I'm also attaching a list of issues that might be related to OCR: for example, the year is wrong, and in some cases the MP was long dead by the time of the seat date. Maybe you could have a look at some of those, as it's relevant for the result of your project.
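To illustrate the "MP was long dead" kind of check, here is a rough sketch that flags seat assignments starting after the person's recorded death date. The function name and the data layout (pairs of person ID and seat start date, plus a dict of death dates) are assumptions for the example, not the actual structure of the attached files.

```python
from datetime import date


def seated_after_death(assignments, death_dates):
    """Return (person_id, seat_start, died) for seats that start after death.

    assignments: iterable of (person_id, seat_start) pairs (hypothetical layout)
    death_dates: dict mapping person_id to date of death, if known
    """
    flagged = []
    for person_id, seat_start in assignments:
        died = death_dates.get(person_id)
        # People without a recorded death date are skipped, not flagged.
        if died is not None and seat_start > died:
            flagged.append((person_id, seat_start, died))
    return flagged
```

Rows flagged this way are only candidates for OCR errors in the year; they would still need to be checked against the source by hand.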
20240206-1551_ChairHogs.csv 20240206-1551_EmptySeats.csv 20240206-1551_LoveSeats.csv probable_ocr-err_in_matriklarna.csv
I'll take a look at it!
review of chairs document. Finding duplicates etc