src-d / identity-matching

source{d} extension to match Git signatures to real people.
GNU General Public License v3.0

panic: json: unsupported value: NaN #69

Closed: carlosms closed this issue 5 years ago

carlosms commented 5 years ago

Using the current master (bbaf008) from the observability demo repository with the organization carlosms-test-org, it fails with this log:

identity-matching_1      | time="2019-10-03T08:54:47Z" level=info msg="Using caching for external matching" cachePath=cache-external.csv
identity-matching_1      | time="2019-10-03T08:54:47Z" level=info msg="Dumping CachedMatcher cache."
identity-matching_1      | time="2019-10-03T08:54:47Z" level=info msg="looking for people in commits"
identity-matching_1      | time="2019-10-03T08:54:47Z" level=info msg="not cached in cache-raw.csv, loading from the database"
identity-matching_1      | time="2019-10-03T08:54:47Z" level=info msg="caching the result to cache-raw.csv"
identity-matching_1      | time="2019-10-03T08:54:47Z" level=info msg="found people" elapsed=9.72824ms people=1
identity-matching_1      | time="2019-10-03T08:54:47Z" level=info msg="reducing people"
identity-matching_1      | time="2019-10-03T08:54:48Z" level=warning msg="unable to find users for email: carlos.martin.sanchez@gmail.com"
identity-matching_1      | time="2019-10-03T08:54:48Z" level=warning msg="no matches for person :carlos martin||carlos.martin.sanchez@gmail.com"
identity-matching_1      | time="2019-10-03T08:54:48Z" level=info msg="reduced people" elapsed=889.145294ms people=1
identity-matching_1      | time="2019-10-03T08:54:48Z" level=info msg="primary names are set" elapsed="2.417µs"
identity-matching_1      | time="2019-10-03T08:54:48Z" level=info msg="primary emails are set" elapsed="1.465µs"
identity-matching_1      | time="2019-10-03T08:54:48Z" level=info msg="storing people"
identity-matching_1      | time="2019-10-03T08:54:48Z" level=info msg="stored people" elapsed=5.782553ms path=identities
identity-matching_1      | panic: json: unsupported value: NaN
identity-matching_1      | 
identity-matching_1      | goroutine 1 [running]:
identity-matching_1      | github.com/src-d/identity-matching/reporter.Write()
identity-matching_1      |  /go/src/identity-matching/reporter/reporter.go:38 +0x10b
identity-matching_1      | main.main()
identity-matching_1      |  /go/src/identity-matching/cmd/match-identities/main.go:111 +0x120a
vmarkovtsev commented 5 years ago

Fixing this today!

vmarkovtsev commented 5 years ago

@carlosms Can you please send me cache-raw.csv and cache-external.csv?

vmarkovtsev commented 5 years ago

Actually, I have identified the problem, so the files are not required.

vmarkovtsev commented 5 years ago

Please confirm and I will close the issue.

carlosms commented 5 years ago

Same problem, same error trace.

The contents of cache-raw.csv:

repo,name,email
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/test-repo,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/test-repo,Carlos Martín,carlos.martin.sanchez@gmail.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/test-repo,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/test-repo,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com
github.com/carlosms-test-org/lookout-test,Carlos Martín,carlosms@users.noreply.github.com

cache-external.csv:

email,user,name,match
vmarkovtsev commented 5 years ago

Are you sure you've updated? The error trace cannot be the same; I have heavily changed the panic text.

carlosms commented 5 years ago

Yes, I'm using this commit:

commit b7c6e8d34e7f791cea33d4683c5120b529f66f08
Merge: bbaf008 75c1674
Author: Vadim Markovtsev <vadim@sourced.tech>
Date:   Thu Oct 3 13:40:02 2019 +0200

    Merge pull request #70 from vmarkovtsev/master

    Handle empty results

But looking at the changes, it looks like these conditions can never be true: f != f and mean != mean.

carlosms commented 5 years ago

Oh wait, the Docker image was built in multiple stages, and the binary itself had not been rebuilt from the latest commit. After forcing a rebuild, this is the new output:

identity-matching_1      | time="2019-10-03T12:21:45Z" level=info msg="Using caching for external matching" cachePath=cache-external.csv
identity-matching_1      | time="2019-10-03T12:21:45Z" level=info msg="Dumping CachedMatcher cache."
identity-matching_1      | time="2019-10-03T12:21:45Z" level=info msg="looking for people in commits"
identity-matching_1      | time="2019-10-03T12:21:45Z" level=info msg="not cached in cache-raw.csv, loading from the database"
identity-matching_1      | time="2019-10-03T12:21:45Z" level=info msg="caching the result to cache-raw.csv"
identity-matching_1      | time="2019-10-03T12:21:45Z" level=info msg="found people" elapsed=16.707213ms people=1
identity-matching_1      | time="2019-10-03T12:21:45Z" level=info msg="reducing people"
identity-matching_1      | time="2019-10-03T12:21:46Z" level=warning msg="unable to find users for email: carlos.martin.sanchez@gmail.com"
identity-matching_1      | time="2019-10-03T12:21:46Z" level=warning msg="no matches for person :carlos martin||carlos.martin.sanchez@gmail.com"
identity-matching_1      | time="2019-10-03T12:21:46Z" level=panic msg="Commit(\"connected component size std\", NaN)"
identity-matching_1      | panic: (*logrus.Entry) (0x85f5a0,0xc00006e540)
identity-matching_1      | 
identity-matching_1      | goroutine 1 [running]:
identity-matching_1      | github.com/sirupsen/logrus.Entry.log(0xc0000aa120, 0xc00074a150, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
identity-matching_1      |  /go/pkg/mod/github.com/sirupsen/logrus@v1.3.0/entry.go:227 +0x2ce
identity-matching_1      | github.com/sirupsen/logrus.(*Entry).Log(0xc00006e3c0, 0x0, 0xc000144dc0, 0x1, 0x1)
identity-matching_1      |  /go/pkg/mod/github.com/sirupsen/logrus@v1.3.0/entry.go:256 +0xe4
identity-matching_1      | github.com/sirupsen/logrus.(*Entry).Logf(0xc00006e3c0, 0xc000000000, 0x86e9ea, 0x10, 0xc000144e80, 0x2, 0x2)
identity-matching_1      |  /go/pkg/mod/github.com/sirupsen/logrus@v1.3.0/entry.go:301 +0xc5
identity-matching_1      | github.com/sirupsen/logrus.(*Logger).Logf(0xc0000aa120, 0xc000000000, 0x86e9ea, 0x10, 0xc000144e80, 0x2, 0x2)
identity-matching_1      |  /go/pkg/mod/github.com/sirupsen/logrus@v1.3.0/logger.go:137 +0x96
identity-matching_1      | github.com/sirupsen/logrus.(*Logger).Panicf(...)
identity-matching_1      |  /go/pkg/mod/github.com/sirupsen/logrus@v1.3.0/logger.go:178
identity-matching_1      | github.com/sirupsen/logrus.Panicf(...)
identity-matching_1      |  /go/pkg/mod/github.com/sirupsen/logrus@v1.3.0/exported.go:168
identity-matching_1      | github.com/src-d/identity-matching/reporter.Commit(0x874dd4, 0x1c, 0x7c67a0, 0xc00047ef50)
identity-matching_1      |  /go/src/identity-matching/reporter/reporter.go:19 +0x1c6
identity-matching_1      | github.com/src-d/identity-matching.ReducePeople(0xc00017a3f0, 0x8f73c0, 0xc0000241c0, 0xc00017a2a0, 0xc00017a480, 0xc00017a720, 0xc00017b620, 0xc000334150, 0xc000334c90, 0x14, ...)
identity-matching_1      |  /go/src/identity-matching/matching.go:215 +0x1b16
identity-matching_1      | main.main()
identity-matching_1      |  /go/src/identity-matching/cmd/match-identities/main.go:81 +0x886
vmarkovtsev commented 5 years ago

> it looks like these conditions can never be true:

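This is how you check for NaN values: NaN is the only floating-point value that is not equal to itself, so f != f is true exactly when f is NaN.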

vmarkovtsev commented 5 years ago

So I checked the mean, and it is fine; the standard deviation was buggy. Fixing.

carlosms commented 5 years ago

> > it looks like these conditions can never be true:
>
> This is how you check NaN values

Didn't know this trick, thanks.

vmarkovtsev commented 5 years ago

I hope this is finally fixed.

carlosms commented 5 years ago

Yes, now it's running fine, thank you for the quick fix.