logv / snorkel

UI for interactive data analysis | https://snorkel.logv.org
https://fb.com/groups/snorkelsnorkelsnorkel

Snorkel lite fails when supplied with -int-filter #45

Closed. alfa07 closed this issue 4 years ago

alfa07 commented 4 years ago

Repro: 1/ Generate data with https://gist.github.com/alfa07/ec5d1901d4c21be4dd093fe0fec1869b

$ gen_data.py --n 100000 --D 50 --output-file sphere.json

2/ Ingest data

3/ Run query (the command was generated by snorkel):

$ snorkel/backend/bin/sybil query -json --read-log --cache-queries --field-separator= --filter-separator= -table sphere_100k_50D_t -samples -int-filter timegt1583437692timelt1584038892 -limit 10

4/ It fails with:

snorkel/backend/bin/sybil query -json --read-log --cache-queries --field-separator= --filter-separator= -table sphere_100k_50D_t -samples -int-filter timegt1583437692timelt1584038892 -limit 10
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/logv/sybil/src/lib.BuildFilters(0xc4200c2000, 0xc4200d0080, 0x7ffe2cda5014, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/okay/tonka/src/snorkel.pudgy/build/go/src/github.com/logv/sybil/src/lib/filter.go:43 +0xdf5
github.com/logv/sybil/src/cmd.runQueryCmdLine()
        /home/okay/tonka/src/snorkel.pudgy/build/go/src/github.com/logv/sybil/src/cmd/cmd_query.go:183 +0xe64
github.com/logv/sybil/src/cmd.RunQueryCmdLine()
        /home/okay/tonka/src/snorkel.pudgy/build/go/src/github.com/logv/sybil/src/cmd/cmd_query.go:71 +0x2f
main.main()
        /home/okay/tonka/src/snorkel.pudgy/build/go/src/github.com/logv/sybil/main.go:95 +0xd1

5/ Without -int-filter it runs fine

okayzed commented 4 years ago

thanks, will investigate and fix ASAP

okayzed commented 4 years ago

Did this panic happen inside snorkel when running a query, or when you ran it on the command line? The specific issue here is that snorkel uses non-printing ASCII characters as the field and filter separators, so if you copy and paste the command from snorkel's output to the command line, the separators are lost and the query will not run correctly.

What you can do is remove the '--field-separator' and '--filter-separator' args and use ':' and ',' as the separators in the filters.

sybil query -json --read-log --cache-queries -table sphere -samples -int-filter time:lt:1584038892,time:gt:1583437692 -limit 10
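
For reference, here is a minimal Python sketch of how a filter argument in this printable form could be assembled. The helper name `build_int_filter` and the constants `PART_SEP` and `ITEM_SEP` are hypothetical and not part of snorkel or sybil; the sketch only assumes the `column:op:value` layout, with `,` between filters, described above.

```python
# Hypothetical helper, not part of snorkel or sybil: joins (column, op, value)
# triples with printable separators instead of non-printing ASCII characters.
PART_SEP = ":"  # between column, operator and value inside one filter
ITEM_SEP = ","  # between one filter and the next


def build_int_filter(filters, part_sep=PART_SEP, item_sep=ITEM_SEP):
    """Return a string that could be passed after -int-filter."""
    items = []
    for column, op, value in filters:
        # int() rejects non-numeric values early, before the command is run.
        items.append(part_sep.join([column, op, str(int(value))]))
    return item_sep.join(items)


arg = build_int_filter([("time", "lt", 1584038892), ("time", "gt", 1583437692)])
print(arg)
```

Because both separators are printable, the resulting string survives being copied from a log into a shell.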

alfa07 commented 4 years ago

Both inside snorkel and when I reproduced it on the command line. Yes, I can make it work on the command line, but the workaround won't work for snorkel-lite, will it?

I am getting errors like this in the snorkel server log:

RUNNING COMMAND /home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/backend/bin/sybil query -json --read-log --cache-queries --field-separator= --filter-separator= -table sphere_100k_50D -int-filter timegt1583460231timelt1584061431 -limit 10
ERROR INVOKING QuerySidebar run_query No JSON object could be decoded
Traceback (most recent call last):
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/pudgy/blueprint.py", line 118, in invoke
    ret, proxy = found.invoke(cid, fn, args, kwargs)
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/pudgy/components/bridge.py", line 119, in invoke
    return cls.__remote_calls__[fn](*args, **kwargs), c
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/auth.py", line 113, in wrapped_func
    r = f(*args, **kwargs)
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/auth.py", line 102, in wrapped_func
    return func(*args, **kwargs)
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/pages.py", line 208, in run_query
    res = bs.run_query(table, query, ti)
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/backend/sybil.py", line 396, in run_query
    return q.run_query(table, query_spec, metadata)
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/backend/sybil.py", line 128, in run_query
    return self.run_table_query(table, query_spec)
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/backend/sybil.py", line 306, in run_table_query
    return run_query_command(cmd_args)
  File "/home/msokolov/anaconda2/lib/python2.7/site-packages/snorkel/backend/sybil.py", line 79, in run_query_command
    return json.loads(ret)
  File "/home/msokolov/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/msokolov/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/msokolov/anaconda2/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

alfa07 commented 4 years ago

I suspect this doesn't work: https://github.com/logv/snorkel/blob/7ab9c7ba3f6fa51b0d8466f7588321b6245b04ab/src/backend/sybil.py#L95-L98

okayzed commented 4 years ago

What's your OS, if I may ask? It looks like a Linux distro.

I think what is happening is related to the ASCII characters, but I'm not certain because I don't see the sybil error in the output. If you add "DEBUG=1" to the start of the command when running snorkel's frontend (for example, DEBUG=1 snorkel.frontend), it should print sybil's stderr output.
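
To illustrate why the backend's stderr matters here: the traceback above ends in `json.loads` failing, which says nothing about why the command produced no JSON. The following is a rough Python 3 sketch, not snorkel's actual `run_query_command`; the function name `run_json_command` is hypothetical, and a small Python child process stands in for a crashing backend so the example is runnable without sybil.

```python
import json
import subprocess
import sys


def run_json_command(cmd_args):
    """Run a command, parse its stdout as JSON, and surface stderr on failure."""
    proc = subprocess.run(cmd_args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out = proc.stdout.decode("utf-8", errors="replace")
    err = proc.stderr.decode("utf-8", errors="replace")
    try:
        return json.loads(out)
    except ValueError:
        # Include the exit code and stderr so the real cause is not hidden
        # behind a bare JSON decoding error.
        raise RuntimeError(
            "command exited with %d and printed no valid JSON; stderr was:\n%s"
            % (proc.returncode, err)
        )


# Stand-in for a crashing backend: writes to stderr, prints no JSON, exits 2.
failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('panic: index out of range\\n'); sys.exit(2)",
]
try:
    run_json_command(failing)
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

With this kind of wrapper, the error reported to the caller would carry the backend's own message rather than only "No JSON object could be decoded".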

PS: Thank you for reporting this issue

okayzed commented 4 years ago

I suspect this doesn't work:

I also suspect it now ;)

alfa07 commented 4 years ago

$ cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)
$ uname -r
3.10.0-1062.12.1.el7.x86_64

alfa07 commented 4 years ago

I just git-cloned snorkel and ran PORT=2333 python -m src.main (previously I had installed it using pip install snorkel-lite). It seems to be working now. Queries go through fine:

RUNNING COMMAND src/backend/bin/sybil query -json --read-log --cache-queries --field-separator= --filter-separator= -table sphere -time-col time -time -group tag -int x0 -int-filter timegt1584060423timelt1584064023 -limit 10 -time-bucket 300
INGESTING 1 SAMPLES INTO slite@queries

Not sure what is happening.

But chr(30) and chr(31) still do not show up in the logs (could it be a terminal issue?)

okayzed commented 4 years ago

They are non-printing ASCII characters, so they shouldn't show up in the logs.

In ASCII, 30 = 'record separator' and 31 = 'unit separator'. They are control characters meant for separating fields.
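
A small Python sketch of why these characters seem to vanish, and how `repr()` makes them visible. Which of the two characters separates the parts of a filter and which separates whole filters is an assumption made only for this illustration.

```python
RS = chr(30)  # ASCII record separator, 0x1e
US = chr(31)  # ASCII unit separator, 0x1f

# Assumption for illustration only: RS between the parts of one filter,
# US between filters. The real assignment in snorkel may differ.
raw = US.join([
    RS.join(["time", "gt", "1583437692"]),
    RS.join(["time", "lt", "1584038892"]),
])

# repr() escapes control characters, so they become visible in a log line.
visible = repr(raw)
print(visible)

# Dropping non-printable characters mimics what a terminal copy-paste keeps.
pasted = "".join(ch for ch in raw if ch.isprintable())
print(pasted)

# Without the separators, the pasted text can no longer be split into filters.
print(len(raw.split(US)), len(pasted.split(US)))
```

The second printed line is `timegt1583437692timelt1584038892`, the same run-together text that appears after `-int-filter` in the first command of this thread.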

I will rebuild snorkel-lite and upload a package. Thanks for cloning and checking that it works! What is the sphere data representing, btw (if you don't mind me asking)? Please feel free to email, open issues or post in the group for any guidance/help.