hideaki-t / sqlite-fts-python

A Python binding of SQLite Full Text Search Tokenizer
MIT License

no such function fts3_tokenizer #6

Closed ramcharran closed 7 years ago

ramcharran commented 7 years ago

The following is the output I got when I executed my code. I compiled my code with the flag -DSQLITE_ENABLE_FTS3_TOKENIZER, but I still got this error.

hideaki-t commented 7 years ago

Can you check the following 2 things?

Note: I assumed you are using a Linux environment. The commands below are just examples; anything is fine as long as it provides the information I want to know.

ramcharran commented 7 years ago

I am running my application on Alpine Linux in Docker. Here are the outputs for both of the commands:

hideaki-t commented 7 years ago

Hmm. Does your SQLite have the FTS3 functions? Did you pass the argument --enable-fts3 to the configure script when you built sqlite3?

If the 2-argument fts3_tokenizer is disabled, you should get sqlite3.OperationalError: fts3tokenize disabled, so I guess your SQLite does not have FTS3 functionality at all.

I found that enabling the 2-argument fts3_tokenizer using sqlite3_db_config is just a flag operation, so it returns True even if FTS3 is not enabled.

Can you try the following? Check the compilation flags:

$ python -c 'import sqlite3 as s; import pprint as pp; r = s.connect(":memory:").execute("pragma compile_options").fetchall(); pp.pprint(r); print(s.sqlite_version)'
[('COMPILER=gcc-7.1.1 20170516',),
 ('DEFAULT_SYNCHRONOUS=2',),
 ('DEFAULT_WAL_SYNCHRONOUS=2',),
 ('ENABLE_COLUMN_METADATA',),
 ('ENABLE_DBSTAT_VTAB',),
 ('ENABLE_FTS3',),
 ('ENABLE_FTS4',),
 ('ENABLE_FTS5',),
 ('ENABLE_JSON1',),
 ('ENABLE_RTREE',),
 ('ENABLE_UNLOCK_NOTIFY',),
 ('HAVE_ISNAN',),
 ('SECURE_DELETE',),
 ('SYSTEM_MALLOC',),
 ('TEMP_STORE=1',),
 ('THREADSAFE=1',)]
3.19.3

Using a libsqlite3.so built with -DSQLITE_ENABLE_FTS3_TOKENIZER=1 but without --enable-fts3:

$ python
Python 3.6.1 (default, Mar 27 2017, 00:27:06)
[GCC 6.3.1 20170306] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> c = sqlite3.connect(':memory:')
>>> c.execute('create virtual table data using fts3()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: no such module: fts3
>>> c.execute('select fts3_tokenizer("test")')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: no such function: fts3_tokenizer
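
The checks above can be combined into one small script. This is only a sketch using the standard sqlite3 module; the function name and the probe table name are made up for illustration, and it says nothing about a different SQLite library that another wrapper might be linked against.

```python
import sqlite3


def fts3_status(conn):
    """Report whether FTS3 and fts3_tokenizer() are usable on this connection."""
    options = {row[0] for row in conn.execute("PRAGMA compile_options")}
    report = {"ENABLE_FTS3 in compile_options": "ENABLE_FTS3" in options}

    # Without FTS3 this fails with "no such module: fts3".
    try:
        conn.execute("CREATE VIRTUAL TABLE temp.fts3_probe USING fts3(body)")
        conn.execute("DROP TABLE temp.fts3_probe")
        report["fts3 module"] = "available"
    except sqlite3.OperationalError as exc:
        report["fts3 module"] = str(exc)

    # Without FTS3 this fails with "no such function: fts3_tokenizer".
    try:
        conn.execute("SELECT fts3_tokenizer('simple')").fetchone()
        report["fts3_tokenizer()"] = "callable"
    except sqlite3.OperationalError as exc:
        report["fts3_tokenizer()"] = str(exc)

    return report


report = fts3_status(sqlite3.connect(":memory:"))
for name, value in report.items():
    print(name, "->", value)
```

If the last two entries show error messages, the library was most likely built without FTS3, whatever the tokenizer flag says.
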

ramcharran commented 7 years ago

I am using the apsw wrapper for my database connection. Does that have anything to do with this? This is the output I got:

bash-4.3# python3 -c 'import sqlite3 as s; import pprint as pp; r = s.connect(":memory:").execute("pragma compile_options").fetchall(); pp.pprint(r); print(s.sqlite_version)'
[('COMPILER=gcc-6.3.0',),
 ('DEFAULT_SYNCHRONOUS=2',),
 ('DEFAULT_WAL_SYNCHRONOUS=2',),
 ('ENABLE_COLUMN_METADATA',),
 ('ENABLE_DBSTAT_VTAB',),
 ('ENABLE_FTS3',),
 ('ENABLE_FTS3_PARENTHESIS',),
 ('ENABLE_FTS4',),
 ('ENABLE_FTS5',),
 ('ENABLE_JSON1',),
 ('ENABLE_RTREE',),
 ('ENABLE_UNLOCK_NOTIFY',),
 ('SECURE_DELETE',),
 ('SYSTEM_MALLOC',),
 ('THREADSAFE=1',)]
3.18.0
bash-4.3# python3
Python 3.6.1 (default, May  2 2017, 15:16:41) 
[GCC 6.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> c = sqlite3.connect(':memory:')
>>> c.execute('create virtual table data using fts3()')
<sqlite3.Cursor object at 0x7f0056844c70>
>>> c.execute('select fts3_tokenizer("test")')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: unknown tokenizer: test
>>> 

Or do you think I need to upgrade gcc?

By the way, this is the code I am using to register my tokenizer:

connection = apsw.Connection('texts.db', flags=apsw.SQLITE_OPEN_READWRITE)

def tokenize():
        with connection:
            c = connection.cursor()
            print("connection to cursor")
            fts.register_tokenizer(c, 'oulatin', fts.make_tokenizer_module(<calling my custom tokenizer here>))

hideaki-t commented 7 years ago

Ah okay, I didn't expect this to be used with anything other than the standard sqlite3 module. I tried a few things and found the following.

I used the Docker image python:alpine (Alpine 3.4 with Python 3.6.1). I hope the same method works in your environment.

Preparation:

wget http://www.sqlite.org/2017/sqlite-autoconf-3190300.tar.gz
tar zxvf sqlite-autoconf-3190300.tar.gz
cd sqlite-autoconf-3190300/
CPPFLAGS="-DSQLITE_ENABLE_FTS3_TOKENIZER=1" ./configure
make
make install
pip install -U --force-reinstall apsw --global-option=build --global-option=--enable-all-extensions
pip install git+git://github.com/hideaki-t/sqlite-fts-python.git@apsw

Test script:

import re
import apsw
import sqlitefts as fts

class SimpleTokenizer(fts.Tokenizer):
    _p = re.compile(r'\w+', re.UNICODE)
    def tokenize(self, text):
        for m in self._p.finditer(text):
            s, e = m.span()
            t = text[s:e]
            l = len(t.encode('utf-8'))
            p = len(text[:s].encode('utf-8'))
            yield t, p, p + l

c = apsw.Connection(':memory:')
fts.register_tokenizer(c, 'test', fts.make_tokenizer_module(SimpleTokenizer()))
cur = c.cursor()
print(cur.execute('select fts3_tokenizer("test")').fetchall())
print(cur.execute('create virtual table data using fts3(tokenize="test")').fetchall())
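
Note that the tokenizer in the test script yields UTF-8 byte offsets, not character offsets. The following standalone sketch (it needs neither apsw nor sqlitefts; the function name and sample text are made up for illustration) repeats the same offset arithmetic so the difference is visible for non-ASCII text.

```python
import re

_word = re.compile(r"\w+", re.UNICODE)


def tokenize_with_byte_offsets(text):
    """Yield (token, start, end), with start/end counted in UTF-8 bytes."""
    for match in _word.finditer(text):
        start, end = match.span()  # character positions
        token = text[start:end]
        byte_start = len(text[:start].encode("utf-8"))
        byte_end = byte_start + len(token.encode("utf-8"))
        yield token, byte_start, byte_end


# "ī" takes two bytes in UTF-8, so every offset after it shifts by one.
tokens = list(tokenize_with_byte_offsets("vīta brevis"))
print(tokens)  # [('vīta', 0, 5), ('brevis', 6, 12)]
```

With character offsets the second token would start at 5 instead of 6.
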

ramcharran commented 7 years ago

Hi @hideaki-t, sorry for the late reply. I tried the solution you proposed, but now my code exits before executing the insert statements.

connection = apsw.Connection('texts.db', flags=apsw.SQLITE_OPEN_READWRITE)

def tokenize():
        with connection:
            c = connection.cursor()
            print("connection to cursor")
            fts.register_tokenizer(connection, 'oulatin', fts.make_tokenizer_module(OUWordTokenizer('latin')))
            # fts.register_tokenizer(c, 'porter')
            print("registering tokenizer")
            c.execute("CREATE VIRTUAL TABLE IF NOT EXISTS text_idx  USING fts3 (id, title, book, author, date, chapter, verse, passage, link, documentType, tokenize={});".format(
                    "oulatin"))
            c.execute("CREATE VIRTUAL TABLE IF NOT EXISTS text_idx_porter  USING fts3 (id, title, book, author, date, chapter, verse, passage, link, documentType, tokenize={});".format(
                    "porter"))
            c.execute("commit")

            print("virtual table created")
            c.execute("INSERT INTO text_idx (id, title, book, author, date, chapter, verse, passage, link, documentType) SELECT id, title, book, author, date, chapter, verse, passage, link, documentType FROM texts;")
            c.execute("INSERT INTO text_idx_porter (id, title, book, author, date, chapter, verse, passage, link, documentType) SELECT id, title, book, author, date, chapter, verse, passage, link, documentType FROM texts;")
            c.execute("commit")

When I execute the above code, the program exits right before the insert statements.

This is the output I got when I ran the program in my Docker container with Alpine, where the code is in the file app.py:

bash-4.3# python3 app.py
Arabic not supported. Install `pyarabic` library to tokenize Arabic.
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
 * Restarting with stat
Arabic not supported. Install `pyarabic` library to tokenize Arabic.
connection to cursor
registering tokenizer
virtual table created
bash-4.3# 

hideaki-t commented 7 years ago

Hi @ramcharran, I don't know what OUWordTokenizer is, so I used SimpleTokenizer in my previous script. I also removed "commit", because I found that "commit" shouldn't be executed within a "with connection" block. Also, if the connection is not closed, it should be closed: exiting a script without closing the connection can cause a SEGV (I'm not sure about apsw, but it happens with the standard sqlite3 module).

import apsw

con = apsw.Connection(':memory:')
with con:
    cur = con.cursor()
    cur.execute('create virtual table text using fts3')
    cur.execute('commit')

(eee) [hideaki@archbox eee]$ python test.py
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    cur.execute('commit')
apsw.SQLError: SQLError: no such savepoint: _apsw-0
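
A minimal sketch of the alternative: let the "with" block handle the commit and close the connection explicitly. It uses the standard sqlite3 module so it runs without extra packages; apsw connections can also be used as context managers and have a close() method, but apsw works with savepoints, so treat the transfer to apsw as an assumption. The function and table contents are made up for illustration.

```python
import sqlite3
from contextlib import closing


def build_table(path=":memory:"):
    # closing() makes sure con.close() runs even if a statement fails.
    with closing(sqlite3.connect(path)) as con:
        # The inner block is the transaction: it is committed when the block
        # ends normally and rolled back on an exception. No explicit COMMIT.
        with con:
            con.execute("CREATE TABLE IF NOT EXISTS texts (id, title)")
            con.execute("INSERT INTO texts VALUES (1, 'title')")
        return con.execute("SELECT count(*) FROM texts").fetchone()[0]


row_count = build_table()
print(row_count)
```
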

SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
sqlite> CREATE TABLE texts (id, title, book, author, date, chapter, verse, passage, link, documentType);
sqlite> INSERT INTO texts VALUES('id','title','book','author','date','chapter','verse','passage','link','documentType');
sqlite>

$ python app.py
connection to cursor
<apsw.Cursor object at 0x7f4355abc990>
registering tokenizer
virtual table created

$ sqlite3 texts.db
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE texts (id, title, book, author, date, chapter, verse, passage, link, documentType);
INSERT INTO texts VALUES('id','title','book','author','date','chapter','verse','passage','link','documentType');
PRAGMA writable_schema=ON;
INSERT INTO sqlite_master(type,name,tbl_name,rootpage,sql)VALUES('table','text_idx','text_idx',0,'CREATE VIRTUAL TABLE text_idx  USING fts3 (id, title, book, author, date, chapter, verse, passage, link, documentType, tokenize=oulatin)');
CREATE TABLE IF NOT EXISTS 'text_idx_content'(docid INTEGER PRIMARY KEY, 'c0id', 'c1title', 'c2book', 'c3author', 'c4date', 'c5chapter', 'c6verse', 'c7passage', 'c8link', 'c9documentType');
INSERT INTO text_idx_content VALUES(1,'id','title','book','author','date','chapter','verse','passage','link','documentType');
INSERT INTO text_idx_content VALUES(2,'id','title','book','author','date','chapter','verse','passage','link','documentType');
CREATE TABLE IF NOT EXISTS 'text_idx_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'text_idx_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
INSERT INTO text_idx_segdir VALUES(0,0,0,0,'0 133',X'0006617574686f720501010302000004626f6f6b050101020200000763686170746572050101050200000464617465050101040200010b6f63756d656e7454797065050101090200000269640301020000046c696e6b05010108020000077061737361676505010107020000057469746c6505010101020000057665727365050101060200');
INSERT INTO text_idx_segdir VALUES(0,1,0,0,'0 133',X'0006617574686f720502010302000004626f6f6b050201020200000763686170746572050201050200000464617465050201040200010b6f63756d656e7454797065050201090200000269640302020000046c696e6b05020108020000077061737361676505020107020000057469746c6505020101020000057665727365050201060200');
INSERT INTO sqlite_master(type,name,tbl_name,rootpage,sql)VALUES('table','text_idx_porter','text_idx_porter',0,'CREATE VIRTUAL TABLE text_idx_porter  USING fts3 (id, title, book, author, date, chapter, verse, passage, link, documentType, tokenize=porter)');
CREATE TABLE IF NOT EXISTS 'text_idx_porter_content'(docid INTEGER PRIMARY KEY, 'c0id', 'c1title', 'c2book', 'c3author', 'c4date', 'c5chapter', 'c6verse', 'c7passage', 'c8link', 'c9documentType');
INSERT INTO text_idx_porter_content VALUES(1,'id','title','book','author','date','chapter','verse','passage','link','documentType');
INSERT INTO text_idx_porter_content VALUES(2,'id','title','book','author','date','chapter','verse','passage','link','documentType');
CREATE TABLE IF NOT EXISTS 'text_idx_porter_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'text_idx_porter_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
INSERT INTO text_idx_porter_segdir VALUES(0,0,0,0,'0 129',X'0006617574686f720501010302000004626f6f6b050101020200000763686170746572050101050200000464617465050101040200010a6f63756d656e74747970050101090200000269640301020000046c696e6b050101080200000670617373616705010107020000047469746c050101010200000476657273050101060200');
INSERT INTO text_idx_porter_segdir VALUES(0,1,0,0,'0 129',X'0006617574686f720502010302000004626f6f6b050201020200000763686170746572050201050200000464617465050201040200010a6f63756d656e74747970050201090200000269640302020000046c696e6b050201080200000670617373616705020107020000047469746c050201010200000476657273050201060200');
PRAGMA writable_schema=OFF;
COMMIT;
sqlite>

ramcharran commented 7 years ago

@hideaki-t I have removed the commit statements and the program still exits before the insert statements. To be clear, this is the output I get:

/usr/bin/python3.5 /home/ramcharran/phyllo/search/app.py
 * Running on http://0.0.0.3:5000/ (Press CTRL+C to quit)
 * Restarting with stat
connection to cursor
registering tokenizer
virtual table created

Process finished with exit code 245

In the statement fts.register_tokenizer(c, 'test', fts.make_tokenizer_module(SimpleTokenizer())), is 'c' a connection or a cursor?

ramcharran commented 7 years ago

I tried to debug the code using GDB, since the exit signal is 11, and I got this as my output:

(gdb) run app.py
Starting program: /usr/bin/python3 app.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
 * Running on http://0.0.0.3:5000/ (Press CTRL+C to quit)
 * Restarting with stat
connection to cursor
registering tokenizer
virtual table created
[Inferior 1 (process 21328) exited with code 0365]
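
A side note on the numbers above: 245 and gdb's octal 0365 are the same value, and 245 is what -11 becomes when truncated to one byte. That would be consistent with a child process being killed by signal 11 (SIGSEGV) and its parent passing the negative return code on as its own exit status. Whether that is what happens here is an assumption, suggested by the "Restarting with stat" line. A small sketch of the arithmetic (the function name is made up for illustration):

```python
import signal


def signal_from_exit_status(status):
    """Return a signal name if an 8-bit exit status looks like a negated signal number."""
    if not 0 <= status <= 255:
        raise ValueError("expected an 8-bit exit status")
    # Reinterpret the byte as a signed value: 245 -> -11.
    signed = status - 256 if status > 127 else status
    if signed >= 0:
        return None
    try:
        return signal.Signals(-signed).name
    except ValueError:
        return None  # not a signal number known on this platform


print(0o365 == 245)
print(signal_from_exit_status(245))  # SIGSEGV on Linux
```
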

What do I do?

These are my insert statements, by the way:

c.execute("INSERT INTO text_idx (id, title, book, author, date, chapter, verse, passage, link, documentType) SELECT id, title, book, author, date, chapter, verse, passage, link, documentType FROM texts;")
c.execute("INSERT INTO text_idx_porter (id, title, book, author, date, chapter, verse, passage, link, documentType) SELECT id, title, book, author, date, chapter, verse, passage, link, documentType FROM texts;")

ramcharran commented 7 years ago

After a bit of experimenting with numbers, I found that the program exits because the database has a very large number of tuples and it can only insert 100 tuples at once.

So I tried inserting 100 tuples at a time with the following code:

while l < i:
    c.execute("INSERT INTO text_idx (id, title, book, author, date, chapter, verse, passage, link, documentType) SELECT id, title, book, author, date, chapter, verse, passage, link, documentType FROM texts LIMIT 100 OFFSET " + str(l))
    print(l + 100)
    l += 100
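
For comparison, here is a sketch of the same batching idea with bound parameters and a fixed row order, so that LIMIT/OFFSET pages do not overlap. It uses the standard sqlite3 module and a plain table as a stand-in for the FTS table, with a shortened column list; all of that is an assumption made to keep the example runnable. It only illustrates batching and is not a fix for the crash reported in this thread.

```python
import sqlite3


def copy_in_batches(con, batch_size=100):
    """Copy rows from texts into text_idx, one transaction per batch."""
    total = con.execute("SELECT count(*) FROM texts").fetchone()[0]
    for offset in range(0, total, batch_size):
        with con:  # commits after each batch
            con.execute(
                "INSERT INTO text_idx (id, passage) "
                "SELECT id, passage FROM texts ORDER BY rowid LIMIT ? OFFSET ?",
                (batch_size, offset),
            )
    return con.execute("SELECT count(*) FROM text_idx").fetchone()[0]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE texts (id, passage)")
con.execute("CREATE TABLE text_idx (id, passage)")  # stand-in for the FTS table
con.executemany(
    "INSERT INTO texts VALUES (?, ?)",
    [(n, "passage %d" % n) for n in range(250)],
)
con.commit()

copied = copy_in_batches(con)
print(copied)
con.close()
```
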

The program exits after inserting 300 tuples:

/usr/bin/python3.5 /home/ramcharran/phyllo/search/app.py
 * Running on http://0.0.0.3:5000/ (Press CTRL+C to quit)
 * Restarting with stat
connection to cursor
reg tokenizer
[('texts',)]
list tables
registering tokenizer
[('texts',), ('text_idx',), ('text_idx_content',), ('text_idx_segments',), ('text_idx_segdir',), ('text_idx_porter',), ('text_idx_porter_content',), ('text_idx_porter_segments',), ('text_idx_porter_segdir',)]
virtual table created
100
200
300
*** Error in `/usr/bin/python3.5': double free or corruption (!prev): 0x00000000024527d0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ff4512147e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7ff45121d37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7ff45122153c]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0x215f8)[0x7ff44f52b5f8]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0x21e59)[0x7ff44f52be59]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0xcec50)[0x7ff44f5d8c50]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0xced21)[0x7ff44f5d8d21]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0x15d4d)[0x7ff44f51fd4d]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0xb3455)[0x7ff44f5bd455]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0xb832f)[0x7ff44f5c232f]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0xe37db)[0x7ff44f5ed7db]
/usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so(+0xe4444)[0x7ff44f5ee444]
/usr/bin/python3.5(PyCFunction_Call+0x4f)[0x4e9b9f]
/usr/bin/python3.5(PyEval_EvalFrameEx+0x614)[0x524414]
/usr/bin/python3.5(PyEval_EvalFrameEx+0x4a14)[0x528814]
/usr/bin/python3.5[0x52d2e3]
/usr/bin/python3.5(PyEval_EvalCode+0x1f)[0x52dfdf]
/usr/bin/python3.5[0x5fd2c2]
/usr/bin/python3.5(PyRun_FileExFlags+0x9a)[0x5ff76a]
/usr/bin/python3.5(PyRun_SimpleFileExFlags+0x1bc)[0x5ff95c]
/usr/bin/python3.5(Py_Main+0x456)[0x63e7d6]
/usr/bin/python3.5(main+0xe1)[0x4cfe41]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff4511bd830]
/usr/bin/python3.5(_start+0x29)[0x5d5f29]
======= Memory map: ========
00400000-007a8000 r-xp 00000000 08:06 1836253                            /usr/bin/python3.5
009a8000-009aa000 r--p 003a8000 08:06 1836253                            /usr/bin/python3.5
009aa000-00a41000 rw-p 003aa000 08:06 1836253                            /usr/bin/python3.5
00a41000-00a72000 rw-p 00000000 00:00 0 
017b1000-02619000 rw-p 00000000 00:00 0                                  [heap]
7ff440000000-7ff440021000 rw-p 00000000 00:00 0 
7ff440021000-7ff444000000 ---p 00000000 00:00 0 
7ff44660c000-7ff44af36000 rw-p 00000000 00:00 0 
7ff44b4f1000-7ff44b5b1000 rw-p 00000000 00:00 0 
7ff44b5b1000-7ff44b5b5000 r-xp 00000000 08:06 266858                     /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7ff44b5b5000-7ff44b7b4000 ---p 00004000 08:06 266858                     /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7ff44b7b4000-7ff44b7b5000 r--p 00003000 08:06 266858                     /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7ff44b7b5000-7ff44b7b6000 rw-p 00004000 08:06 266858                     /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7ff44b7b6000-7ff44b836000 rw-p 00000000 00:00 0 
7ff44b836000-7ff44b838000 r-xp 00000000 08:06 5244952                    /home/ramcharran/.local/lib/python3.5/site-packages/markupsafe/_speedups.cpython-35m-x86_64-linux-gnu.so
7ff44b838000-7ff44ba37000 ---p 00002000 08:06 5244952                    /home/ramcharran/.local/lib/python3.5/site-packages/markupsafe/_speedups.cpython-35m-x86_64-linux-gnu.so
7ff44ba37000-7ff44ba38000 r--p 00001000 08:06 5244952                    /home/ramcharran/.local/lib/python3.5/site-packages/markupsafe/_speedups.cpython-35m-x86_64-linux-gnu.so
7ff44ba38000-7ff44ba39000 rw-p 00002000 08:06 5244952                    /home/ramcharran/.local/lib/python3.5/site-packages/markupsafe/_speedups.cpython-35m-x86_64-linux-gnu.so
7ff44ba39000-7ff44c13a000 rw-p 00000000 00:00 0 
7ff44c13a000-7ff44c150000 r-xp 00000000 08:06 266712                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff44c150000-7ff44c34f000 ---p 00016000 08:06 266712                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff44c34f000-7ff44c350000 rw-p 00015000 08:06 266712                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff44c350000-7ff44c4c2000 r-xp 00000000 08:06 1844130                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff44c4c2000-7ff44c6c2000 ---p 00172000 08:06 1844130                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff44c6c2000-7ff44c6cc000 r--p 00172000 08:06 1844130                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff44c6cc000-7ff44c6ce000 rw-p 0017c000 08:06 1844130                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff44c6ce000-7ff44c6d2000 rw-p 00000000 00:00 0 
7ff44c6d2000-7ff44c72f000 r-xp 00000000 08:06 8653797                    /usr/local/lib/python3.5/dist-packages/pycrfsuite/_pycrfsuite.cpython-35m-x86_64-linux-gnu.so
7ff44c72f000-7ff44c92e000 ---p 0005d000 08:06 8653797                    /usr/local/lib/python3.5/dist-packages/pycrfsuite/_pycrfsuite.cpython-35m-x86_64-linux-gnu.so
7ff44c92e000-7ff44c935000 rw-p 0005c000 08:06 8653797                    /usr/local/lib/python3.5/dist-packages/pycrfsuite/_pycrfsuite.cpython-35m-x86_64-linux-gnu.so
7ff44c935000-7ff44c9b6000 rw-p 00000000 00:00 0 
7ff44c9b6000-7ff44ca1d000 r-xp 00000000 08:06 4329630                    /home/ramcharran/.local/lib/python3.5/site-packages/_regex.cpython-35m-x86_64-linux-gnu.so
7ff44ca1d000-7ff44cc1c000 ---p 00067000 08:06 4329630                    /home/ramcharran/.local/lib/python3.5/site-packages/_regex.cpython-35m-x86_64-linux-gnu.so
7ff44cc1c000-7ff44cc1d000 r--p 00066000 08:06 4329630                    /home/ramcharran/.local/lib/python3.5/site-packages/_regex.cpython-35m-x86_64-linux-gnu.so
7ff44cc1d000-7ff44cc26000 rw-p 00067000 08:06 4329630                    /home/ramcharran/.local/lib/python3.5/site-packages/_regex.cpython-35m-x86_64-linux-gnu.so
7ff44cc26000-7ff44d026000 rw-p 00000000 00:00 0 
7ff44d026000-7ff44d084000 r-xp 00000000 08:06 266840                     /lib/x86_64-linux-gnu/libssl.so.1.0.0
7ff44d084000-7ff44d284000 ---p 0005e000 08:06 266840                     /lib/x86_64-linux-gnu/libssl.so.1.0.0
7ff44d284000-7ff44d288000 r--p 0005e000 08:06 266840                     /lib/x86_64-linux-gnu/libssl.so.1.0.0
7ff44d288000-7ff44d28f000 rw-p 00062000 08:06 266840                     /lib/x86_64-linux-gnu/libssl.so.1.0.0
7ff44d28f000-7ff44d2a6000 r-xp 00000000 08:06 2098620                    /usr/lib/python3.5/lib-dynload/_ssl.cpython-35m-x86_64-linux-gnu.so
7ff44d2a6000-7ff44d4a6000 ---p 00017000 08:06 2098620                    /usr/lib/python3.5/lib-dynload/_ssl.cpython-35m-x86_64-linux-gnu.so
7ff44d4a6000-7ff44d4a7000 r--p 00017000 08:06 2098620                    /usr/lib/python3.5/lib-dynload/_ssl.cpython-35m-x86_64-linux-gnu.so
7ff44d4a7000-7ff44d4ac000 rw-p 00018000 08:06 2098620                    /usr/lib/python3.5/lib-dynload/_ssl.cpython-35m-x86_64-linux-gnu.so
7ff44d4ac000-7ff44d52c000 rw-p 00000000 00:00 0 
7ff44d52c000-7ff44d563000 r-xp 00000000 08:06 1843798                    /usr/lib/x86_64-linux-gnu/libmpdec.so.2.4.2
7ff44d563000-7ff44d762000 ---p 00037000 08:06 1843798                    /usr/lib/x86_64-linux-gnu/libmpdec.so.2.4.2
7ff44d762000-7ff44d763000 r--p 00036000 08:06 1843798                    /usr/lib/x86_64-linux-gnu/libmpdec.so.2.4.2
7ff44d763000-7ff44d764000 rw-p 00037000 08:06 1843798                    /usr/lib/x86_64-linux-gnu/libmpdec.so.2.4.2
7ff44d764000-7ff44d788000 r-xp 00000000 08:06 2098610                    /usr/lib/python3.5/lib-dynload/_decimal.cpython-35m-x86_64-linux-gnu.so
7ff44d788000-7ff44d987000 ---p 00024000 08:06 2098610                    /usr/lib/python3.5/lib-dynload/_decimal.cpython-35m-x86_64-linux-gnu.so
7ff44d987000-7ff44d988000 r--p 00023000 08:06 2098610                    /usr/lib/python3.5/lib-dynload/_decimal.cpython-35m-x86_64-linux-gnu.so
7ff44d988000-7ff44d991000 rw-p 00024000 08:06 2098610                    /usr/lib/python3.5/lib-dynload/_decimal.cpython-35m-x86_64-linux-gnu.so
7ff44d991000-7ff44d9b2000 r-xp 00000000 08:06 266741                     /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7ff44d9b2000-7ff44dbb1000 ---p 00021000 08:06 266741                     /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7ff44dbb1000-7ff44dbb2000 r--p 00020000 08:06 266741                     /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7ff44dbb2000-7ff44dbb3000 rw-p 00021000 08:06 266741                     /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7ff44dbb3000-7ff44dbba000 r-xp 00000000 08:06 2098615                    /usr/lib/python3.5/lib-dynload/_lzma.cpython-35m-x86_64-linux-gnu.so
7ff44dbba000-7ff44ddb9000 ---p 00007000 08:06 2098615                    /usr/lib/python3.5/lib-dynload/_lzma.cpython-35m-x86_64-linux-gnu.so
7ff44ddb9000-7ff44ddba000 r--p 00006000 08:06 2098615                    /usr/lib/python3.5/lib-dynload/_lzma.cpython-35m-x86_64-linux-gnu.so
7ff44ddba000-7ff44ddbc000 rw-p 00007000 08:06 2098615                    /usr/lib/python3.5/lib-dynload/_lzma.cpython-35m-x86_64-linux-gnu.so
7ff44ddbc000-7ff44ddcb000 r-xp 00000000 08:06 266673                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7ff44ddcb000-7ff44dfca000 ---p 0000f000 08:06 266673                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7ff44dfca000-7ff44dfcb000 r--p 0000e000 08:06 266673                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7ff44dfcb000-7ff44dfcc000 rw-p 0000f000 08:06 266673                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7ff44dfcc000-7ff44dfd0000 r-xp 00000000 08:06 2098596                    /usr/lib/python3.5/lib-dynload/_bz2.cpython-35m-x86_64-linux-gnu.so
7ff44dfd0000-7ff44e1cf000 ---p 00004000 08:06 2098596                    /usr/lib/python3.5/lib-dynload/_bz2.cpython-35m-x86_64-linux-gnu.so
7ff44e1cf000-7ff44e1d0000 r--p 00003000 08:06 2098596                    /usr/lib/python3.5/lib-dynload/_bz2.cpython-35m-x86_64-linux-gnu.so
7ff44e1d0000-7ff44e1d1000 rw-p 00004000 08:06 2098596                    /usr/lib/python3.5/lib-dynload/_bz2.cpython-35m-x86_64-linux-gnu.so
7ff44e1d1000-7ff44e1f3000 r-xp 00000000 08:06 2098605                    /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
7ff44e1f3000-7ff44e3f2000 ---p 00022000 08:06 2098605                    /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
7ff44e3f2000-7ff44e3f3000 r--p 00021000 08:06 2098605                    /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
7ff44e3f3000-7ff44e3f7000 rw-p 00022000 08:06 2098605                    /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
7ff44e3f7000-7ff44e3f8000 rw-p 00000000 00:00 0 
7ff44e3f8000-7ff44e612000 r-xp 00000000 08:06 266686                     /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7ff44e612000-7ff44e811000 ---p 0021a000 08:06 266686                     /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7ff44e811000-7ff44e82d000 r--p 00219000 08:06 266686                     /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7ff44e82d000-7ff44e839000 rw-p 00235000 08:06 266686                     /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7ff44e839000-7ff44e83c000 rw-p 00000000 00:00 0 
7ff44e83c000-7ff44e841000 r-xp 00000000 08:06 2098612                    /usr/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so
7ff44e841000-7ff44ea41000 ---p 00005000 08:06 2098612                    /usr/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so
7ff44ea41000-7ff44ea42000 r--p 00005000 08:06 2098612                    /usr/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so
7ff44ea42000-7ff44ea43000 rw-p 00006000 08:06 2098612                    /usr/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so
7ff44ea43000-7ff44ea4a000 r-xp 00000000 08:06 4851683                    /home/ramcharran/.local/lib/python3.5/site-packages/.libs_cffi_backend/libffi-72499c49.so.6.0.4
7ff44ea4a000-7ff44ec4a000 ---p 00007000 08:06 4851683                    /home/ramcharran/.local/lib/python3.5/site-packages/.libs_cffi_backend/libffi-72499c49.so.6.0.4
7ff44ec4a000-7ff44ec4b000 rw-p 00007000 08:06 4851683                    /home/ramcharran/.local/lib/python3.5/site-packages/.libs_cffi_backend/libffi-72499c49.so.6.0.4
7ff44ec4b000-7ff44ec4c000 rw-p 00024000 08:06 4851683                    /home/ramcharran/.local/lib/python3.5/site-packages/.libs_cffi_backend/libffi-72499c49.so.6.0.4
7ff44ec4c000-7ff44ec74000 r-xp 00000000 08:06 4329578                    /home/ramcharran/.local/lib/python3.5/site-packages/_cffi_backend.cpython-35m-x86_64-linux-gnu.so
7ff44ec74000-7ff44ee74000 ---p 00028000 08:06 4329578                    /home/ramcharran/.local/lib/python3.5/site-packages/_cffi_backend.cpython-35m-x86_64-linux-gnu.so
7ff44ee74000-7ff44ee7a000 rw-p 00028000 08:06 4329578                    /home/ramcharran/.local/lib/python3.5/site-packages/_cffi_backend.cpython-35m-x86_64-linux-gnu.so
7ff44ee7a000-7ff44ee7c000 rw-p 00000000 00:00 0 
7ff44ee7c000-7ff44ee80000 rw-p 000bf000 08:06 4329578                    /home/ramcharran/.local/lib/python3.5/site-packages/_cffi_backend.cpython-35m-x86_64-linux-gnu.so
7ff44ee80000-7ff44ee87000 r-xp 00000000 08:06 2098604                    /usr/lib/python3.5/lib-dynload/_csv.cpython-35m-x86_64-linux-gnu.so
7ff44ee87000-7ff44f087000 ---p 00007000 08:06 2098604                    /usr/lib/python3.5/lib-dynload/_csv.cpython-35m-x86_64-linux-gnu.so
7ff44f087000-7ff44f088000 r--p 00007000 08:06 2098604                    /usr/lib/python3.5/lib-dynload/_csv.cpython-35m-x86_64-linux-gnu.so
7ff44f088000-7ff44f08a000 rw-p 00008000 08:06 2098604                    /usr/lib/python3.5/lib-dynload/_csv.cpython-35m-x86_64-linux-gnu.so
7ff44f08a000-7ff44f0ca000 rw-p 00000000 00:00 0 
7ff44f109000-7ff44f1c9000 rw-p 00000000 00:00 0 
7ff44f1ca000-7ff44f3ca000 rw-p 00000000 00:00 0 
7ff44f3ca000-7ff44f50a000 rw-p 00000000 00:00 0 
7ff44f50a000-7ff44f649000 r-xp 00000000 08:06 2370937                    /usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so
7ff44f649000-7ff44f849000 ---p 0013f000 08:06 2370937                    /usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so
7ff44f849000-7ff44f84d000 r--p 0013f000 08:06 2370937                    /usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so
7ff44f84d000-7ff44f852000 rw-p 00143000 08:06 2370937                    /usr/local/lib/python3.5/dist-packages/apsw-3.18.0.post1-py3.5-linux-x86_64.egg/apsw.cpython-35m-x86_64-linux-gnu.so
7ff44f852000-7ff44f913000 rw-p 00000000 00:00 0 
7ff44f913000-7ff44f914000 r-xp 00000000 08:06 2098618                    /usr/lib/python3.5/lib-dynload/_opcode.cpython-35m-x86_64-linux-gnu.so
7ff44f914000-7ff44fb13000 ---p 00001000 08:06 2098618                    /usr/lib/python3.5/lib-dynload/_opcode.cpython-35m-x86_64-linux-gnu.so
7ff44fb13000-7ff44fb14000 r--p 00000000 08:06 2098618                    /usr/lib/python3.5/lib-dynload/_opcode.cpython-35m-x86_64-linux-gnu.so
7ff44fb14000-7ff44fb15000 rw-p 00001000 08:06 2098618                    /usr/lib/python3.5/lib-dynload/_opcode.cpython-35m-x86_64-linux-gnu.so
7ff44fb36000-7ff44fbb6000 rw-p 00000000 00:00 0 
7ff44fbb6000-7ff44fc85000 r-xp 00000000 08:06 1844120                    /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7ff44fc85000-7ff44fe85000 ---p 000cf000 08:06 1844120                    /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7ff44fe85000-7ff44fe88000 r--p 000cf000 08:06 1844120                    /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7ff44fe88000-7ff44fe8a000 rw-p 000d2000 08:06 1844120                    /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7ff44fe8a000-7ff44fe8b000 rw-p 00000000 00:00 0 
7ff44fe8b000-7ff44fe9d000 r-xp 00000000 08:06 2098619                    /usr/lib/python3.5/lib-dynload/_sqlite3.cpython-35m-x86_64-linux-gnu.so
7ff44fe9d000-7ff45009c000 ---p 00012000 08:06 2098619                    /usr/lib/python3.5/lib-dynload/_sqlite3.cpython-35m-x86_64-linux-gnu.so
7ff45009c000-7ff45009d000 r--p 00011000 08:06 2098619                    /usr/lib/python3.5/lib-dynload/_sqlite3.cpython-35m-x86_64-linux-gnu.so
7ff45009d000-7ff4500a0000 rw-p 00012000 08:06 2098619                    /usr/lib/python3.5/lib-dynload/_sqlite3.cpython-35m-x86_64-linux-gnu.so
7ff4500a0000-7ff450120000 rw-p 00000000 00:00 0 
7ff450120000-7ff450131000 r-xp 00000000 08:06 2098613                    /usr/lib/python3.5/lib-dynload/_json.cpython-35m-x86_64-linux-gnu.so
7ff450131000-7ff450330000 ---p 00011000 08:06 2098613                    /usr/lib/python3.5/lib-dynload/_json.cpython-35m-x86_64-linux-gnu.so
7ff450330000-7ff450331000 r--p 00010000 08:06 2098613                    /usr/lib/python3.5/lib-dynload/_json.cpython-35m-x86_64-linux-gnu.so
7ff450331000-7ff450332000 rw-p 00011000 08:06 2098613                    /usr/lib/python3.5/lib-dynload/_json.cpython-35m-x86_64-linux-gnu.so
7ff450332000-7ff450372000 rw-p 00000000 00:00 0 
7ff450372000-7ff45064a000 r--p 00000000 08:06 1852687                    /usr/lib/locale/locale-archive
7ff45064a000-7ff450752000 r-xp 00000000 08:06 267909                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff450752000-7ff450951000 ---p 00108000 08:06 267909                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff450951000-7ff450952000 r--p 00107000 08:06 267909                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff450952000-7ff450953000 rw-p 00108000 08:06 267909                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff450953000-7ff45096c000 r-xp 00000000 08:06 270133                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7ff45096c000-7ff450b6b000 ---p 00019000 08:06 270133                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7ff450b6b000-7ff450b6c000 r--p 00018000 08:06 270133                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7ff450b6c000-7ff450b6d000 rw-p 00019000 08:06 270133                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7ff450b6d000-7ff450b93000 r-xp 00000000 08:06 266705                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7ff450b93000-7ff450d93000 ---p 00026000 08:06 266705                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7ff450d93000-7ff450d95000 r--p 00026000 08:06 266705                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7ff450d95000-7ff450d96000 rw-p 00028000 08:06 266705                     /lib/x86_64-linux-gnu/libexpat.so.1.6.0
7ff450d96000-7ff450d98000 r-xp 00000000 08:06 267907                     /lib/x86_64-linux-gnu/libutil-2.23.so
7ff450d98000-7ff450f97000 ---p 00002000 08:06 267907                     /lib/x86_64-linux-gnu/libutil-2.23.so
7ff450f97000-7ff450f98000 r--p 00001000 08:06 267907                     /lib/x86_64-linux-gnu/libutil-2.23.so
7ff450f98000-7ff450f99000 rw-p 00002000 08:06 267907                     /lib/x86_64-linux-gnu/libutil-2.23.so
7ff450f99000-7ff450f9c000 r-xp 00000000 08:06 267785                     /lib/x86_64-linux-gnu/libdl-2.23.so
7ff450f9c000-7ff45119b000 ---p 00003000 08:06 267785                     /lib/x86_64-linux-gnu/libdl-2.23.so
7ff45119b000-7ff45119c000 r--p 00002000 08:06 267785                     /lib/x86_64-linux-gnu/libdl-2.23.so
7ff45119c000-7ff45119d000 rw-p 00003000 08:06 267785                     /lib/x86_64-linux-gnu/libdl-2.23.so
7ff45119d000-7ff45135d000 r-xp 00000000 08:06 267914                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff45135d000-7ff45155d000 ---p 001c0000 08:06 267914                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff45155d000-7ff451561000 r--p 001c0000 08:06 267914                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff451561000-7ff451563000 rw-p 001c4000 08:06 267914                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff451563000-7ff451567000 rw-p 00000000 00:00 0 
7ff451567000-7ff45157f000 r-xp 00000000 08:06 266856                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7ff45157f000-7ff45177e000 ---p 00018000 08:06 266856                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7ff45177e000-7ff45177f000 r--p 00017000 08:06 266856                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7ff45177f000-7ff451780000 rw-p 00018000 08:06 266856                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7ff451780000-7ff451784000 rw-p 00000000 00:00 0 
7ff451784000-7ff4517aa000 r-xp 00000000 08:06 262230                     /lib/x86_64-linux-gnu/ld-2.23.so
7ff4517c7000-7ff45198c000 rw-p 00000000 00:00 0 
7ff45199d000-7ff45199e000 rw-p 00000000 00:00 0 
7ff45199e000-7ff4519a0000 rwxp 00000000 00:00 0 
7ff4519a0000-7ff4519a7000 r--s 00000000 08:06 2107583                    /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
7ff4519a7000-7ff4519a9000 rw-p 00000000 00:00 0 
7ff4519a9000-7ff4519aa000 r--p 00025000 08:06 262230                     /lib/x86_64-linux-gnu/ld-2.23.so
7ff4519aa000-7ff4519ab000 rw-p 00026000 08:06 262230                     /lib/x86_64-linux-gnu/ld-2.23.so
7ff4519ab000-7ff4519ac000 rw-p 00000000 00:00 0 
7ffdf03b6000-7ffdf03d7000 rw-p 00000000 00:00 0                          [stack]
7ffdf03df000-7ffdf03e1000 r--p 00000000 00:00 0                          [vvar]
7ffdf03e1000-7ffdf03e3000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Process finished with exit code 250

Can you point out what I am missing?

hideaki-t commented 7 years ago

Thanks, I will investigate the SEGV issue. I could reproduce it, and it looks like the issue is in this module; for example, the porter tokenizer works fine.

ramcharran commented 7 years ago

To be clear, OUWordTokenizer is my user-defined tokenizer. Are you saying the issue is with my tokenizer or with the sqlite-fts module itself?

hideaki-t commented 7 years ago

no. I reproduced it with SimpleTokenizer.

ramcharran commented 7 years ago

Oh thanks! Let me know if I can help any further.

hideaki-t commented 7 years ago

Thanks @ramcharran,

I found a memory leak that can lead to the crash, and I fixed it. Can you try the latest apsw branch?

ramcharran commented 7 years ago

I reinstalled the whole package and tried running the program, and I got this:

root@ramcharran-GL502VT:/home/ramcharran/phyllo/search# python3 app.py
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
 * Restarting with stat
connection to cursor
Traceback (most recent call last):
  File "app.py", line 74, in <module>
    tokenize()
  File "app.py", line 22, in tokenize
    fts.register_tokenizer(c, 'oulatin', fts.make_tokenizer_module(OUWordTokenizer('latin')))
  File "/usr/local/lib/python3.5/dist-packages/sqlitefts/tokenizer.py", line 193, in register_tokenizer
    raise Error('cannot enable FTS3 tokenizer')
sqlitefts.error.Error: cannot enable FTS3 tokenizer

hideaki-t commented 7 years ago

Hmm. I know it is due to #7, and it can also be caused by mixing SQLite versions between libsqlite.so and apsw.

How did you install apsw and SQLite? I noticed that libsqlite and apsw should use the same SQLite version: apsw's shared object cannot be used via FFI, so we need to use libsqlite.so. I also noticed that installing apsw via PyPI is not a good idea.
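One way to spot such a mismatch is to print the SQLite version each component reports. This is only a sketch: it assumes the shared library can be located with `ctypes.util.find_library`, which may not hold on every system, and `shared_library_version` is a helper name made up for this example. `sqlite3.sqlite_version` and `apsw.sqlitelibversion()` are the version accessors of the two Python modules.

```python
import ctypes
import ctypes.util
import sqlite3


def shared_library_version():
    """Return the version of the libsqlite3 found on this system, or None."""
    name = ctypes.util.find_library("sqlite3")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # sqlite3_libversion() returns a const char* such as "3.19.3".
    lib.sqlite3_libversion.restype = ctypes.c_char_p
    return lib.sqlite3_libversion().decode("ascii")


versions = {
    "sqlite3 module": sqlite3.sqlite_version,
    "libsqlite3 shared library": shared_library_version(),
}

try:
    import apsw  # optional; only reported when installed
    versions["apsw"] = apsw.sqlitelibversion()
except ImportError:
    pass

for component, version in versions.items():
    print("{}: {}".format(component, version))
```

If the lines disagree, the tokenizer is probably being registered against a different SQLite build than the one executing the queries.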

I installed SQLite 3.19.3 and apsw 3.19.3-r1 as follows.

SQLite

$ CPPFLAGS="-DSQLITE_ENABLE_FTS3_TOKENIZER=1" ./configure
$ make install

apsw

$ python setup.py fetch --all build --enable-all-extensions install

ramcharran commented 7 years ago

Hi @hideaki-t, can you tell me the commands for installing the same version? I am not sure what I am doing wrong here.

The following is my docker build file:


FROM alpine:latest

MAINTAINER Christan Grant <cgrant@ou.edu>

# Usage docker build -t cegme/oulatin-search .
# docker run -dt -p 5000:5000 cegme/oulatin-search

RUN apk update
RUN apk add git curl vim strace tmux htop tar make
RUN apk add python3-dev tcl-dev gcc g++ libffi-dev
RUN apk add bash

RUN pip3 install --upgrade pip &&\
        pip3 install apsw nltk cltk flask beautifulsoup4 ipython html5lib flask-wtf flask-bootstrap

RUN pip3 install sqlitefts

RUN mkdir /src
RUN mkdir /src/templates

COPY ./search/buildcode.sh /src
COPY ./search/app.py /src
COPY ./search/search1.py /src
COPY ./search/query.py /src
COPY ./search/search.html /src/templates
COPY ./search/search_results.html /src/templates
COPY ./search/interface.py /src

RUN cd /src && bash buildcode.sh

ADD . /phyllo
ADD . /search1
RUN cd /phyllo && pip3 install .

# Download the database file to /src
RUN cd /src && python3 -c "import phyllo.data_extractor as d; d.main()"

#RUN cd /src && python3 -c "import app as f; f.tokenize()"

EXPOSE 5000
WORKDIR /src
#ENTRYPOINT ["python3"]
#CMD ["/src/app.py"]

The buildcode.sh that is run in the docker file is this:


#!/bin/bash
#export JQLITE="$HOME/bin/jqlite"
export JQLITE="/usr/local/bin/jqlite"
mkdir -p $JQLITE
cd $JQLITE
curl -o sqlite.tar.gz https://www.sqlite.org/src/tarball/sqlite.tar.gz
tar xvzf sqlite.tar.gz
mkdir bld
cd bld

export CFLAGS="-DSQLITE_ENABLE_COLUMN_METADATA \
-DSQLITE_ENABLE_DBSTAT_VTAB \
-DSQLITE_ENABLE_FTS3 \
-DSQLITE_ENABLE_FTS4 \
-DSQLITE_ENABLE_FTS5 \
-DSQLITE_ENABLE_JSON1 \
-DSQLITE_ENABLE_STAT4 \
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
-DSQLITE_SECURE_DELETE \
-DSQLITE_SOUNDEX \
-DSQLITE_ENABLE_FTS3_TOKENIZER \
-DSQLITE_TEMP_STORE=3 \
-DSQLITE_ENABLE_FTS3_PARENTHESIS \
-O2 \
-fPIC"
LIBS="-lm" ../sqlite/configure --prefix=$JQLITE --enable-static --enable-shared
make
make install

cd $JQLITE
git clone https://github.com/rogerbinns/apsw
cd apsw
cp $JQLITE/bld/sqlite3ext.h .
cp $JQLITE/bld/sqlite3.h .
cp $JQLITE/bld/sqlite3.c .
echo -e "library_dirs=$JQLITE/lib" >> setup.cfg
echo -e "include_dirs=$JQLITE/include" >> setup.cfg
LIBS="-lm" python3 setup.py build --enable-all-extensions

cd $JQLITE/apsw
python3 setup.py install

hideaki-t commented 7 years ago

I also tried to run a test on alpine:latest, and it seems okay.

[hideaki@archbox d]$ cat Dockerfile
FROM alpine:latest

RUN apk update && apk add build-base git libffi-dev python3-dev wget
RUN cd && wget http://www.sqlite.org/2017/sqlite-autoconf-3190300.tar.gz https://github.com/rogerbinns/apsw/releases/download/3.19.3-r1/apsw-3.19.3-r1.zip
RUN cd && tar zxvf sqlite-autoconf-3190300.tar.gz && cd sqlite-autoconf-3190300/ && CPPFLAGS="-DSQLITE_ENABLE_FTS3_TOKENIZER=1" ./configure && make install
RUN cd && unzip apsw-3.19.3-r1.zip && cd apsw-3.19.3-r1 && python3 setup.py build --enable-all-extensions install
RUN pip3 install git+git://github.com/hideaki-t/sqlite-fts-python.git@apsw
ADD app.py texts.db /

[hideaki@archbox d]$ docker build -t test .
Sending build context to Docker daemon  1.711MB
Step 1/7 : FROM alpine:latest
 ---> 7328f6f8b418
Step 2/7 : RUN apk update && apk add build-base git libffi-dev python3-dev wget
 ---> Using cache
 ---> 8501583552ad
Step 3/7 : RUN cd && wget http://www.sqlite.org/2017/sqlite-autoconf-3190300.tar.gz https://github.com/rogerbinns/apsw/releases/download/3.19.3-r1/apsw-3.19.3-r1.zip
 ---> Using cache
 ---> 5c36c3de0a32
Step 4/7 : RUN cd && tar zxvf sqlite-autoconf-3190300.tar.gz && cd sqlite-autoconf-3190300/ && CPPFLAGS="-DSQLITE_ENABLE_FTS3_TOKENIZER=1" ./configure && make install
 ---> Using cache
 ---> f9c848644320
Step 5/7 : RUN cd && unzip apsw-3.19.3-r1.zip && cd apsw-3.19.3-r1 && python3 setup.py build --enable-all-extensions install
 ---> Using cache
 ---> 5383b31671d6
Step 6/7 : RUN pip3 install git+git://github.com/hideaki-t/sqlite-fts-python.git@apsw
 ---> Using cache
 ---> 5ee8d6367b14
Step 7/7 : ADD app.py texts.db /
 ---> Using cache
 ---> 7f9f30611420
Successfully built 7f9f30611420
Successfully tagged test:latest
[hideaki@archbox d]$ docker run -it test sh
/ # sqlite3 texts.db 'select count(*) from texts'
501
/ # cat app.py
import apsw
import sqlite3
import sqlitefts as fts
import re
import gc

gc.set_debug(gc.DEBUG_LEAK)

connection = apsw.Connection('texts.db', flags=apsw.SQLITE_OPEN_READWRITE)
#connection = sqlite3.Connection('texts.db')

class SimpleTokenizer(fts.Tokenizer):
    _p = re.compile(r'\w+', re.UNICODE)
    def tokenize(self, text):
        for m in self._p.finditer(text):
            s, e = m.span()
            t = text[s:e]
            l = len(t.encode('utf-8'))
            p = len(text[:s].encode('utf-8'))
            yield t, p, p + l

def tokenize():
        with connection:
            c = connection.cursor()
            print("connection to cursor")
            fts.register_tokenizer(connection, 'oulatin', fts.make_tokenizer_module(SimpleTokenizer()))
            print("registering tokenizer")
            c.execute("CREATE VIRTUAL TABLE IF NOT EXISTS text_idx  USING fts3 (id, title, book, author, date, chapter, verse, passage, link, documentType, tokenize={});".format(
                    "oulatin"))
            c.execute("CREATE VIRTUAL TABLE IF NOT EXISTS text_idx_porter  USING fts3 (id, title, book, author, date, chapter, verse, passage, link, documentType, tokenize={});".format(
                    "porter"))

            print("virtual table created")
            c.execute("INSERT INTO text_idx (id, title, book, author, date, chapter, verse, passage, link, documentType) SELECT id, title, book, author, date, chapter, verse, passage, link, documentType FROM texts;")
            c.execute("INSERT INTO text_idx_porter (id, title, book, author, date, chapter, verse, passage, link, documentType) SELECT id, title, book, author, date, chapter, verse, passage, link, documentType FROM texts;")
            for i in range(20000):
                c.execute("INSERT INTO text_idx (id, title, book, author, date, chapter, verse, passage, link, documentType) values ('id', 'title', 'book', 'author', 'date', 'chapter', 'verse', 'passage', 'link', 'documentType')")

tokenize()
tokenize()
connection.close()

/ # python3 app.py
connection to cursor
registering tokenizer
virtual table created
connection to cursor
registering tokenizer
virtual table created
(omit GC info)
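The SimpleTokenizer in app.py above appears to report each token's start and end as UTF-8 byte offsets rather than character offsets, which matters as soon as the text contains non-ASCII characters. A minimal standalone sketch of the same calculation, with no sqlitefts dependency (`byte_spans` is a name made up for this example):

```python
import re

_WORD = re.compile(r"\w+", re.UNICODE)


def byte_spans(text):
    """Yield (token, start, end) with offsets counted in UTF-8 bytes."""
    for m in _WORD.finditer(text):
        s, e = m.span()  # character offsets from the regex
        start = len(text[:s].encode("utf-8"))  # bytes before the token
        end = start + len(text[s:e].encode("utf-8"))  # plus the token's bytes
        yield text[s:e], start, end


# "\u014d" (o with macron) is one character but two bytes in UTF-8.
spans = list(byte_spans("am\u014d linguam"))
print(spans)
```

Here the second token starts at character 4 but at byte 5, because the macron vowel takes two bytes.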

hideaki-t commented 7 years ago

I have published 0.4.9.1 on PyPI. Although I tested it on several environments, something in it may still affect your script.

ramcharran commented 7 years ago

@hideaki-t everything builds fine, but I keep getting this error before I can check for the SIGSEGV again.

/usr/bin/python3.5 /home/ramcharran/phyllo/search/app.py
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
 * Restarting with stat
connection to cursor
Traceback (most recent call last):
  File "/home/ramcharran/phyllo/search/app.py", line 51, in <module>
    tokenize()
  File "/home/ramcharran/phyllo/search/app.py", line 20, in tokenize
    fts.register_tokenizer(c, 'oulatin', fts.make_tokenizer_module(OUWordTokenizer('latin')))
  File "/usr/local/lib/python3.5/dist-packages/sqlitefts/tokenizer.py", line 197, in register_tokenizer
    cur = c.cursor()
AttributeError: 'apsw.Cursor' object has no attribute 'cursor' 

The error occurs again with line: fts.register_tokenizer(c, 'oulatin', fts.make_tokenizer_module(OUWordTokenizer('latin')))

hideaki-t commented 7 years ago

Ah, it is a backward-incompatible change. Please pass a connection instead of a cursor. The method makes a connection-level change, not a cursor-level one.

It was designed for Python's standard sqlite3 module, where both the connection and the cursor have an execute method, so register_tokenizer could call execute on whichever one it was given. In your case, you passed an apsw cursor to the method, and it called given_cursor.execute. That is how it worked before.

Now this module also works with APSW, but an APSW connection does not have "execute". I prefer a common interface for both sqlite3 and apsw, so I decided to make the change.

This change also makes sense to me, because what the method does affects a connection, not a cursor.
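The difference can be seen with the standard sqlite3 module alone. The snippet below is only an illustration: `require_connection` is a hypothetical guard written for this example and is not part of sqlitefts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Both sqlite3 objects expose execute(), so either one used to be accepted.
both_execute = hasattr(conn, "execute") and hasattr(cur, "execute")

# Only the connection exposes cursor(); calling cursor() on a cursor raises
# the AttributeError shown in the traceback above.
conn_has_cursor = hasattr(conn, "cursor")
cur_has_cursor = hasattr(cur, "cursor")


def require_connection(obj):
    """Hypothetical guard: reject objects that cannot create cursors."""
    if not callable(getattr(obj, "cursor", None)):
        raise TypeError("expected a connection, got " + type(obj).__name__)
    return obj


print(both_execute, conn_has_cursor, cur_has_cursor)
```

In app.py the fix is to pass the connection object itself as the first argument of fts.register_tokenizer, as the test script earlier in this thread does.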

And I know about the SEGV, but I don't know why it happens. I guess it is caused by GC or by the flag-setting code, but I am not sure; I didn't spend time on it.

ramcharran commented 7 years ago

Awesome! It works. Thank you so much, @hideaki-t.

hideaki-t commented 7 years ago

Thank you @ramcharran for the report! I could fix a memory leak and make some improvements :)