heywhy / ex_elasticlunr

Elasticlunr is a small, full-text search library for use in the Elixir environment. It indexes JSON documents and provides a friendly search interface to retrieve documents.
https://hexdocs.pm/elasticlunr
MIT License
189 stars 10 forks

Failure deserializing a saved index? #21

Open nikokozak opened 2 years ago

nikokozak commented 2 years ago

After adding fields to a persisted index, then saving and updating it, the application crashes on restart via iex -S mix with the following error message:

** (Mix) Could not start application elasticlunr: exited in: Elasticlunr.Application.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (FunctionClauseError) no function clause matching in String.split/3
            (elixir 1.13.4) String.split(["text|{\"nil\":\"~\\n\\n\\n\\noe Do\\n\\n\\n\\n3.3(b)(1)\\n\\n\\n\\nse .. 3.5(c) .\\n\\nete) 3,3(b)(1) .\\n\\nCAICh = ISL (S91 -\\n\\nBasilio Arturo Ignacio LAM zO ARGENTINA +\\n\\n(Pnonetic: LAHmee . . 43\\n\\n\\n\\nApproved for Release: 2018/10/02 C06628363\\n\\n\\n\\nDOHSO) 4\\n\\n\\n\\nCommander in Chief\\n\\n\\n\\nForce; Member, Ruling\\n\\nJunta (since 17 December OFFICE OF\\n\\n1981) CENTRAL REFERENCE\\n\\n\\n\\nAddressed as:\\n\\nGeneral Lami Dozo\\n\\n\\n\\nMaj. Gen. Basilio\\n\\nLami Dozo was secretary\\n\\ngeneral of the Air Force\\n\\nfor over three years and\\n\\nchief of air operations\\n\\nfor one year before as-\\n\\nsuming his present posts.\\n\\nA politician known for\\n\\nhis ability, intelligence, and frankness, he is ex-\\n\\npected to become an,important, spokesman for the rul-\\n\\ning junta, while displaying 4. flexible yet pragmatic\\n\\norientation within", "the group A highly political\\n\\ngeneral, he is comEortable with the give and take of\\n\\npolitics, and he has an impressive network of civil-\\n\\nian contacts. Using his effective, low-key approach, 3.5(C)\\n\\n\\n\\nLami Dozo will probably push for accommodatio\\n\\nthe various political forces in Argentina.\\n\\n\\n\\nLami Dozo is anti-Communist, anti-Peronist, and\\n\\nhighly nationalistic. As an influential member of\\n\\nthe government hierarchy, for the past several years\\n\\nhe has played an active role in negotiations between\\n\\nChile and Argentina over the sovereignty of the\\n\\nBeagle Channel. He has been open and friendly with\\n\\nUS officials in Argentina\\n\\n\\n\\nHe has traveled\\n\\nto this country several times and speaks with fond-\\n\\nness of these trips. ft 3.5(c)\\n\\n\\n\\nA 1950 graduate of the Military Aviation\\n\\nSchool, Lami Dozo subsequently served for 14 years\\n\\nat the Palomar Air Force Base,.in Buenos Aires. In\\n\\n1966 he trained at McGuire Air Force Base on C-130\\n\\naircraft. 
During 1972-73 he Was stationed in Canada\\n\\nas a delegate to the International Civil Aviation\\n\\nOrganization. Lami Dozo, 52; speaks English and\\n\\nFrench. Married, he has two sons and three daugh-\\n\\n\\n\\nters.\\n\\n3.5(c)\\n\\n\\n\\nCR M 81-15983\\n\\n\\n\\noO \\\"5446030 a |\\n\\n\\n\\nApproved for Release: 2018/10/02 C06628363\\n\\n\\n\"}"], "|", [])
            (elasticlunr 0.6.6) lib/elasticlunr/deserializer.ex:67: Elasticlunr.Deserializer.Parser.parse/3
            (elasticlunr 0.6.6) lib/elasticlunr/deserializer.ex:16: anonymous fn/2 in Elasticlunr.Deserializer.Parser.process/1
            (elixir 1.13.4) lib/enum.ex:4144: anonymous fn/3 in Enum.reduce/3
            (elixir 1.13.4) lib/stream.ex:1559: Stream.do_element_resource/6
            (elixir 1.13.4) lib/enum.ex:4144: Enum.reduce/3
            (elixir 1.13.4) lib/stream.ex:572: anonymous fn/4 in Stream.map/2
            (elixir 1.13.4) lib/enum.ex:4475: Enumerable.List.reduce/3
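
For context on the trace above: String.split/3 only has clauses for a binary first argument, and here it received a list of two strings, which is why no clause matched. The snippet below is a minimal sketch of that failure mode in isolation. The module name, the helper, and the sample strings are hypothetical and are not part of Elasticlunr; why the deserializer ends up with a list is not established here.

```elixir
defmodule SplitSketch do
  # Hypothetical helper (not Elasticlunr code): split a serialized line of the
  # form "key|payload" only when the input really is a binary.
  def safe_split(line) when is_binary(line) do
    {:ok, String.split(line, "|", parts: 2)}
  end

  # Anything else (for example a list of chunks, as in the trace) is reported
  # instead of being passed on to String.split/3.
  def safe_split(other), do: {:error, {:not_a_binary, other}}
end

# Made-up stand-in for a line that arrived as two chunks.
chunks = ["text|{\"nil\":\"first half", "second half\"}"]

# Calling String.split/3 directly with a list raises FunctionClauseError.
raised =
  try do
    String.split(chunks, "|", [])
  rescue
    e in FunctionClauseError -> {e.function, e.arity}
  end

IO.inspect(raised)
IO.inspect(SplitSketch.safe_split(chunks))
IO.inspect(SplitSketch.safe_split("text|{}"))
```

A guard like this would only surface the problem earlier; it does not explain how the saved line came to be split into a list.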

If necessary, I can upload the source that is doing the data ingestion into the index.

heywhy commented 2 years ago

I would like to have a look at the source, please, to help me better understand the issue.

nikokozak commented 2 years ago

Thank you! I believe this is the code I was using:

https://gist.github.com/nikokozak/555e7815b8df09d3abe79def0672cd7a

I create an index, add documents to it, and then quit out of iex -S mix. Then, on invoking iex -S mix again, the error appears.

Also, as a side note, adding these docs takes a really long time. I'm wondering if it would be better to truncate the text strings.
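
If truncation turns out to be worth trying, it could be done on the document maps before they are handed to the index. The following is only a sketch in plain Elixir: the module name, the field name "text", and the 2,000-grapheme limit are assumptions, and whether this actually speeds up indexing has not been measured here.

```elixir
defmodule TruncateSketch do
  # Hypothetical default limit, counted in graphemes; tune to the data.
  @max_graphemes 2_000

  # Truncates the string stored under `field`, leaving the rest of the
  # document untouched. Documents without that field, or with a non-string
  # value there, are returned unchanged.
  def truncate_field(doc, field, max \\ @max_graphemes) when is_map(doc) do
    case doc do
      %{^field => text} when is_binary(text) ->
        %{doc | field => String.slice(text, 0, max)}

      _ ->
        doc
    end
  end
end

# Made-up sample documents, truncated to 3 graphemes for illustration.
docs = [
  %{"id" => 1, "text" => "abcdef"},
  %{"id" => 2, "title" => "no text field"}
]

truncated = Enum.map(docs, &TruncateSketch.truncate_field(&1, "text", 3))
IO.inspect(truncated)
```

The trade-off is that any text cut off this way is never indexed, so it cannot be found by a search.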