mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards [LUCENE-8902] #899

Closed mikemccand closed 5 years ago

mikemccand commented 5 years ago

When I run an index-time join query on certain parent docs with a wildcard query for the child docs, I sometimes get the wrong answer. Example:

 

Parent Doc    Children
id=id00000    none
id=id00001    program=P1
id=id00002    program=P1, program=P2
id=id00003    none
id=id00004    program=P1
id=id00005    program=P1, program=P2

So essentially I have 6 parent docs, doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, etc.

  1. The following query gives the correct result:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
        Query q = new ToParentBlockJoinQuery(new TermInSetQuery("program", toSet("P1", "P2")), parentSet, ScoreMode.None);

Returns the correct result (4 docs: ["id00001", "id00002", "id00004", "id00005"]).

 

  2. This also gives the correct result (same as above):

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

 

  3. Also correct (same as above):

        BitSetProducer parentSet = new QueryBitSetProducer(new WildcardQuery(new Term("id", "*")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

So far so good.

 

  4. This one gives an incorrect result:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00003")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

Returns 2 docs ["id00001", "id00003"]. It should only return "id00001" and not "id00003" here. Very strange behavior.

 

  5. Just asking for "id00003" also incorrectly returns it:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

 

  6. But as soon as I add "id00002" to the parent query, it works again:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00003", "id00002")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

Gives the correct result: ["id00002"].


I am attaching the unit test that demonstrates this: https://pastebin.com/aJ1LDLCS

I don't know if I am doing something wrong, or if there is an issue.

Thank you for looking into it.


Legacy Jira details

LUCENE-8902 by Andrei on Jul 02 2019, resolved Jul 03 2019

mikemccand commented 5 years ago

Returns 2 docs ["id00001", "id00003"]. It should only return "id00001" and not "id00003" here. Very strange behavior.

  1. Not at all. The child query matches id00002's children, but since id00002 is absent from the parent mask, the match lands on the next set bit, which is id00003.
  2. I don't think child-free parents are supported, although I don't remember why.
  3. Not having the last doc of the segment (id=id00005) in the parent mask should cause an exception, IIRC.
  4. Please follow the Jira usage rules and come to the mailing list first.

[Legacy Jira: Mikhail Khludnev (@mkhludnev) on Jul 03 2019]
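
The "lands on the next bit" behavior from point 1 can be illustrated with a small simulation. This is only a sketch under assumptions, not Lucene's actual code: it uses `java.util.BitSet` in place of Lucene's parent bit set, it assumes all docs sit in a single segment with each block's children indexed immediately before their parent, and the class `ParentMaskSketch` and its `join` method are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class ParentMaskSketch {
    // Assumed doc ID order: children first, then their parent (block layout).
    // Entries starting with "id" are parents; "P1"/"P2" are child docs.
    static final String[] DOCS = {
        "id00000",
        "P1", "id00001",
        "P1", "P2", "id00002",
        "id00003",
        "P1", "id00004",
        "P1", "P2", "id00005"
    };

    // Simulates a block join where every child matches (like program=*)
    // and the parent mask contains only the given parent ids.
    static List<String> join(String... parentIds) {
        List<String> wanted = Arrays.asList(parentIds);
        BitSet parentMask = new BitSet(DOCS.length);
        for (int doc = 0; doc < DOCS.length; doc++) {
            if (wanted.contains(DOCS[doc])) {
                parentMask.set(doc);
            }
        }
        BitSet hits = new BitSet(DOCS.length);
        for (int doc = 0; doc < DOCS.length; doc++) {
            if (DOCS[doc].startsWith("id")) {
                continue; // parent docs are not child matches
            }
            // Each matching child is attributed to the next set bit in the mask,
            // whether or not that bit is the child's real parent.
            int parent = parentMask.nextSetBit(doc);
            if (parent >= 0) { // the sketch simply skips children with no later parent bit
                hits.set(parent);
            }
        }
        List<String> result = new ArrayList<>();
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            result.add(DOCS[doc]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(join("id00000", "id00001", "id00003")); // [id00001, id00003]
        System.out.println(join("id00003"));                       // [id00003]
        System.out.println(join("id00003", "id00002"));            // [id00002]
    }
}
```

Under these assumptions the sketch reproduces the results reported in cases 4 to 6: the children of id00002 are credited to id00003 whenever id00002 is missing from the mask, which suggests the parent mask needs to cover every parent doc rather than only the parents of interest.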

mikemccand commented 5 years ago

My apologies. Thank you, I will post something to the mailing list.

[Legacy Jira: Andrei on Jul 03 2019]