zzjjyyy / QuerySplit

3 stars 3 forks source link

The same error reproduced from TempKmkim #3

Open Leesangoh opened 4 months ago

Leesangoh commented 4 months ago

Hello! I am really inspired by your work.

The same problem as TempKmkim (i.e., even using the provided docker zhaojy20/querysplit:v1) occured to me. I encountered that JOB-9c query produce nothing (not even 0 tuple, literally nothing) when I do not use minsubquery option.

It could be correct that it is a difference between Window compiler and Linux compiler as you said, however, I am currently doubting about the difference between dataset (schema, loaded csv files etc...) and queries.

It would be great if you could provide such datasets (especially, for IMDB-JOB) that you have used! (I used the IMDB-JOB datset from the origina JOB github here: https://github.com/gregrahn/join-order-benchmark)

If we have used the same schema / csv / queries, then I have really no idea why this issue occurs ... :'(

zzjjyyy commented 4 months ago

Do you claim the foreign key constraint on the foreign key columns, like "ALTER TABLE title ADD FOREIGN KEY (kind_id) REFERENCES kind_type(id);"? Query split (i.e., Relationshipcenter and Entitycenter) needs to recognize the foreign key constraint to split the original query, if you don't claim the constraint the query split will return nothing.

zzjjyyy commented 4 months ago

And my dataset is set up by the Python script from the original JOB Git Hub. So I think we use the same dataset.

Leesangoh commented 4 months ago

Aha thanks! I did not claim the foreign key constraint, just primary key constraints (as specified in here: https://github.com/gregrahn/join-order-benchmark/blob/master/schema.sql

To produce the exact result (not repeating the issues again), would you kindly share the commands to claim foreign key constraints on foreign key columns?

Thank you for your kindness!!

zzjjyyy commented 4 months ago

ALTER TABLE aka_name ADD FOREIGN KEY (person_id) REFERENCES name(id);

ALTER TABLE aka_title ADD FOREIGN KEY (kind_id) REFERENCES kind_type(id), ADD FOREIGN KEY (episode_of_id) REFERENCES aka_title(id); -- ADD FOREIGN KEY (movie_id) REFERENCES title(id);

ALTER TABLE cast_info ADD FOREIGN KEY (person_id) REFERENCES name(id), ADD FOREIGN KEY (movie_id) REFERENCES title(id), ADD FOREIGN KEY (person_role_id) REFERENCES char_name(id), ADD FOREIGN KEY (role_id) REFERENCES role_type(id);

ALTER TABLE complete_cast ADD FOREIGN KEY (movie_id) REFERENCES title(id), ADD FOREIGN KEY (subject_id) REFERENCES comp_cast_type(id), ADD FOREIGN KEY (status_id) REFERENCES comp_cast_type(id);

ALTER TABLE movie_companies ADD FOREIGN KEY (movie_id) REFERENCES title(id), ADD FOREIGN KEY (company_id) REFERENCES company_name(id), ADD FOREIGN KEY (company_type_id) REFERENCES company_type(id);

ALTER TABLE movie_keyword ADD FOREIGN KEY (movie_id) REFERENCES title(id), ADD FOREIGN KEY (keyword_id) REFERENCES keyword(id);

ALTER TABLE movie_info ADD FOREIGN KEY (movie_id) REFERENCES title(id), ADD FOREIGN KEY (info_type_id) REFERENCES info_type(id);

ALTER TABLE movie_link ADD FOREIGN KEY (movie_id) REFERENCES title(id), ADD FOREIGN KEY (link_type_id) REFERENCES link_type(id), ADD FOREIGN KEY (linked_movie_id) REFERENCES title(id);

ALTER TABLE person_info ADD FOREIGN KEY (person_id) REFERENCES name(id), ADD FOREIGN KEY (info_type_id) REFERENCES info_type(id);

ALTER TABLE movie_info_idx ADD FOREIGN KEY (info_type_id) REFERENCES info_type(id), ADD FOREIGN KEY (movie_id) REFERENCES title(id);

ALTER TABLE title ADD FOREIGN KEY (kind_id) REFERENCES kind_type(id);

zzjjyyy commented 4 months ago

I think that is all the foreign key constraints that need.

Leesangoh commented 4 months ago

I found that with the docker image you gave and this foreign key constraints, the querysplit seems work well. Thanks a lot!

Leesangoh commented 4 months ago

Oh, by the way, with the linux docker you gave and foreign key constraints, i found that for the following five JOB-queries, the error "ERROR: no relation entry for relid 0" appears. Do you know about these issues? I am struggling to figure out the bugs, but it is quite hard for me :'(

SELECT MIN(lt.link) AS link_type, MIN(t1.title) AS first_movie, MIN(t2.title) AS second_movie FROM keyword AS k, link_type AS lt, movie_keyword AS mk, movie_link AS ml, title AS t1, title AS t2 WHERE k.keyword ='10,000-mile-club' AND mk.keyword_id = k.id AND t1.id = mk.movie_id AND ml.movie_id = t1.id AND ml.linked_movie_id = t2.id AND lt.id = ml.link_type_id AND mk.movie_id = t1.id;

SELECT MIN(lt.link) AS link_type, MIN(t1.title) AS first_movie, MIN(t2.title) AS second_movie FROM keyword AS k, link_type AS lt, movie_keyword AS mk, movie_link AS ml, title AS t1, title AS t2 WHERE k.keyword ='character-name-in-title' AND mk.keyword_id = k.id AND t1.id = mk.movie_id AND ml.movie_id = t1.id AND ml.linked_movie_id = t2.id AND lt.id = ml.link_type_id AND mk.movie_id = t1.id;

SELECT MIN(cn1.name) AS first_company, MIN(cn2.name) AS second_company, MIN(mi_idx1.info) AS first_rating, MIN(mi_idx2.info) AS second_rating, MIN(t1.title) AS first_movie, MIN(t2.title) AS second_movie FROM company_name AS cn1, company_name AS cn2, info_type AS it1, info_type AS it2, kind_type AS kt1, kind_type AS kt2, link_type AS lt, movie_companies AS mc1, movie_companies AS mc2, movie_info_idx AS mi_idx1, movie_info_idx AS mi_idx2, movie_link AS ml, title AS t1, title AS t2 WHERE cn1.country_code = '[us]' AND it1.info = 'rating' AND it2.info = 'rating' AND kt1.kind in ('tv series') AND kt2.kind in ('tv series') AND lt.link in ('sequel', 'follows', 'followed by') AND mi_idx2.info < '3.0' AND t2.production_year between 2005 and 2008 AND lt.id = ml.link_type_id AND t1.id = ml.movie_id AND t2.id = ml.linked_movie_id AND it1.id = mi_idx1.info_type_id AND t1.id = mi_idx1.movie_id AND kt1.id = t1.kind_id AND cn1.id = mc1.company_id AND t1.id = mc1.movie_id AND ml.movie_id = mi_idx1.movie_id AND ml.movie_id = mc1.movie_id AND mi_idx1.movie_id = mc1.movie_id AND it2.id = mi_idx2.info_type_id AND t2.id = mi_idx2.movie_id AND kt2.id = t2.kind_id AND cn2.id = mc2.company_id AND t2.id = mc2.movie_id AND ml.linked_movie_id = mi_idx2.movie_id AND ml.linked_movie_id = mc2.movie_id AND mi_idx2.movie_id = mc2.movie_id;

SELECT MIN(cn1.name) AS first_company, MIN(cn2.name) AS second_company, MIN(mi_idx1.info) AS first_rating, MIN(mi_idx2.info) AS second_rating, MIN(t1.title) AS first_movie, MIN(t2.title) AS second_movie FROM company_name AS cn1, company_name AS cn2, info_type AS it1, info_type AS it2, kind_type AS kt1, kind_type AS kt2, link_type AS lt, movie_companies AS mc1, movie_companies AS mc2, movie_info_idx AS mi_idx1, movie_info_idx AS mi_idx2, movie_link AS ml, title AS t1, title AS t2 WHERE cn1.country_code = '[nl]' AND it1.info = 'rating' AND it2.info = 'rating' AND kt1.kind in ('tv series') AND kt2.kind in ('tv series') AND lt.link LIKE '%follow%' AND mi_idx2.info < '3.0' AND t2.production_year = 2007 AND lt.id = ml.link_type_id AND t1.id = ml.movie_id AND t2.id = ml.linked_movie_id AND it1.id = mi_idx1.info_type_id AND t1.id = mi_idx1.movie_id AND kt1.id = t1.kind_id AND cn1.id = mc1.company_id AND t1.id = mc1.movie_id AND ml.movie_id = mi_idx1.movie_id AND ml.movie_id = mc1.movie_id AND mi_idx1.movie_id = mc1.movie_id AND it2.id = mi_idx2.info_type_id AND t2.id = mi_idx2.movie_id AND kt2.id = t2.kind_id AND cn2.id = mc2.company_id AND t2.id = mc2.movie_id AND ml.linked_movie_id = mi_idx2.movie_id AND ml.linked_movie_id = mc2.movie_id AND mi_idx2.movie_id = mc2.movie_id;

SELECT MIN(cn1.name) AS first_company, MIN(cn2.name) AS second_company, MIN(mi_idx1.info) AS first_rating, MIN(mi_idx2.info) AS second_rating, MIN(t1.title) AS first_movie, MIN(t2.title) AS second_movie FROM company_name AS cn1, company_name AS cn2, info_type AS it1, info_type AS it2, kind_type AS kt1, kind_type AS kt2, link_type AS lt, movie_companies AS mc1, movie_companies AS mc2, movie_info_idx AS mi_idx1, movie_info_idx AS mi_idx2, movie_link AS ml, title AS t1, title AS t2 WHERE cn1.country_code != '[us]' AND it1.info = 'rating' AND it2.info = 'rating' AND kt1.kind in ('tv series', 'episode') AND kt2.kind in ('tv series', 'episode') AND lt.link in ('sequel', 'follows', 'followed by') AND mi_idx2.info < '3.5' AND t2.production_year between 2000 and 2010 AND lt.id = ml.link_type_id AND t1.id = ml.movie_id AND t2.id = ml.linked_movie_id AND it1.id = mi_idx1.info_type_id AND t1.id = mi_idx1.movie_id AND kt1.id = t1.kind_id AND cn1.id = mc1.company_id AND t1.id = mc1.movie_id AND ml.movie_id = mi_idx1.movie_id AND ml.movie_id = mc1.movie_id AND mi_idx1.movie_id = mc1.movie_id AND it2.id = mi_idx2.info_type_id AND t2.id = mi_idx2.movie_id AND kt2.id = t2.kind_id AND cn2.id = mc2.company_id AND t2.id = mc2.movie_id AND ml.linked_movie_id = mi_idx2.movie_id AND ml.linked_movie_id = mc2.movie_id AND mi_idx2.movie_id = mc2.movie_id;

zzjjyyy commented 4 months ago

Thank you for your feedback. In my experiment, I didn't test the queries including movie_link, because they return empty results in PostgreSQL. I will try to reproduce the error and fix it.

zzjjyyy commented 4 months ago

Bad news :(. I found my computer with the query_split code cannot be powered up and thus, I have to find a new machine to reproduce the work. As it needs more time, I plan to reproduce and fix the error after I finish my current work. And after I fix the error, I will inform you under this issue. Before that, please ignore the problematic queries. Thank you for your understanding.

Leesangoh commented 4 months ago

Ok, I will wait for your reply!

For you more information, when I conducted experiments with your docker container,

  1. those problematic five queries contain "t2" (second alias of the table "title"), whereas other JOB queries do not.

  2. when I run the QuerySplit without modifying any code, an error " non-MVCC snapshots are not supported in index-only scans " occured for some queries. Maybe if the querysplit is trying to use indexonlyscan operator, the error occurs. (seems like QuerySplit tried to use index on filter column?). I am also struggling with this problem. (please refer to the first figure)

(Example query for 2nd problem: SELECT MIN(t.title) AS american_movie FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info AS mi, title AS t WHERE ct.kind = 'production companies' AND mc.note not like '%(TV)%' and mc.note like '%(USA)%' AND mi.info IN ('Sweden', 'Norway', 'Germany', 'Denmark', 'Swedish', 'Denish', 'Norwegian', 'German', 'USA', 'American') AND t.production_year > 1990 AND t.id = mi.movie_id AND t.id = mc.movie_id AND mc.movie_id = mi.movie_id AND ct.id = mc.company_type_id AND it.id = mi.info_type_id;)

  1. Regarding pure PostgreSQL, It is quite strange that PostgreSQL outputs zero tuple. For example, for the second and third image, both querysplit and postgres output 1 tuple (the query contains movie_linke though)

I am sincerely thank you for your kindness!!

image 스크린샷 2024-03-25 오후 2 10 42 스크린샷 2024-03-25 오후 2 11 37
Leesangoh commented 4 months ago

New information regarding 2nd problem:

After setting enable_indexonlyscan=off, the query runs sucessfully. (but still "no relid error: occurs for other queries)

Furthermore, after changing SnapshotAny to InvalidSnapshot in query_split.c (when calling PortalStart), it seems like an error "non-MVCC snapshots are not supported in index-only scans " is resolved, but now it outputs 0 tuple for every query.