Closed DeNeutoy closed 6 years ago
FYI The reason i'm finding these tiny things is because i'm trying to write a grammar which parses all of the datasets - this PR should be fairly self explanatory:
https://github.com/allenai/allennlp/pull/1750/files
Currently my coverage stats look like: academic: 100% advising : 57.8% atis : 93% geography : 99.5% imdb: 99.2% restaurants : 100% yelp : 100% scholar: 98.7%
Ah, I see. That could be really useful, e.g. for one project we discussed but didn't have time for in which structured output is produced by a decoder constrained by an SQL grammar. Also, looking at your code I see the note that JOINs may be hard, which I agree with :)
Looks like this problem stems from bugs in the original data:
SELECT max(*)
FROM
( SELECT count(distinct(movie_0.title))
FROM actor AS actor_0,
CAST AS cast_0,
movie AS movie_0
WHERE actor_0.name = " Brad Pitt "
actor_0.aid = cast_0.aid
AND cast_0.msid = movie_0.mid
GROUP BY movie_0.release_year ) AS subquery ;
I've applied your fix here https://github.com/jkkummerfeld/text2sql-data/commit/97fbaef5d5e27085b635bfb880578b5862622b10
Right, that's what we're planning on doing with it 👍. We're definitely planning on making several of these text2sql datasets part of a "semantic parsing framework" in allennlp, which will be released sometime before the end of the year. So thank you very much for being so open and communicative!
Great! Please feel free to pull data from the data-development branch for now and when your release date is approaching let me know and we will merge data-development into master as a new release.
https://github.com/jkkummerfeld/text2sql-data/blob/master/data/imdb.json#L2442 I think:
Should be: