Open ivbeg opened 2 years ago
Hi Ivan,
glad to hear that you liked our work.
What I can share right now is a list of ~600k URLs that point to raw .SQL files on GitHub. I've uploaded it here.
We used pglast for parsing, which succeeded for about 15% of the files. A great project that recently helped us reach a ~50% success rate on the same corpus is simple-ddl-parser; it's worth a look if you haven't seen it already.
Developing a more robust or universal CSV parser is certainly a very interesting topic. If you are interested in discussing this in more detail, feel free to reach out to me by email (t.r.dohmen at uva.nl).
Best, Till
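For anyone who wants to reproduce figures like the ~15% and ~50% parse success rates quoted above on their own files, here is a minimal sketch of how such a rate could be measured. It assumes a file counts as a success whenever the parser returns without raising; the thread does not say how success was actually defined, and `parse_success_rate` is a hypothetical helper name, not part of either library.

```python
from typing import Callable, Iterable


def parse_success_rate(sql_texts: Iterable[str],
                       parse: Callable[[str], object]) -> float:
    """Return the fraction of inputs that `parse` accepts without raising.

    Assumption: "success" means "no exception". A parser that fails
    silently would need an extra check on its return value.
    """
    total = 0
    ok = 0
    for text in sql_texts:
        total += 1
        try:
            parse(text)
            ok += 1
        except Exception:
            # Any parser error counts as a failed file.
            pass
    return ok / total if total else 0.0


# Possible usage with pglast (requires `pip install pglast`):
#   from pglast import parse_sql
#   rate = parse_success_rate(list_of_sql_strings, parse_sql)
```

Passing the parser in as a callable keeps the harness independent of any one library, so the same loop can compare pglast, simple-ddl-parser, or a custom parser on the same corpus.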
@tdoehmen, thanks for sharing! I didn't know about simple-ddl-parser; it could be helpful. Yes, I am considering developing or adapting an existing SQL parser. My goal is a bit different: I sometimes have huge SQL dumps that I would like to convert to CSV/JSONL without an RDBMS instance. I am also working on several command-line tools and data engineering projects that involve many SQL, CSV, and other data and schema file types.
But universal SQL and CSV parsers are a very interesting topic to me too. I will email you after running some tests on the dataset of SQL files.
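To illustrate the dump-to-JSONL idea mentioned above, here is a rough sketch that turns one simple `INSERT INTO ... VALUES ...` statement into JSON Lines without any database. It is only an assumption of what such a tool might look like: the function names are hypothetical, and it handles only an explicit column list, single-quoted strings with `''` escapes, numbers, and `NULL`. Backslash escapes, nested parentheses, and function calls in values are not supported.

```python
import json
import re
from typing import Iterator, List, Optional

# Matches: INSERT INTO <table> (<columns>) VALUES <rest of statement>
_INSERT = re.compile(
    r"\s*INSERT\s+INTO\s+([^\s(]+)\s*\(([^)]*)\)\s*VALUES\s*(.*)",
    re.IGNORECASE | re.DOTALL,
)


def _convert(token: str, quoted: bool) -> object:
    """Map a raw SQL literal to a Python value."""
    if quoted:
        return token
    if token.upper() == "NULL":
        return None
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token  # unknown bare token: keep as text


def parse_value_tuples(body: str) -> List[list]:
    """Scan "(1, 'a''b', NULL), (2, 'c', 3.5);" into a list of rows."""
    rows: List[list] = []
    row: Optional[list] = None
    buf: List[str] = []
    in_str = quoted = False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_str:
            if ch == "'":
                if body[i + 1:i + 2] == "'":  # doubled quote = escaped quote
                    buf.append("'")
                    i += 1
                else:
                    in_str = False
            else:
                buf.append(ch)
        elif ch == "'":
            in_str = quoted = True
        elif ch == "(" and row is None:
            row = []
        elif ch in ",)" and row is not None:
            row.append(_convert("".join(buf), quoted))
            buf, quoted = [], False
            if ch == ")":
                rows.append(row)
                row = None
        elif row is not None and not ch.isspace():
            buf.append(ch)
        i += 1
    return rows


def insert_to_jsonl(statement: str) -> Iterator[str]:
    """Yield one JSON object (as a string) per inserted row."""
    m = _INSERT.match(statement)
    if m is None:
        return
    columns = [c.strip().strip('`"') for c in m.group(2).split(",")]
    for row in parse_value_tuples(m.group(3)):
        yield json.dumps(dict(zip(columns, row)))


if __name__ == "__main__":
    stmt = "INSERT INTO users (id, name) VALUES (1, 'O''Brien'), (2, NULL);"
    for line in insert_to_jsonl(stmt):
        print(line)
```

A character-level scanner is used instead of splitting on commas because commas and parentheses can appear inside quoted strings. Real dumps vary a lot by dialect, so a proper parser would still be needed for anything beyond this simple case.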
Hi Till!
Great project! I am impressed by the dataset and research paper. Could you please add a raw dataset of the collected SQL files, including both the files that parsed successfully and those that did not? It could be helpful for future research and the development of universal SQL parsing tools.
Best Regards, Ivan