jkkummerfeld / text2sql-data

A collection of datasets that pair questions with SQL queries.
http://jkk.name/text2sql-data/

SPARQL for Question Answering #58

Closed Abdelrahman-Elgoharyy closed 4 months ago

Abdelrahman-Elgoharyy commented 4 months ago

The approach of making dual predictions on the choice of template and the slots to be filled within that template offered valuable insights into handling structured query language translations.

I have been attempting to extend its application to SPARQL datasets within my own project focused on converting text to SPARQL queries for knowledge representation. However, I encountered difficulties in adapting the methodology for SPARQL.

Are there any adaptations or modifications you would recommend for effectively generating SPARQL templates from training sets? This is the dataset I'm working with: SPARQL.json

jkkummerfeld commented 4 months ago

Thanks for the positive comment!

On that adaptation, looking at the file, one challenge is that the variables in the question text are not identified. That is necessary in order to be able to train the tagging part of the model. Glancing at the code, it seems like it may be possible to automatically do that mapping, but there will probably be some noise.
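As a rough illustration of what such an automatic mapping could look like, here is a minimal sketch. It assumes that variables show up as quoted literals in the SPARQL query and appear verbatim in the question. The function name `extract_variables` and the `var0`, `var1` placeholder scheme are hypothetical and are not taken from this repository's code or from SPARQL.json.

```python
import re


def extract_variables(question, query):
    """Replace quoted literals shared by question and query with placeholders.

    Hypothetical helper: assumes variables are quoted literals in the query
    that also occur (case-insensitively) in the question text.
    """
    variables = {}
    # Quoted string literals in the query, e.g. "Berlin" or 'Berlin'.
    literals = re.findall(r'"([^"]+)"|\'([^\']+)\'', query)
    for double, single in literals:
        value = double or single
        if value in variables.values():
            continue  # this literal was already mapped
        # Only treat a literal as a variable if the question mentions it.
        match = re.search(re.escape(value), question, flags=re.IGNORECASE)
        if match is None:
            continue
        name = "var" + str(len(variables))
        variables[name] = value
        # Substitute the placeholder in both the question and the query.
        question = question[:match.start()] + name + question[match.end():]
        query = query.replace(value, name)
    return question, query, variables


question, query, variables = extract_variables(
    "What is the population of Berlin?",
    'SELECT ?p WHERE { ?c rdfs:label "Berlin" . ?c dbo:population ?p }',
)
print(question)
print(query)
print(variables)
```

Exact string matching like this is where the noise comes from: paraphrased or inflected values in the question will be missed, and a short literal may match an unrelated part of the question.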

Once the data issue is resolved, I don't see any reason why the model shouldn't apply just fine.

I'm going to close this, but please reopen if you find issues we should fix!

Abdelrahman-Elgoharyy commented 4 months ago

Thank you once again for your valuable feedback on the dataset adaptation. I wanted to provide an update on the steps I've taken since our last correspondence.

In response to your suggestion about identifying variables in the question text, I created another dataset that includes the variables. Despite this change, I ran into a significant problem during evaluation: the model consistently yields zero percent accuracy.

I've thoroughly reviewed the dataset and the evaluation process to identify any potential issues or inconsistencies. However, despite my best efforts, I haven't been able to pinpoint the exact cause of the problem.

Here is the dataset I used: SPARQLedit.json

jkkummerfeld commented 4 months ago

My best guess is that our code makes some other assumptions about the structure of the data. I don't have time to dig into that. My suggestion would be to try running on a tiny subset and printing information at many points in the code.
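One way to start on a tiny subset is a sanity check like the sketch below, which prints each example and flags placeholders that do not line up. The field names `text`, `query`, and `variables` are assumptions made for illustration. They are not this repository's actual schema, so they would need to be adapted to whatever the data loader really expects.

```python
def find_problems(examples, limit=5):
    """Print the first `limit` examples and collect obvious inconsistencies.

    Hypothetical helper: each example is assumed to be a dict with
    "text", "query", and "variables" (placeholder name -> original value).
    """
    problems = []
    for index, example in enumerate(examples[:limit]):
        text, query = example["text"], example["query"]
        # Print everything so mismatches are visible at a glance.
        print(index, repr(text), "->", repr(query))
        for name, value in example["variables"].items():
            if name not in text:
                problems.append((index, name, "placeholder missing from text"))
            if name not in query:
                problems.append((index, name, "placeholder missing from query"))
            if not value:
                problems.append((index, name, "empty value"))
    return problems


examples = [
    {"text": "population of var0", "query": 'SELECT ?p WHERE { ?c rdfs:label "var0" }',
     "variables": {"var0": "Berlin"}},
    {"text": "capital of France", "query": 'SELECT ?x WHERE { ?c rdfs:label "var0" }',
     "variables": {"var0": "France"}},
]
for problem in find_problems(examples):
    print("PROBLEM:", problem)
```

If every example on the subset passes checks like these and accuracy is still zero, the same print-everything approach can be moved one step later in the pipeline, for example to the predicted versus gold queries at evaluation time.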

Sorry, I'm not able to help further to resolve this. Good luck with the project!