semantic-systems / NLIWOD

Collection of tools, utilities, datasets and approaches towards realising natural language interfaces for the Web of Data.
GNU Affero General Public License v3.0

Ignoring unknown fields in QALD #69

Closed nikit-srivastava closed 2 years ago

nikit-srivastava commented 2 years ago

Hi @RicardoUsbeck, is it fine if we just ignore the unknown fields in the QALD format?

I recently ran into a problem where the query results from the DBpedia endpoint contained new fields that were not previously part of the QALD format. However, these newly introduced fields do not affect the existing functionality and can be ignored. Below is the stack trace for the problem:

Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "distinct" (class org.aksw.qa.commons.load.json.EJResults), not marked as ignorable (one known property: "bindings"])
 at [Source: (FileInputStream); line: 23, column: 42] (through reference chain: org.aksw.qa.commons.load.json.QaldJson["questions"]->java.util.Vector[0]->org.aksw.qa.commons.load.json.QaldQuestionEntry["answers"]->java.util.Vector[0]->org.aksw.qa.commons.load.json.EJAnswers["results"]->org.aksw.qa.commons.load.json.EJResults["distinct"])
    at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:823) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:1153) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1589) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1567) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:294) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:286) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:245) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:27) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:286) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:245) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:27) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4014) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3071) ~[jackson-databind-2.9.10.5.jar:2.9.10.5]
    at org.aksw.qa.commons.load.json.ExtendedQALDJSONLoader.readJson(ExtendedQALDJSONLoader.java:115) ~[commons-0.4.22.jar:0.4.22]
    at org.aksw.qa.commons.load.json.ExtendedQALDJSONLoader.readJson(ExtendedQALDJSONLoader.java:143) ~[commons-0.4.22.jar:0.4.22]
    at org.aksw.qa.commons.load.json.ExtendedQALDJSONLoader.readJson(ExtendedQALDJSONLoader.java:137) ~[commons-0.4.22.jar:0.4.22]
    at org.aksw.gerbil.dataset.impl.qald.FileBasedQALDDataset.init(FileBasedQALDDataset.java:95) ~[classes/:?]
    at org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:87) ~[classes/:?]
    at org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:74) ~[classes/:?]
    at org.aksw.gerbil.annotator.InstanceListBasedConfigurationImpl.loadAnnotator(InstanceListBasedConfigurationImpl.java:69) ~[classes/:?]
    at org.aksw.gerbil.annotator.InstanceListBasedConfigurationImpl.getAnnotator(InstanceListBasedConfigurationImpl.java:50) ~[classes/:?]
    ... 5 more

What is your opinion on this?

RicardoUsbeck commented 2 years ago

Huh, can you show me the json line where it happens? I wasn't aware of a new field (/CC @xixi019 )

RicardoUsbeck commented 2 years ago

I do not know if this is the cause, but Xi, entries like the following one (with empty bindings) should also be removed from https://raw.githubusercontent.com/KGQA/QALD_10/main/data/qald_10/qald_10.json

        {
            "id": 42,
            "aggregation": false,
            "question": [
                {
                    "language": "en",
                    "string": "How long did the Han dynasty last?"
                },
                {
                    "language": "zh-cn",
                    "string": "汉朝持续了多长时间?"
                },
                {
                    "language": "de",
                    "string": "Wie lange bestand die Han-Dynastie?"
                },
                {
                    "language": "ru",
                    "string": "Как долго просуществовала династия Хань ?"
                }
            ],
            "answers": [
                {
                    "head": {
                        "vars": [
                            "result"
                        ]
                    },
                    "results": {
                        "bindings": [
                            {}
                        ]
                    }
                }
            ],
            "query": {
                "sparql": "SELECT ?result WHERE {wd:Q7209 wdt:P571 ?st; wdt:P576 ?et. BIND((?et - ?st) AS ?result)}"
            }
        },

nikit-srivastava commented 2 years ago

> Huh, can you show me the json line where it happens? I wasn't aware of a new field (/CC @xixi019 )

For example (do not mind the wrongly predicted sparql ;) ):

{
    "id": 71,
    "question": [
        {
            "language": "en",
            "string": "How many students does the Free University of Amsterdam have?"
        }
    ],
    "query": {
        "sparql": "SELECT (COUNT(DISTINCT ?selvar) AS ?c) WHERE { <http://dbpedia.org/resource/Amsterdam_University_College> <http://dbpedia.org/ontology/country> ?selvar .  }"
    },
    "answers": [
        {
            "head": {
                "link": [],
                "vars": [
                    "c"
                ]
            },
            "results": {
                "distinct": false,
                "ordered": true,
                "bindings": [
                    {
                        "c": {
                            "type": "typed-literal",
                            "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                            "value": "1"
                        }
                    }
                ]
            }
        }
    ]
}
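
Until the loader itself tolerates unknown properties, one possible workaround is to strip the extra keys from the file before it is parsed. The following is a minimal Python sketch, not part of qa.commons: the function name `strip_unknown_result_fields` is hypothetical, and the set of kept keys is an assumption taken from the stack trace above, which lists `bindings` as the only property known to `EJResults`.

```python
import json

# Assumption: per the stack trace, "bindings" is the only property that
# EJResults knows, so every other key inside "results" is dropped.
KNOWN_RESULT_KEYS = {"bindings"}


def strip_unknown_result_fields(question):
    """Return a copy of one QALD question entry in which every
    answers[*].results object keeps only the keys in KNOWN_RESULT_KEYS.
    The input entry is not modified."""
    cleaned = dict(question)
    if "answers" not in question:
        return cleaned
    cleaned_answers = []
    for answer in question["answers"]:
        answer = dict(answer)  # shallow copy so the original stays intact
        results = answer.get("results")
        if isinstance(results, dict):
            answer["results"] = {
                key: value
                for key, value in results.items()
                if key in KNOWN_RESULT_KEYS
            }
        cleaned_answers.append(answer)
    cleaned["answers"] = cleaned_answers
    return cleaned


# Shortened version of the entry shown above.
entry = {
    "id": 71,
    "answers": [
        {
            "head": {"link": [], "vars": ["c"]},
            "results": {
                "distinct": False,
                "ordered": True,
                "bindings": [
                    {
                        "c": {
                            "type": "typed-literal",
                            "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                            "value": "1",
                        }
                    }
                ],
            },
        }
    ],
}

cleaned = strip_unknown_result_fields(entry)
print(json.dumps(cleaned["answers"][0]["results"], indent=2))
```

The sketch leaves `head` untouched because the exception was only raised for `results`; whether other objects need the same treatment depends on the loader version in use.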

RicardoUsbeck commented 2 years ago

Ah, that happens in the training file (/CC @Perevalov). I would say we can safely ignore it. Aleksandr, what do you think?

Perevalov commented 2 years ago

> Ah, that happens in the training file (/CC @Perevalov). I would say we can safely ignore it. Aleksandr, what do you think?

From the example by @nikit91, I understand that the fields

                "distinct": false,
                "ordered": true,

and

"datatype": "http://www.w3.org/2001/XMLSchema#integer",

are what should be ignored?

If yes, then I agree. It does not affect anything.

RicardoUsbeck commented 2 years ago

Perfect, so let's go ahead here and resolve the QALD-10 test issue (empty answers in the repo).
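
To locate the entries with empty answers, something like the following Python sketch could be used. It is only an illustration under stated assumptions: the helper names `has_content` and `find_empty_answers` are hypothetical, the top-level `questions` key is taken from the reference chain in the stack trace, and ASK-style answers are assumed to carry a `boolean` field instead of `results`.

```python
def has_content(answer):
    """True if an answer object carries at least one non-empty binding.
    Assumption: ASK-style answers use a "boolean" field and count as filled."""
    if "boolean" in answer:
        return True
    bindings = answer.get("results", {}).get("bindings", [])
    # An empty dict ({}) is falsy, so [{}] counts as "no content".
    return any(binding for binding in bindings)


def find_empty_answers(dataset):
    """Return the ids of all questions without any non-empty answer."""
    return [
        question.get("id")
        for question in dataset.get("questions", [])
        if not any(has_content(a) for a in question.get("answers", []))
    ]


# Tiny in-memory sample; a real file would be read with json.load().
sample = {
    "questions": [
        {"id": 42, "answers": [{"head": {"vars": ["result"]},
                                "results": {"bindings": [{}]}}]},
        {"id": 71, "answers": [{"head": {"vars": ["c"]},
                                "results": {"bindings": [{"c": {"value": "1"}}]}}]},
        {"id": 1, "answers": [{"head": {}, "boolean": True}]},
    ]
}

print(find_empty_answers(sample))  # prints [42]
```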

MichaelRoeder commented 2 years ago

It would be nice to have this fix for GERBIL QA. :wink: Do you plan to release and deploy a new version of qa.commons soon? I would also be fine with a SNAPSHOT version. Where will you deploy it? On the AKSW Archiva or somewhere else? :thinking:

RicardoUsbeck commented 2 years ago

Erm... no, I do not expect to work on this repo anytime soon, actually. But as far as I know, you could also `mvn deploy` it to Archiva yourself (honestly, I do not even have the keys anymore due to laptop changes).