Open Lucaterre opened 2 years ago
Hello @Lucaterre
Thanks for the issue.
Yes, we can do this: offer either plain text or the mediawiki format for the definition field, selected by a query parameter. The plain text method already exists:
Thank you for your answer! Oh OK, nice that a method is already available :)
That is the idea indeed. To clarify my issue a little more (though I think that's what you said):
We consider an optional query parameter "plain_text" (maybe not the best parameter name), set to "false" by default, in which case the response returns the definition in mediawiki format.
Now if we imagine a request, such as:
$ curl 'https://cloud.science-miner.com/nerd/service/kb/concept/Q90?lang=fr&plain_text=true'
the response returns a plain text definition instead of the definition in mediawiki format.
I don't know if there is any interest in keeping both definitions (plain text and mediawiki) in the same response; it depends on the use case (that's an open question).
what about something like this:
$ curl 'https://cloud.science-miner.com/nerd/service/kb/concept/Q90?lang=fr&definition=mediawiki'
The definition parameter name is more precise for the expected behavior, as is a non-boolean value (which could be mediawiki (default), plain_text, or maybe another one in the future). Maybe definition_format rather than definition?
I agree, definition_format seems fine and more explicit as a parameter name than definition alone (which is confusing: with that name, the user may think that retrieving the definition is optional).
OK, with mediawiki as the default option of the parameter (this seems normal, since it is the original format of the definition).
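To make the proposal concrete, here is a minimal sketch of how an enumerated parameter with a default could be parsed. It is written in Python purely for illustration and is not the project's code: the names DefinitionFormat and parse_definition_format are hypothetical, and the accepted values are the ones discussed above.

```python
from enum import Enum


class DefinitionFormat(Enum):
    """Hypothetical set of accepted values for the proposed parameter."""
    MEDIAWIKI = "mediawiki"
    PLAIN_TEXT = "plain_text"


def parse_definition_format(params):
    """Return the requested format, falling back to mediawiki when absent."""
    raw = params.get("definition_format")
    if raw is None:
        # No parameter given: keep the original mediawiki markup.
        return DefinitionFormat.MEDIAWIKI
    try:
        return DefinitionFormat(raw.lower())
    except ValueError:
        allowed = ", ".join(f.value for f in DefinitionFormat)
        raise ValueError(
            f"unknown definition_format {raw!r}; expected one of: {allowed}"
        ) from None
```

Compared with a boolean flag, adding a format later (HTML or Markdown, say) would only require a new enum member, and unknown values are rejected explicitly.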
Just curious, what other "cross-mediawiki" formats do you think of in the future? HTML, Markdown for example?
Yes, I was thinking of these two possible formats.
This is implemented with 2557847d086181fc900db1aa9182b1f1f19504cf
The REST API parameter is definitionFormat, with value Mediawiki (default) or PlainText (as requested in this issue). I am using Java notation for the parameter, because we are in the Java world in this project.
Example:
curl -X GET 'http://localhost:8090/service/kb/concept/Q190712?definitionFormat=PlainText'
{
    "rawName": "First Battle of the Marne",
    "preferredTerm": "First Battle of the Marne",
    "confidence_score": 0,
    "wikipediaExternalRef": 171325,
    "wikidataId": "Q190712",
    "definitions": [
        {
            "definition": "The First Battle of the Marne was a battle of the First World War fought from 5 to 12 September 1914. It was fought in a collection of skirmishes around the Marne River Valley. It resulted in an Entente victory against the German armies in the west. The battle was the culmination of the Retreat from Mons and pursuit of the Franco-British armies which followed the Battle of the Frontiers in August and reached the eastern outskirts of Paris.",
            "source": "wikipedia-en",
            "lang": "en"
        }
    ]
    ...
}
https://nerd.readthedocs.io/en/latest/restAPI.html#get-kb-concept-id
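For readers calling the service from code, the following is a small client-side sketch in Python using only the standard library. The base URL, the kb/concept path, and the definitionFormat and lang parameters come from the examples in this thread; the helper names (concept_url, definition_texts, fetch_concept) are hypothetical, and the sketch assumes the response has the shape shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the curl example in this thread; adjust for your deployment.
BASE_URL = "http://localhost:8090/service"


def concept_url(concept_id, definition_format=None, lang=None):
    """Build the kb/concept URL, adding only the parameters that were given."""
    params = {}
    if lang is not None:
        params["lang"] = lang
    if definition_format is not None:
        params["definitionFormat"] = definition_format
    query = urlencode(params)
    url = f"{BASE_URL}/kb/concept/{concept_id}"
    return f"{url}?{query}" if query else url


def definition_texts(concept):
    """Extract the definition strings from a decoded concept response."""
    return [d["definition"] for d in concept.get("definitions", [])]


def fetch_concept(concept_id, **kwargs):
    """Fetch and decode a concept (requires a running service)."""
    with urlopen(concept_url(concept_id, **kwargs)) as resp:
        return json.load(resp)
```

With a service running locally, definition_texts(fetch_concept("Q190712", definition_format="PlainText")) should return the list of plain text definitions.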
Also added html as a format:
curl -X GET 'http://localhost:8090/service/kb/concept/Q190712?definitionFormat=html'
{
    "rawName": "First Battle of the Marne",
    "preferredTerm": "First Battle of the Marne",
    "confidence_score": 0,
    "wikipediaExternalRef": 171325,
    "wikidataId": "Q190712",
    "definitions": [
        {
            "definition": "<p>The <b>First Battle of the Marne</b> was a battle of the <a href=\"https://en.wikipedia.org/wiki/First_World_War\" title=\"First World War\">First World War</a> fought from 5 to 12 September 1914. It was fought in a collection of skirmishes around the Marne River Valley. It resulted in an <a href=\"https://en.wikipedia.org/wiki/Allies_of_World_War_I\" title=\"Allies of World War I\">Entente</a> victory against the <a href=\"https://en.wikipedia.org/wiki/German_Army_(German_Empire)\" title=\"German Army (German Empire)\">German</a> armies in the west. The battle was the culmination of the <a href=\"https://en.wikipedia.org/wiki/Retreat_from_Mons\" title=\"Retreat from Mons\">Retreat from Mons</a> and pursuit of the Franco-British armies which followed the <a href=\"https://en.wikipedia.org/wiki/Battle_of_the_Frontiers\" title=\"Battle of the Frontiers\">Battle of the Frontiers</a> in August and reached the eastern outskirts of Paris.<p>",
            "source": "wikipedia-en",
            "lang": "en"
        }
    ]
    ...
}
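If a client receives the html variant but also needs the text without tags, it can be derived on the client side. The sketch below uses Python's standard html.parser module and simply concatenates the text nodes; it is an illustration, not how the service produces its PlainText output, and the names _TextCollector and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Return the text content of an HTML fragment with the tags dropped."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)
```

Character references such as &amp;amp; are decoded by the parser by default, so the result is plain Unicode text.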
Hello @kermitt2 ,
I am leaving this feature proposal here. Sorry in advance if I used the wrong terminology (preprocess/clean etc.).
Currently, the query endpoint kb/concept/ returns the concept definition with "Wikimedia"-style markup. Output example for concept "Victor Hugo":
A definition without specific markup, for example (cf. https://en.wikipedia.org/wiki/Victor_Hugo):
I don't know if this is complicated to implement, but it could be considered in two different ways:
1) The user has the choice to retrieve a "clean" definition by adding an optional parameter, for example something like "raw":"true" or "clean":"true", for the kb/concept endpoint.
2) In the answer, add a "definition_raw" key (with wikimedia markup) and a "definition_clean" key (without markup).
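The two options can be contrasted with a small Python sketch of the resulting response fragment. The keys "definition_raw" and "definition_clean" are the ones suggested above; the function names and the want_clean flag are hypothetical and only serve the illustration.

```python
def option_one(raw, clean, want_clean=False):
    """Option 1: a single "definition" key; an optional flag selects the variant."""
    return {"definition": clean if want_clean else raw}


def option_two(raw, clean):
    """Option 2: both variants are always returned under separate keys."""
    return {"definition_raw": raw, "definition_clean": clean}
```

Option 1 leaves the default response unchanged for existing clients, while option 2 changes the response for everyone and makes it larger.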
I think it could be useful for people who need to build additional features from the entities (here, from the definition) without having to add a text preprocessing function.
What do you think?
Regards, Lucas Terriel