jgontrum / spacy-api-docker

spaCy REST API, wrapped in a Docker container.
https://hub.docker.com/r/jgontrum/spacyapi/
MIT License
265 stars 99 forks

Sentence Boundary Detection? #6

roschler closed 5 years ago

roschler commented 7 years ago

First, thank you for creating this REST API-capable Docker container for spaCy. I have set up a spaCy server with it and can successfully get parse trees for sentences I submit over the REST API.

I would like to be able to do sentence boundary detection too. Is there a way to use the Docker container to do that? If not, how hard would it be for me to extend the REST API to support it? I'm an experienced C/C++ and JavaScript programmer of many years, with about a year of Python experience as well.

jgontrum commented 7 years ago

I saw a Twitter conversation today in which Matthew wrote that sentence boundary detection that does not require dependency parsing is on the roadmap for spaCy v2 (https://twitter.com/honnibal/status/860395803826937856).

But if you have a Python module that detects sentence boundaries in raw text, it would be no problem to integrate it into the API as a separate entry point.

roschler commented 7 years ago

Hi Johannes,

Thanks for replying. I'm a bit confused: I thought the current version of spaCy already does sentence boundary detection?

https://github.com/explosion/spaCy/issues/23

"Improved sentence segmentation now included in the latest release. Docs are updated with usage."

I'd like to be able to submit an entire document to the server and have it return a JSON array of all segmented sentences.
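
A minimal sketch of the requested behaviour (raw document in, JSON array of sentences out) might look like the following. Everything here is an assumption for illustration: `naive_segmenter` is a regex stand-in and not spaCy's segmentation, and `sentences_as_json` is a hypothetical name, not part of this project's API. The segmenter is passed in as a callable so that a real one could be swapped in later.

```python
import json
import re
from typing import Callable, List

# Stand-in segmenter (an assumption, NOT spaCy's algorithm): split on
# whitespace that follows '.', '!' or '?'. It will mis-split abbreviations.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_segmenter(text: str) -> List[str]:
    """Return the sentences found in text by the naive punctuation rule."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def sentences_as_json(
    text: str,
    segmenter: Callable[[str], List[str]] = naive_segmenter,
) -> str:
    """Hypothetical entry-point body: serialize the sentences as a JSON array."""
    return json.dumps(segmenter(text))


if __name__ == "__main__":
    print(sentences_as_json("This is one sentence. Is this another? Yes!"))
```

Keeping the segmenter behind a plain `Callable[[str], List[str]]` means the JSON-producing part would not need to change if the boundary detection were later delegated to spaCy.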

ghost commented 6 years ago

Same for me. /dep returns a list of annotated tokens, but it would be nice to have something like this:

{
  'sentence': [
    {'index': 0, 'dep': [...]},
    {'index': 1, 'dep': [...]},
  ]
}

Similar to the JSON result that's provided by the Stanford CoreNLP Server.
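
One way to get from a flat token list to that shape is sketched below. It assumes each token dict carries a hypothetical integer key `sent` with the index of its sentence, and that tokens arrive in document order; the actual /dep output may not include such a field, and `group_by_sentence` is an illustrative name only.

```python
from itertools import groupby
from typing import Any, Dict, List


def group_by_sentence(tokens: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group a flat, document-ordered token list into per-sentence entries.

    Assumes every token dict has a hypothetical integer key "sent" holding
    the index of the sentence it belongs to.
    """
    grouped = []
    # groupby merges consecutive tokens that share the same sentence index.
    for index, members in groupby(tokens, key=lambda tok: tok["sent"]):
        # Drop the helper key so each entry holds only the token annotations.
        dep = [{k: v for k, v in tok.items() if k != "sent"} for tok in members]
        grouped.append({"index": index, "dep": dep})
    return {"sentence": grouped}
```

For example, three tokens tagged with `sent` values 0, 0 and 1 would yield two entries, the first with two tokens in `dep` and the second with one.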

jgontrum commented 6 years ago

I agree this is probably a better format to work with. However, I'm currently not using this API in any of my projects, so development has a rather low priority for me.

But if you want to change the format, have a look at this part: https://github.com/jgontrum/spacy-api-docker/blob/78585e8b447449071069ae5600ade51534f010b4/displacy_service/parse.py#L27-L49

I'm always happy to receive pull requests if you make any changes :)