AlJohri opened this issue 4 years ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.
Is your feature request related to a problem? Please describe.
The way the CoreNLP server is encapsulated in a subprocess is really nice. I want to be able to use the code that boots the server separately from the client that actually calls the service.
Similarly, the code for setting up properties, parsing the response, and actually making the request is all tied together right now, which makes it very hard to use the library directly in an async manner.
Synchronous requests are abysmally slow for me: I'm trying to annotate some 40k documents, which would take 11 hours synchronously and just under 2 hours asynchronously against a server with 16 threads.
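To illustrate what booting the server independently of any client could look like, here is a minimal sketch of a launcher that only starts and stops the Java process. The class name `CoreNLPServerProcess`, the classpath, the memory setting and the `/ping` readiness check are assumptions for illustration, not the library's API; `-port`, `-threads` and `-timeout` are `StanfordCoreNLPServer` command-line flags.

```python
import subprocess
import time
import urllib.request
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CoreNLPServerProcess:
    """Hypothetical launcher that only boots and stops the CoreNLP Java server.

    It knows nothing about annotators, requests or response parsing, so any
    client (sync, async, custom) can talk to the port it opens.
    """

    classpath: str                  # assumption: e.g. "/opt/corenlp/*"
    port: int = 9000
    threads: int = 16
    timeout_ms: int = 60000
    memory: str = "5G"
    proc: Optional[subprocess.Popen] = field(default=None, init=False)

    def command(self) -> List[str]:
        """Build the java command line without starting anything."""
        return [
            "java", f"-Xmx{self.memory}", "-cp", self.classpath,
            "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
            "-port", str(self.port),
            "-threads", str(self.threads),
            "-timeout", str(self.timeout_ms),
        ]

    def start(self, wait_seconds: float = 120.0) -> None:
        """Launch the server and block until it answers HTTP requests."""
        self.proc = subprocess.Popen(
            self.command(), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        url = f"http://localhost:{self.port}/ping"  # assumed health-check path
        deadline = time.monotonic() + wait_seconds
        while time.monotonic() < deadline:
            if self.proc.poll() is not None:
                raise RuntimeError(f"server exited with code {self.proc.returncode}")
            try:
                with urllib.request.urlopen(url, timeout=2):
                    return  # a successful response means the server is up
            except OSError:
                time.sleep(0.5)  # not listening yet; retry
        self.stop()
        raise TimeoutError(f"CoreNLP server not ready after {wait_seconds}s")

    def stop(self) -> None:
        """Terminate the Java process if it is still running."""
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait(timeout=30)
        self.proc = None

    def __enter__(self) -> "CoreNLPServerProcess":
        self.start()
        return self

    def __exit__(self, *exc) -> None:
        self.stop()
```

Used as `with CoreNLPServerProcess(classpath="/opt/corenlp/*") as server: ...`, any client can then send requests to `http://localhost:9000` while the block is open, and the process is shut down on exit.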
Describe the solution you'd like
While I wrote a simple solution for my specific case (attached below) using async, it required booting a separate server instance and also copying bits of logic out of `client.py`, such as `parseFromDelimitedString`. My feature requests are as follows:
1) Centralize the `properties` creation logic, separate from any `requests` call. It took a long time to understand how it all fits together well enough to be sure I was faithfully replicating it. There also isn't any option to see what the final `properties` look like when making the API call, which would also make things easier. I ended up just reading the docs for each annotator instead.
2) Completely separate the server and the client (i.e. the code that boots up the Java server vs. the code that hits the Java server). Allow a separate class for booting a server, which I can use with my own custom client.
3) Separate the logic for parsing the response from actually making the request. This is almost there already; I just wish there was a function that encapsulated creating a `Document` and then running the `parseFromDelimitedString` step.

Here's my async version below, which uses a semaphore to limit max concurrency:
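A minimal sketch of the same idea (a reconstruction, not the original attachment), with the three stages from the requests above kept apart: properties are built and inspectable before any call, requests are bounded by an `asyncio.Semaphore`, and the length-delimited protobuf response is unwrapped by a standalone function. The names `build_properties`, `read_delimited`, `annotate_all` and the injected `post` coroutine are hypothetical; the `outputFormat`/`serializer` values are assumed to mirror the client's serialized-protobuf defaults, and the default endpoint is assumed to be `localhost:9000`.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, Iterable, List
from urllib.parse import urlencode

# Hypothetical transport: any coroutine that POSTs `body` to `url` and returns
# the raw response bytes (e.g. a thin wrapper around an async HTTP library).
Post = Callable[[str, bytes], Awaitable[bytes]]


def build_properties(annotators: Iterable[str], **extra: str) -> Dict[str, str]:
    """Stage 1: build the server properties up front so they can be inspected."""
    props = {
        "annotators": ",".join(annotators),
        # Assumed to mirror the client's default protobuf output settings.
        "outputFormat": "serialized",
        "serializer": "edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer",
    }
    props.update(extra)
    return props


def read_delimited(buf: bytes) -> bytes:
    """Stage 3: strip the base-128 varint length prefix from a protobuf payload."""
    size, shift, pos = 0, 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            break
        shift += 7
    return buf[pos:pos + size]


async def annotate_all(
    texts: List[str],
    props: Dict[str, str],
    post: Post,
    endpoint: str = "http://localhost:9000",
    max_concurrency: int = 16,
) -> List[bytes]:
    """Stage 2: send every text, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    # The server reads its settings from a JSON-encoded `properties` query parameter.
    url = f"{endpoint}/?{urlencode({'properties': json.dumps(props)})}"

    async def one(text: str) -> bytes:
        async with semaphore:
            return await post(url, text.encode("utf-8"))

    # gather() returns results in the same order as `texts`.
    return list(await asyncio.gather(*(one(t) for t in texts)))
```

Each returned payload would then go through `read_delimited` and into a protobuf message's standard `ParseFromString`, for example the `Document` message the client already uses. Setting `max_concurrency` to the server's thread count is a reasonable starting point, since extra in-flight requests would just queue on the server.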
I'd be happy to talk further and help make some of these changes if you're interested. Thanks for reading!