tesserae / apitess

Tesserae API

Ingest Endpoint #32

Closed jeffkinnison closed 4 years ago

jeffkinnison commented 4 years ago

For an ingest endpoint, I think we need the following features:

At least as far as I've been able to trace out, this is the minimum we should need. Can you think of anything else?

Edit: For the placement, I think we could make this a POST request to the /texts/ endpoint. How does that sound?

Edit 2: Do we need to support ingesting multiple texts in one request, up to some maximum number?

nOkuda commented 4 years ago

For the moment, I think we should limit ingestion to one text at a time. If we don't, we'll run into Feature index collisions, and that would really mess up search.

My plan is to augment POST on /texts/ (https://tess-new.caset.buffalo.edu/docs/api/endpoints/texts/#post) so that it starts ingestion in another thread, but I'm not sure how to allow only one ingestion at a time. We can hard-code the job queue to have only one worker, but that doesn't prevent Apache from spawning multiple instances of the server (despite configuration that tells Apache to spawn only one instance).
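One possible way to enforce a single ingestion even when several server processes exist is an advisory file lock that every process has to acquire first. The following is only a sketch under assumptions: the lock path, `IngestionBusy`, and `ingestion_lock` are hypothetical names, not part of the existing code, and `fcntl.flock` is Unix-only.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock path (assumption); any path that every server
# process can reach would work.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "tesserae_ingest.lock")


class IngestionBusy(Exception):
    """Raised when another ingestion already holds the lock."""


@contextmanager
def ingestion_lock(path=LOCK_PATH):
    """Hold an exclusive, non-blocking advisory lock for one ingestion."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB makes this fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise IngestionBusy("another ingestion is in progress")
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

The ingestion thread would wrap its work in `with ingestion_lock():`, and a request that hits `IngestionBusy` could be rejected or retried later. Because the lock lives in the filesystem rather than in process memory, it applies no matter how many server instances Apache starts on the same machine.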

I've already added a new "ingestion_complete" flag on the Text entities, which should help here. A boolean flag can't carry error messages, though, so I'd have to think a little more about how to make that work.
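One option for surfacing errors, sketched here purely as an assumption, is to store a status plus an optional error string next to the flag. `IngestionStatus`, `IngestionState`, and `run_ingestion` are hypothetical names, and how this would map onto the Text entity is left open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class IngestionStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETE = "complete"
    FAILED = "failed"


@dataclass
class IngestionState:
    """Hypothetical status record kept alongside a Text entity."""
    status: IngestionStatus = IngestionStatus.PENDING
    error: Optional[str] = None

    @property
    def ingestion_complete(self) -> bool:
        # Same meaning as the existing boolean flag.
        return self.status is IngestionStatus.COMPLETE


def run_ingestion(state: IngestionState, ingest: Callable[[], None]) -> IngestionState:
    """Run `ingest` and record either success or the error message."""
    state.status = IngestionStatus.RUNNING
    try:
        ingest()
    except Exception as exc:
        state.status = IngestionStatus.FAILED
        state.error = f"{type(exc).__name__}: {exc}"
    else:
        state.status = IngestionStatus.COMPLETE
    return state
```

A client polling the text could then distinguish "still running" from "failed" and show the stored message, while `ingestion_complete` keeps its current meaning.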

jeffkinnison commented 4 years ago

We should update the endpoint to accept the .tess file as an uploaded file rather than as a string in the request body. That's the standard way to accept files.

jeffkinnison commented 4 years ago

I can't reopen this, sadly.