unfoldingWord-dev / d43-catalog

Lambda functions for the Door43 Catalog.
https://api.door43.org/v3/catalog
MIT License

Convert USFM3 to USFM2 For Backwards Compatibility #39

Closed: jag3773 closed 5 years ago

jag3773 commented 7 years ago

@jag3773 commented on Fri Aug 11 2017

For resources in our catalog that move to USFM3, we want to add a converter to the API pipeline that converts them to USFM2 and adds the result as another format in the API. The intended result is two entries in the formats array: one for the original USFM3 version and one for the stripped-down USFM2 version.

The operation should be triggered by the formats key in the manifest being set to text/usfm3. In our ecosystem text/usfm equates to USFM2.

In addition, the USFM2 text can be used for the tS/uW backwards compatibility generators.
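
As a rough illustration of the intended result, the sketch below takes a formats array and adds a USFM2 twin next to each USFM3 entry. The entry fields (`format`, `url`) and the names `with_usfm2_twins` and `usfm2_url_for` are assumptions made for the example, not the catalog's actual schema or code.

```python
def with_usfm2_twins(formats, usfm2_url_for):
    """Return a new formats list where each text/usfm3 entry is followed
    by a text/usfm (USFM2) copy.

    `usfm2_url_for` is a hypothetical callback that maps the USFM3 URL
    to wherever the converted USFM2 file was uploaded.
    """
    result = []
    for entry in formats:
        result.append(entry)
        if "text/usfm3" in entry["format"]:
            twin = dict(entry)  # shallow copy, so the original entry is untouched
            twin["format"] = entry["format"].replace("text/usfm3", "text/usfm")
            twin["url"] = usfm2_url_for(entry["url"])
            result.append(twin)
    return result


# Example with made-up URLs.
formats = [{"format": "text/usfm3", "url": "https://example.org/v3/01-GEN.usfm"}]
expanded = with_usfm2_twins(formats, lambda url: url.replace("/v3/", "/v2/"))
```

Entries that are not USFM3 pass through unchanged, so the function can be applied to any resource without first checking the manifest.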

da1nerd commented 6 years ago

@jag3773 I think now's the time to start looking into using a queue for processing things in the api. I don't have a good spot to put all of this processing without slowing down and possibly breaking things. I think we need to adjust the webhook to submit requests to a queue and then have a worker process items in the queue.

This is what I propose:

We could limp along with what we have by placing this processing in the webhook, but we're at a crossroads or close to it. We're bound to add more data-intensive processing, and I'm concerned about timeouts. Also, because of the nature of the webhook, I can't use my usual pattern for picking up where it left off after a timeout.
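
To make the webhook/worker split concrete, here is a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for whatever queue service would actually be used, and the names `webhook`, `worker`, and `process` are hypothetical.

```python
import queue

# Stand-in for a real queue service (assumption: any durable queue would do).
jobs = queue.Queue()


def webhook(payload):
    """Return quickly: record the request instead of processing it inline."""
    jobs.put(payload)
    return {"status": "queued"}


def worker(process):
    """Process everything currently queued.

    A job that raises is put back on the queue, so a later run can pick up
    where this one left off instead of losing the work.
    """
    done, failed = [], []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            done.append(process(job))
        except Exception:
            failed.append(job)
    for job in failed:
        jobs.put(job)
    return done
```

The point of the split is that the webhook's running time no longer depends on how heavy the processing is; only the worker has to worry about timeouts and retries.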

da1nerd commented 6 years ago

On the bright side, this is the best place in the API to begin implementing a queue pattern. I won't have to touch the tS or uW API code.

da1nerd commented 6 years ago

Let's think about creating a new lambda that will be triggered when the webhook uploads something. This lambda will generate the usfm2 and upload it. The webhook will inject the usfm2 links into the catalog record. The signing and publishing will just fail and restart until this new lambda generates the usfm2 files.
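
A sketch of what such a lambda could look like, assuming the upload arrives as an S3-style event notification. The storage and conversion functions are passed in so the example stays self-contained; `handle`, `usfm2_key`, `usfm2_ready`, and the `usfm2` subfolder layout are all hypothetical.

```python
import posixpath
from urllib.parse import unquote_plus


def usfm2_key(key):
    """Hypothetical layout: put the converted file in a 'usfm2' subfolder."""
    folder, name = posixpath.split(key)
    return posixpath.join(folder, "usfm2", name)


def handle(event, read, write, convert):
    """Convert every newly uploaded .usfm object named in an S3-style event."""
    written = []
    for record in event.get("Records", []):
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        folder = posixpath.dirname(key)
        if not key.endswith(".usfm") or posixpath.basename(folder) == "usfm2":
            continue  # skip other files and our own output
        target = usfm2_key(key)
        write(target, convert(read(key)))
        written.append(target)
    return written


def usfm2_ready(source_keys, exists):
    """Signing can proceed only once every source file has a USFM2 twin."""
    return all(exists(usfm2_key(key)) for key in source_keys)
```

Skipping keys that already sit in the `usfm2` folder keeps the lambda from re-triggering on its own output, and `usfm2_ready` is the check the signing step would fail on until the conversion has finished.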

da1nerd commented 6 years ago

Just noting that the usfm3 to usfm2 converter is completed with tests. I've removed any changes made to the existing lambda.

I'll wait to begin constructing a new lambda until we have our meeting regarding the improvements to the api.

da1nerd commented 6 years ago

@jag3773 here's an update on this.

I've set up a repo with code for creating a REST api at https://github.com/unfoldingWord-dev/tx. This includes documentation generation, a pattern for adding new RESTful services, and configuration for tests.

@ethantkoenig and I have come to the conclusion that it would be easiest to simply add the usfm3->2 code to the existing tx pipeline and configure the Door43 API pipeline to monitor an event in the Event Queue. I'd need to do that last step anyway.

That leaves us with a pretty repo that we're not actually going to use. I am quite pleased with it though so perhaps we'll be able to use it later or for something else.

All that said, @jag3773 are you pleased with the direction this is going? Should @ethantkoenig proceed with adding the converter to the existing tx pipeline and I configure coordination with the event queue?

jag3773 commented 6 years ago

Yes, that sounds fine, @neutrinog and @ethantkoenig.

da1nerd commented 6 years ago

Here is some code that converts USFM3 to USFM2. At this point I don't think it covers all the new features in USFM3, but it's a starting point. https://github.com/unfoldingWord-dev/d43-catalog/blob/develop/libraries/tools/usfm_utils.py#L338
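
For readers unfamiliar with the difference, here is a minimal sketch of the kind of stripping involved, assuming the main USFM3 additions to remove are word-level attributes (`\w word|lemma="..."\w*`) and milestone markers (`\zaln-s ...\*`). It is not the code at the link above, and the name `strip_usfm3` is hypothetical.

```python
import re

# Word-level markup: \w text|attr="..."\w* (or the nested form \+w ... \+w*).
_WORD = re.compile(r'\\\+?w\s+([^|\\]*?)(?:\|[^\\]*)?\\\+?w\*')
# Milestones such as \zaln-s | x-strong="G1722"\* and \zaln-e\*.
_MILESTONE = re.compile(r'\\[a-z0-9]+-[se]\b[^\\]*\\\*')


def strip_usfm3(text):
    """Drop milestones, then reduce each \\w ... \\w* span to its bare word.

    Assumption: attribute values never contain a backslash.
    """
    text = _MILESTONE.sub('', text)
    return _WORD.sub(lambda match: match.group(1).strip(), text)


sample = (r'\v 1 \zaln-s | x-strong="G1722"\*\w In|lemma="in" strong="G1722"\w*'
          r'\zaln-e\* \w the\w* beginning')
plain = strip_usfm3(sample)  # -> '\v 1 In the beginning'
```

Regular expressions are enough for a sketch like this, but they do not cover every USFM3 feature, which matches the "starting point" caveat above.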