medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0
467 stars 217 forks source link

Exporting messages crashes API #2694

Closed estellecomment closed 7 years ago

estellecomment commented 8 years ago

Tried for a couple minutes to export 878 messages, then API ran out of memory and crashed. (I can export forms and contacts just fine. This is on a clone of CA2, upgraded.)

::1 - admin [01/Sep/2016:08:49:09 +0000] "POST /api/v1/export/messages?format=xml&locale=en HTTP/1.1" - - "http://localhost:5988/medic/_design/medic/_rewrite/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"

<--- Last few GCs --->

  245969 ms: Scavenge 1397.8 (1455.9) -> 1397.8 (1455.9) MB, 1.4 / 0 ms (+ 15.4 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
  250628 ms: Mark-sweep 1397.8 (1455.9) -> 1397.8 (1455.9) MB, 4659.0 / 0 ms (+ 30.0 ms in 2 steps since start of marking, biggest step 15.4 ms) [last resort gc].
  256098 ms: Mark-sweep 1397.8 (1455.9) -> 1397.8 (1455.9) MB, 5470.2 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0xfa186d44a49 <JS Object>
    1: constructor(aka XMLNode) [/Users/estellecomment/dev/medic/medic-webapp-2/api/node_modules/xmlbuilder/lib/XMLNode.js:~25] [pc=0x3019c6cf36ac] (this=0x154bddad3949 <an XMLText with map 0x353c70064921>,parent=0x154bddad3989 <an XMLElement with map 0x353c70063d19>)
    2: text [/Users/estellecomment/dev/medic/medic-webapp-2/api/node_modules/xmlbuilder/lib/XMLNode.js:~163] [pc=0x3019c6cfdf5b] ...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
SCdF commented 8 years ago

Related to #2692 ?

estellecomment commented 8 years ago

Yeah, similar, but #2692 is being solved by exporting less logs, which won't work here.

SCdF commented 8 years ago

I'm going to take a crack at solving this with streaming

SCdF commented 8 years ago

@estellecomment I've created https://github.com/medic/medic-api/pull/99 as a POC for streaming XML results. It would be good if you could test it locally to make sure that reduces the memory usage enough for that DB you have.

Also, how exactly were you testing it? Through a UI, or curling api?

estellecomment commented 8 years ago

Through curl :

UNCAUGHT EXCEPTION!
Server error:
 { [TypeError: this._implicitHeader is not a function]
  domain: 
   Domain {
     domain: null,
     _events: { error: [Function] },
     _eventsCount: 1,
     _maxListeners: undefined,
     members: [] },
  domainThrown: true }
undefined

So you'll need to fix format=xml to get my before-after input! (I'm getting 91301 messages, that's a lot of messages)

Edit : same result with GET. I'm using POST because that's what webapp's Export page is using.

SCdF commented 8 years ago

Interesting, so that's a bug in the XML framework that isn't happening when we run the tests, even though it's going through the same code paths…

Server error:
 { TypeError: this._implicitHeader is not a function
    at XMLDocumentCB.OutgoingMessage.write [as onDataCallback] (_http_outgoing.js:433:10)
    at XMLDocumentCB.module.exports.XMLDocumentCB.onData (/Users/scdf/Code/Medic/medic-api/node_modules/xmlbuilder/lib/XMLDocumentCB.js:302:19)
    at XMLDocumentCB.module.exports.XMLDocumentCB.declaration (/Users/scdf/Code/Medic/medic-api/node_modules/xmlbuilder/lib/XMLDocumentCB.js:190:12)
    at XMLDocumentCB.module.exports.XMLDocumentCB.dec (/Users/scdf/Code/Medic/medic-api/node_modules/xmlbuilder/lib/XMLDocumentCB.js:335:19)
    at /Users/scdf/Code/Medic/medic-api/controllers/export-data.js:454:8

I'll dig into it, seems weird

SCdF commented 8 years ago

OK wow check this out: https://github.com/medic/medic-api/pull/99/commits/37c2de1eb7496b58aa15b6da9fd6217ad05d93b3

One of those times I realise I don't really understand JavaScript at all… anyway @estellecomment give that another crack, it now works for me locally.

estellecomment commented 8 years ago

Booooom But much further along the process. Got 59500 / 91301 rows. screen shot 2016-09-02 at 15 18 08

::1 - admin [02/Sep/2016:13:14:13 +0000] "GET /api/v1/export/messages?format=xml HTTP/1.1" - - "-" "curl/7.43.0"

<--- Last few GCs --->

  224276 ms: Scavenge 1398.0 (1455.3) -> 1398.0 (1455.3) MB, 0.8 / 0 ms (+ 2.3 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
  225320 ms: Mark-sweep 1398.0 (1455.3) -> 1398.0 (1455.3) MB, 1044.4 / 0 ms (+ 3.6 ms in 2 steps since start of marking, biggest step 2.3 ms) [last resort gc].
  226365 ms: Mark-sweep 1398.0 (1455.3) -> 1398.0 (1455.3) MB, 1045.1 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x206170a44a49 <JS Object>
    1: replace(aka replace) [native string.js:~152] [pc=0x1d88423c0e18] (this=0x26470193b691 <String[29]: 30, Dec 2015, 19:56:23 +01:00>,P=0x2a53d7cb6469 <JS RegExp>,Q=0x1f3851620d99 <String[5]: &amp;>)
    2: text [/Users/estellecomment/dev/medic/medic-webapp-2/api/node_modules/xmlbuilder/lib/XMLDocumentCB.js:~121] [pc=0x1d8842374ae3] (this=0x1a2f5d3377c9 <an XMLDocumentCB with map 0x388456048...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
SCdF commented 8 years ago

OK so I'll move on to the next optimisation, which will be batching view calls to CouchDB instead of getting everything in one big go.

mandric commented 8 years ago

Hrm, it should be streaming from the couchdb list function and we can just proxy that?

mandric commented 8 years ago

I might be hallucinating about old code...

SCdF commented 8 years ago

Yeah I think you might be. The existing stuff in master very linearly does this:

Batching the queries is going to be interesting, because of lucene being a potential option there as well.

mandric commented 8 years ago

Hrm this might be a job for a couchdb list function, and stream results where each row can be parsed... as opposed to using a view?

mandric commented 8 years ago

Maybe add a list function to stream CSV from couchdb and transform it in medic-api.

SCdF commented 8 years ago

Why? As in, what is the benefit of doing this over simply calling the view again and again with a different offset?

mandric commented 8 years ago

Might be more straightforward code wise and more efficient, seems like something the database should provide.

mandric commented 8 years ago

Just a suggestion I thought worth exploring if batching is difficult.

SCdF commented 8 years ago

FWIW this is basically exactly what we want: https://github.com/dominictarr/JSONStream

The downside is that some of our export generation is more complicated than I thought, and I'm not actually convinced they can all be streamed, at least not without quite a bit of re-writing. Still investigating.

SCdF commented 8 years ago

i.e. the forms export has an overview tab that requires mapping over the whole view result first, and should be the "first" XML result to be exported (ie you're streaming out tab by tab, the summary tab is first).

SCdF commented 8 years ago

@estellecomment can you pull that branch and give your test command another crack? I very much doubt I've fixed it but actually fixing it looks like a larger piece of work than I was expecting, so I thought I'd try a silly random hack (that language optimisers hate!)

estellecomment commented 8 years ago

Nope! Exploded at 62.6MB again. 59496 / 91301 messages.

$ curl -X GET  http://admin:pass@localhost:5988/api/v1/export/messages?format=xml > tmp3.xml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 62.6M    0 62.6M    0     0   274k      0 --:--:--  0:03:53 --:--:-- 10290curl: (18) transfer closed with outstanding read data remaining

Api says :

::1 - admin [05/Sep/2016:15:53:22 +0000] "GET /api/v1/export/messages?format=xml HTTP/1.1" - - "-" "curl/7.43.0"

<--- Last few GCs --->

  295196 ms: Scavenge 1398.1 (1455.3) -> 1398.1 (1455.3) MB, 0.8 / 0 ms (+ 2.3 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
  296182 ms: Mark-sweep 1398.1 (1455.3) -> 1398.1 (1455.3) MB, 986.0 / 0 ms (+ 3.6 ms in 2 steps since start of marking, biggest step 2.3 ms) [last resort gc].
  297161 ms: Mark-sweep 1398.1 (1455.3) -> 1398.0 (1455.3) MB, 979.5 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x249aeb244a49 <JS Object>
    1: /* anonymous */(aka /* anonymous */) [/Users/estellecomment/dev/medic/medic-webapp-2/api/controllers/export-data.js:485] [pc=0x152e26a3c30e] (this=0x249aeb204139 <undefined>,cell=0x22b7226dd129 <String[32]: 4d803109b977d8a88350eadee42d56da>)
    2: arguments adaptor frame: 3->1
    3: InnerArrayForEach(aka InnerArrayForEach) [native array.js:~942] [pc=0x152e26ad5cc5] (this=0x249aeb204139...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6
SCdF commented 8 years ago

So talked to Alex and Gareth about this one: it's quite a lot of effort to re-write the API cleanly to stream from A->Z, which is really the only reliable way of fixing this issue without caveats and hacks.

These APIs are very "inherited", in the sense that it's not clear how many of the various file formats and output types are being actively used. No client has complained about these exports crashing, so you could conclude that in the field they work well enough, and for people where they wouldn't they aren't used.

So: I'm going to merge what I did: since it will help with reducing memory usage in general, and we're going to leave it at that for now. If a client has trouble in the field we can solve the issue temporarily by increasing the allowed nodejs memory usage, and use their issue as an opportunity to look at this in more detail. I'd recommend making a "2.0" api that supports streaming from the start, and moving things over to that model that are provably used.

nice-snek commented 7 years ago

hi frand @browndav, U is coot

Please close or schedule before the end of this sprint. See triaging old issues.

Nice Snek

ghost commented 7 years ago

Closing.

mandric commented 7 years ago

I still don't understand why medic-api can't simply stream the output from the list function in couchdb that is already streaming. Originally this worked just fine because we called the couchdb list function directly, it only started to fail when we added the medic-api layer incorrectly (not streaming).