Configure stage OpenSRP Reveal-SZ connector on nifi.ona.io

gstuder-ona commented 4 years ago

We'll want a place to stage our NiFi flows to unblock work until we've got the real NiFi deployment up.

This also depends on #724 so that we can test our flows with realistic sample data, though it is not a blocker.

gstuder-ona commented 4 years ago

@Wambere - are you able to give me pointers to the latest OpenSRP connector version?

Wambere commented 4 years ago

Current OpenSRP connector (still WIP) is at https://nifi-production.ona.io/nifi/?processGroupId=fff03aac-11e0-1f62-94de-686061c23943&componentIds=
Latest Reveal flow is currently at https://nifi.reveal-stage.smartregister.org/nifi/?processGroupId=0dd515a4-1030-116f-b24b-500cd70672e8&componentIds=
The migrations are at https://github.com/onaio/data-solutions/pull/16
The documentation is partly in https://github.com/onaio/canopy/pull/916 and partly in https://github.com/onaio/canopy/pull/987/files

gstuder-ona commented 4 years ago

@moshthepitt in case someone wants a progress report from the other time zone - I've got a flow pushing data to an events table here in a configurable way, but I haven't defined the top-level fields (which become the columns) for anything yet. https://nifi-production.ona.io/nifi/?processGroupId=0c0b3f24-0806-1f61-e219-2c5e2d7d1131&componentIds=

I took what I thought I could from the Reveal side and hammered it into the source/transform/sink model. I'm also using some auto JSON-to-SQL code we still had lying around, since I suspect we'll need to iterate quickly there.

Basically, the next steps here are:
1) Copy / invent flattening for clients, jurisdictions, NTD locations, plans, and tasks based on what's in reveal-stage.
1a) Flatten events of different types into different dataTypes (tables).
2) Define migrations for the new table fields.
2a) Debug the JSON-to-SQL sink if it goes wrong here.
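A minimal Python sketch of steps 1 and 1a, assuming each OpenSRP event is a JSON object carrying an eventType field. The event_<type> table naming rule, the underscore-joined column names, and the sample events are hypothetical; this is not what the Reveal processor or the JSON-to-SQL sink actually does.

```python
import re
from collections import defaultdict


def table_name(event_type):
    # Hypothetical naming rule: "Mosquito Collection" -> "event_mosquito_collection".
    slug = re.sub(r"[^a-z0-9]+", "_", event_type.lower()).strip("_")
    return "event_" + slug


def flatten(record, prefix=""):
    # Collapse nested objects into underscore-joined top-level keys, which
    # would become the table's columns. Lists (e.g. obs) are left as-is.
    out = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            out.update(flatten(value, name + "_"))
        else:
            out[name] = value
    return out


def split_by_type(events):
    # Route each flattened event to a per-type table (dataType).
    tables = defaultdict(list)
    for event in events:
        tables[table_name(event.get("eventType", "unknown"))].append(flatten(event))
    return dict(tables)


# Hypothetical sample events, not real Reveal data.
events = [
    {"eventType": "Spray", "baseEntityId": "s1", "details": {"taskStatus": "COMPLETED"}},
    {"eventType": "Mosquito Collection", "baseEntityId": "s2"},
    {"eventType": "Spray", "baseEntityId": "s3"},
]
tables = split_by_type(events)
print(sorted(tables))  # ['event_mosquito_collection', 'event_spray']
```

Splitting by type before the sink keeps each table's columns stable per event type, which is what the per-table migrations in step 2 would then describe.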

Sqitch is actually working pretty nicely here! I can throw stuff together and just deploy (and revert, and deploy...).

gstuder-ona commented 4 years ago

@moshthepitt see: https://github.com/onaio/nifi-groovy-scripts/pull/18

Status update - got the database up (no indexes) - see: https://github.com/onaio/data-solutions/pull/31/files

The naming and fields basically follow: https://github.com/onaio/data-solutions/pull/16/files#

... but! there's a locations table instead of a jurisdictions table as I'm not sure we'll only need jurisdictions. Right now only jurisdictions are loaded anyway, so if that's annoying feel free to CREATE VIEW jurisdictions AS SELECT * FROM locations;
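A quick way to check the suggested alias, sketched with Python's built-in sqlite3 module as a stand-in for the real database. The two-column locations table and its rows are hypothetical; the real columns come from the data-solutions migrations.

```python
import sqlite3

# In-memory SQLite stands in for the staging database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical minimal locations table; real columns are not reproduced here.
conn.execute("CREATE TABLE locations (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO locations (id, name) VALUES (?, ?)",
    [("loc-1", "Region A"), ("loc-2", "District B")],
)

# The alias suggested above: expose locations under the jurisdictions name.
conn.execute("CREATE VIEW jurisdictions AS SELECT * FROM locations")

rows = conn.execute("SELECT id, name FROM jurisdictions ORDER BY id").fetchall()
print(rows)  # [('loc-1', 'Region A'), ('loc-2', 'District B')]
```

Because only jurisdictions are loaded right now, the view returns exactly the rows of locations; once structures are loaded too, it would need a filter to keep them out.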

The flows are here: https://nifi-production.ona.io/nifi/?processGroupId=0c0b3f24-0806-1f61-e219-2c5e2d7d1131&componentIds=82c1378e-bedd-1fa3-0c4d-bd3ab639982b

Known issues: I turned the polling way down to once every 30 mins, as I don't want to kill the nifi-production server. Tasks aren't all loaded yet; they're set to 1000-per-5min, but you can ramp that up if needed. Clients may have issues loading in order from db tables (hopefully not). No indexes yet, and we'll definitely want them.

Also added a fix here for scripts: https://github.com/onaio/nifi-groovy-scripts/pull/18

moshthepitt commented 4 years ago

Nice, thanks @gstuder-ona

Concerning this:

but! there's a locations table instead of a jurisdictions table as I'm not sure we'll only need jurisdictions. Right now only jurisdictions are loaded anyway, so if that's annoying feel free to CREATE VIEW jurisdictions AS SELECT * FROM locations;

So one reason why we may want jurisdictions and "structures" (I think that there are no other kinds of locations, are there?) in separate tables is that structures and jurisdictions usually end up being treated very differently when preparing reports, so it "feels" better to separate them, and it may also improve reporting performance.

Given that, what did you have in mind with this change?

gstuder-ona commented 4 years ago

@moshthepitt re: locations vs jurisdictions - I'm not opposed; I'm really just trying to keep down the number of assumptions we make about OpenSRP data. Mostly I want to be consistent: if we agree they really are different and only API weirdness keeps them in the same call, then IMO we should be consistent about it and specify "jurisdiction" or "structure" in the flow parameters instead of "location" (which is what the Reveal processor I modified does).

I think they're different enough to always be treated differently, not just for Reveal, do you agree?

What I want to be able to say is "OpenSRP data streams are of events, clients, jurisdictions, structures, plans, and tasks (and events can be optionally broken into sub-types)" and have all the processor parameters consistent with that breakdown.
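One way to make processor parameters follow that breakdown is to validate them against a fixed set of stream types. This Python sketch assumes a hypothetical parameter format such as events/Spray (stream type, then an optional event sub-type); neither the format nor the names are taken from the existing flows.

```python
from enum import Enum


class StreamType(Enum):
    # The six OpenSRP data streams named above.
    EVENTS = "events"
    CLIENTS = "clients"
    JURISDICTIONS = "jurisdictions"
    STRUCTURES = "structures"
    PLANS = "plans"
    TASKS = "tasks"


def parse_stream(param):
    """Parse a hypothetical stream parameter such as 'events/Spray'.

    Only events accept a sub-type. A generic 'locations' value is rejected,
    so a flow must say 'jurisdictions' or 'structures' explicitly.
    """
    base, _, sub_type = param.strip().partition("/")
    try:
        stream = StreamType(base.lower())
    except ValueError:
        raise ValueError("unknown stream type: %r" % base) from None
    if sub_type and stream is not StreamType.EVENTS:
        raise ValueError("only events take a sub-type, got %r" % param)
    return stream, (sub_type or None)


print(parse_stream("events/Spray"))  # (<StreamType.EVENTS: 'events'>, 'Spray')
print(parse_stream("structures"))    # (<StreamType.STRUCTURES: 'structures'>, None)
```

Rejecting "locations" at parse time is one way to enforce the jurisdiction/structure split everywhere a processor is configured, rather than relying on each flow to remember it.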

moshthepitt commented 4 years ago

I think they're different enough to always be treated differently, not just for Reveal, do you agree?

I agree with this.

What I want to be able to say is "OpenSRP data streams are of events, clients, jurisdictions, structures, plans, and tasks (and events can be optionally broken into sub-types)" and have all the processor parameters consistent with that breakdown.

We are 100% on the same page here

opensrp / opensrp-client-reveal

Configure stage OpenSRP Reveal-SZ connector on nifi.ona.io #726