medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0
466 stars 217 forks source link

Add support for exporting reports in csv format #3594

Closed eljhkrr closed 6 years ago

eljhkrr commented 7 years ago

@eljhkrr I'm (@sglangevin) updating this first comment to summarize the discussion below so that whichever dev takes this on doesn't have to review the whole thread.

Requirements for completing this issue:

  1. Support export of reports in csv format. We agreed that it's ok to lose the tabs that we currently have in the xml format.
  2. Support streaming of the download so that we can download more than 1000 records at one time. Ideally this should be unlimited.
  3. Support export of both JSON forms and xforms.
SCdF commented 7 years ago

Hi @eljhkrr it sounds like you're looking for some kind of feature?

Can you please provide more information about what you're looking for?

eljhkrr commented 7 years ago

Hi @SCdF sorry for being light on details.

Some users are having trouble opening data exports from the webapp because they're only downloadable as xml files. Excel can open xml files, but when Excel isn't one of the obvious options in the 'open with' menu, some users aren't able to open the file. Allowing users to export data in formats which are easier to handle like xls or csv would improve usability.

SCdF commented 7 years ago

@sglangevin do you already have any work around this kind of thing brewing?

sglangevin commented 7 years ago

The workaround is to open the xml using Excel, which should be the default program on most Windows machines. @eljhkrr you could walk users through setting Excel as the default program for xml files while we get time to add a feature for exporting to csv or xls. Or we can do the exports for them and convert them to xls and share if that is necessary. I was planning to schedule this feature for 2.14 - it came out clearly from my recent review of all medic-projects issues (will be sharing a report about that by the end of the day tomorrow).

garethbowen commented 7 years ago

Related #929

garethbowen commented 7 years ago

One issue with using CSV is the current export has tabs for each form which isn't supported by the CSV format.

I recommend we change the download from xml to xls (or xlsx) which should just work.

ngamita commented 7 years ago

I've linked to a ticket where the partner Strongminds Uganda wants export feature to work as before i.e. download to Excel file. They were of the thought it still works that way from the config tab only. cc @Maigua on this. I also do recommend that we move the download from xml to xls as we shall be seeing more of these requests.

sglangevin commented 7 years ago

My thinking was that .csv is more compatible with other applications if our users are importing into another system. It sounds like that isn't the primary use case, so I'm fine with .xls. I'm updating the issue title to reflect that decision.

This is already scheduled for v2.14 so it's in the pipeline.

ngamita commented 7 years ago

@sglangevin sorry for the confusion. I do agree with either csv or xls. Actually csv would be used further by M & E departments like what Strongminds is doing and easily readable by XLS spreadsheet tools too.

garethbowen commented 7 years ago

@sglangevin @ngamita CSV is my preferred approach too. Are you ok with losing the tabs ("sheets" in Excel terminology) from the report?

sglangevin commented 7 years ago

@garethbowen yeah I think that's fine. Is it a lot of extra work to support both, as we did in 0.4?

garethbowen commented 7 years ago

@sglangevin I don't think we ever supported "xls" but we could go back to supporting "csv" and "xml" without too much trouble. It's extra code and an extra option in the UI which sounds like nobody wants. Have you got a solid use case for the "xml" option?

sglangevin commented 7 years ago

Ah - I saw Excel and assumed xls but it was actually xml. We don't have a good use case for xml, so let's go with csv and lose the tabs.

ngamita commented 7 years ago

+1 on csv.

alxndrsn commented 7 years ago

Changing export's format to csv in https://github.com/medic/medic-webapp/blob/master/static/js/services/download-url.js#L8 produces a CSV with the following columns:

Record UUID,Medic ID,Reported date,From,Contact name,Area name,CHW Name,CHW Area Name,District name,Form

It's not clear to me how multiple different forms' data would be stored in the same CSV file. What column names would be expected if the actual data from each form is to be included? Should similar or matching columns in different forms be aligned?

sglangevin commented 7 years ago

@alxndrsn that's a great question. I hadn't given much thought to that yet. I know that partners will need to be able to export all forms - my guess is that this is why we had the tabs in the XML before. All types of forms might be exported for monthly reporting. I also think it's really useful to be able to download a csv that includes all of the form fields from our app. We can't currently do that for xforms, so it would be a nice improvement, though we could also decide to say that must be done from Postgres or Klipfolio.

What about 1 csv per form? That way we wouldn't have to worry about lining up any columns. Not sure how technically difficult that would be and it also might make it harder for a partner to total up all of the form submissions by CHW if they had to recombine into 1 file first.

Another option would be to have a header row then the form data for the first form, followed by a blank line, then the header row for the second form then the data for the second form. This wouldn't be as good for filtering, obviously, but would make it easy to split up the reports.

A final option would be to limit the fields exported when multiple forms are selected to those that we know will overlap across all forms and then include additional columns if just 1 form is downloaded.

@abbyad can you think of any other options? Do you have feedback on any of the ideas here?

alxndrsn commented 7 years ago

What about 1 csv per form? That way we wouldn't have to worry about lining up any columns. Not sure how technically difficult that would be and it also might make it harder for a partner to total up all of the form submissions by CHW if they had to recombine into 1 file first.

This could certainly work. Would users request a specific CSV to download, or would our app bundle multiple CSVs into a single zip for downloading in one go?

Another option would be to have a header row then the form data for the first form, followed by a blank line, then the header row for the second form then the data for the second form. This wouldn't be as good for filtering, obviously, but would make it easy to split up the reports.

I'd be surprised if this actually counts as "CSV", as far as it's a standard, and so probably wouldn't open in excel.


If we're happy with the content of the current exports, it would be much simpler to just generate proper .xlsx files with e.g. node-xlsx.

I don't have a copy of Excel handy, but renaming the current .xml files as .xlsx might also do the trick, and would be an even simpler fix.

garethbowen commented 7 years ago

@alxndrsn I tried renaming and changing the mime type but neither worked as required. I think it's because our xml files aren't valid xlsx.

node-xlsx (and all other libraries I could find) have a couple of issues - they don't do multiple sheets so we're stuck with the csv problem again, and they don't do streaming which means we have to limit the number of rows in the output or risk crashing.

My hope is that with csvs we can easily batch the query to the database, then format and stream the results to the client.

Currently we work out which columns to include based on the json form configuration. We could extend this to enketo as well as long as it doesn't end up too massive.

Column options:

  1. We could have one column for each field that's listed in any of the forms, and most values are blank. eg: if registration has edd and visit has referral-needed then the csv will have both edd and referral-needed and all visits will have a blank edd value.
  2. Zipping would work too, but it's a bit more of a pain for users.
  3. We could rework the UI and API to make form a required field, so you require users to filter by form before they hit Export.
  4. We could export just the common columns if they don't filter by form, or all the columns if they do restrict to a single form.

I think my preference is for option 4 as it makes for an easy UI, a concise doc if you request all reports, and allows for drilling down deeper for integration and so on.

sglangevin commented 7 years ago

@garethbowen I generally like option 4 as well, but I also think that might be confusing for users, who might wonder why they aren't getting all of the columns. I'm wondering if just the common columns would be enough for partners who are doing monthly analysis of all forms received. It might be but I don't know for sure. We have definitely had partners who expected that all of the fields would download for each form.

Another potential option is to only download columns for the data that can be viewed in the Reports detail pane (RHS of Reports tab). Then we would not download a whole bunch of useless fields and it would be configurable because we would ignore any report field that had appearance: hidden.

@ngamita can you weigh in here based on what you know about partners who are using Export for monthly analysis?

sglangevin commented 7 years ago

Wanted to add that we need to support xform exports as well, so whatever solution we come up with should allow us to export fields from both JSON forms and xforms. I've added this to the top comment in this thread as well.

garethbowen commented 7 years ago

For the performance aspect of this, going to CSV also means we can easily implement a fully streaming solution so the number of rows can be unlimited without crashing API. For example...

  1. Fetch the first 1000 rows
  2. Respond with the header and the first 1000 rows
  3. Fetch and respond with the next 1000 until there are none left
  4. End the response

I think it would be ok to remove the other formats and only support csv so everyone uses this performant and unlimited protocol. Alternatively, we could deprecate it and remove it in 3.0.0 to clean up the code.

SCdF commented 7 years ago

Reading through this in detail, I don't think there has been any conclusion as yet. The last word is that @sglangevin likes @garethbowen's idea about exporting only the common columns, but it worried that it may not provide value based on how people use them. @ngamita was tagged in but hasn't responded.

I don't think we can move forward unless we know that the solution will actually work for partners, since that's the key thing we're trying to achieve.

SCdF commented 7 years ago

Blocked on @ngamita responding

ngamita commented 7 years ago

I do agree and support going with .csv. In addition, partners on a daily basis view reports in the Reports detail pane and do expect to see that data when doing further analysis. We don't need to download any fields that don't appear in the reports sections. I hope this helps in reducing what needs to be downloaded. cc @SCdF @garethbowen

ngamita commented 7 years ago

The partners who pushed for this feature are OK with downloading each form type individually as it was before with v0.4. Adding PM cc @Maigua It's almost the same thing we do when we share the Postgres dumps with them . It's per individual form and they're ok with it. cc @SCdF

ngamita commented 7 years ago

I don't see any value in downloading multiple form types at the same time, to have them only return metadata, unless i speak to the PMs - i might be missing something. I would want us to give the partners what they asked for in downloading the same form types. cc @Maigua @SCdF

Maigua commented 7 years ago

@ngamita the request from Strongminds was to have unlimited rows i.e go above the set limit of 1000 per export. Secondly, it is much easier for them to have all forms combined in one download as opposed to having individual forms per download. I also suggest that we only have the fields appearing in the reports as opposed to having information that will be 'useless' in the downloads.

SCdF commented 7 years ago

I also suggest that we only have the fields appearing in the reports as opposed to having information that will be 'useless' in the downloads.

@Maigua this is helpful feedback, and we can definitely look at doing what @sglangevin talks about above, with only exporting fields that are viewable in the report detail view.

However, the core question I think we need to get straight before development is: is it acceptable / useful for csv exports that contain multiple reports, to only export data that is shared between those reports.

For example, let's say you have two forms, and you export all of one form:

patient_id type some_data some_more_data
abc123 formA cats 42
abc321 formA dogs 12

And all of the second form:

patient_id type different_data some_more_different_data
bca123 formB 12.2 true
abc456 formB 7.4 false

NB: imagine way more common data here, such as reported date, the CHW who reported it etc.

The problem arises if you select multiple forms. Currently the XML we generate supports multiple "sheets", but we have to limit it to 1000 rows for performance, and it's not obvious for users how you use the XML (hint: open it in excel).

So how do we represent the above in just one CSV?

The currently proposed solution is only export columns that are the same. However, Sharon is concerned that this might not have enough useful data to do monthly reporting on. Using our examples above (remember there would be way more common data in a real example) it would look like this:

patient_id type
abc123 formA
abc321 formA
bca123 formB
abc456 formB

Another option would be to combine them, but have empty cells where it doesn't apply. However, this could be confusing from a data perspective:

patient_id type some_data some_more_data different_data some_more_different_data
abc123 formA cats 42
abc321 formA dogs 12
bca123 formB 12.2 true
abc456 formB 7.4 false

Yet another option is to generate multiple CSVs (like the first two examples) and deliver them in a zip file. However, this may be confusing for users as well, because now it's not one-click to open the CSV, first you have to extract the zip file.

Maigua commented 7 years ago

@SCdF my suggestion would be (with all factors considered from a partner's view) we combine all the data in one as per the last diagram. This I see will be much easier to filter out and analyse the data the partner requires. I am not sure what confusion it will have as you highlighted.

The other close option would be to have the multiple CSVs in a zipped folder but if you asked me, it will be tasking to the partner in a case where they have multiple forms and lots of data per form. I am sure most of our partners have the know-how of extracting zipped folder... Let me cc @christinewere @PhilipNgari since I think they had the same challenges and see their suggestions.

sglangevin commented 7 years ago

I think if we are going to allow users to download multiple form types, we should include all fields from all forms. Perhaps we can have a naming convention for the columns to help differentiate which columns are for which form? Something like [formname][field_name] just to make it very explicit.

I do like the idea of a zipped folder with multiple CSVs but then it would be a pain to put all of those records together into one file to do analysis of overall CHW activity.

However, I'd also like to know what sorts of analysis partners are doing when they have a dataset that contains all forms. If it's mostly high-level counts, then perhaps exporting only the overlapping fields is sufficient for what they need to do post-export.

@Maigua @christinewere @PhilipNgari @ngamita can you give us some more user stories that explains the types of analysis partners are expecting to be able to do once they have exported a month's worth of forms (all form types)?

ngamita commented 6 years ago

For partner strongminds they do perform their own counts, groups per days/weeks etc and compare or verify with what the CHWs have noted on paper. They have an M & E department with data guys who are taking on these csvs files into SPSS for further in house analysis. @Maigua we can find out more what this in house analysis entails.

abbyad commented 6 years ago

@SCdF, just noting here that the XML Export from Config>Export>Reports does not work in Excel. Reporting it here as suggested by @sglangevin since it may be relevant to this issue.

sglangevin commented 6 years ago

@SCdF what is needed to unblock this issue? It's not clear from the conversation here.

sglangevin commented 6 years ago

Ok, I've re-read this thread and it sounds like we need to decide whether to include only the common columns or to include all columns. I think the most flexible thing to do would be to include all columns. How much more effort would it be to include all columns instead of only the common columns?

garethbowen commented 6 years ago

@sglangevin It's probably about the same amount of effort either way. Let's include all columns.

SCdF commented 6 years ago

While I'm here, I'm wondering if we want to revisit the default reports fields that always get returned. Currently that's:

          '_id',
          'patient_id',
          'reported_date',
          'from',
          'contact.name',
          'contact.parent.name',
          'contact.parent.parent.name',
          'contact.parent.parent.parent.name'

Are we happy with those? Do we want to remove any? @sglangevin ?

sglangevin commented 6 years ago

I was considering whether we should remove contact.parent.parent.parent.name because it will always be blank for mobile app projects but is useful for SMS projects so I think it should stay. Should we consider adding the patient's name to this list? Also, patient_id only seems to pull the Medic ID, not the patient's UUID in cases where the Medic ID doesn't exist. We could consider falling back to the UUID if the Medic ID isn't there, though I suppose you wouldn't want a mixture of Medic ID's and UUIDs, so that could get tricky.

SCdF commented 6 years ago

Also, patient_id only seems to pull the Medic ID, not the patient's UUID in cases where the Medic ID doesn't exist. We could consider falling back to the UUID if the Medic ID isn't there

AFAICT the original implementation doesn't have any logic in there in regards to mapping. If possible I'd like to keep it this way: Ideally this is just the simplest expression of converting our data from reasonably structured JSON to a flat CSV.

SCdF commented 6 years ago

NB: date filters don't work in API:

Deprecation warning: value provided is not in a recognized RFC2822 or ISO format. moment construction falls back to js Date(), which is not reliable across all browsers and versions. Non RFC2822/ISO date formats are discouraged and will be removed in an upcoming major release. Please refer to http://momentjs.com/guides/#/warnings/js-date/ for more info.
Arguments:
[0] _isAMomentObject: true, _isUTC: false, _useUTC: false, _l: undefined, _i: 1517529599999, _f: undefined, _strict: undefined, _locale: [object Object]
Error
    at Function.createFromInputFallback (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:324:94)
    at configFromString (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:2363:11)
    at configFromInput (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:2589:9)
    at prepareConfig (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:2572:9)
    at createFromConfig (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:2539:40)
    at createLocalOrUTC (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:2626:12)
    at createLocal (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:2630:12)
    at hooks (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/node_modules/moment/moment.js:16:25)
    at reportedDateRequest (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/src/generate-search-requests.js:54:12)
    at reports (/Users/scdf/Code/Medic/medic-webapp/shared-libs/search/src/generate-search-requests.js:163:7)
garethbowen commented 6 years ago

Back to you!

SCdF commented 6 years ago

@garethbowen let's discuss some points on Monday

SCdF commented 6 years ago

Merged to master

SCdF commented 6 years ago

Whoops, I still have to look into and fix the bug in API where date ranges are not honoured.

SCdF commented 6 years ago

Twas but a small thing. Fixed.

sglangevin commented 6 years ago

Did some AT just now and found that I get an error when I try to filter and then Export. On alpha.dev, I went to the Reports tab, selected 2 form types, and then clicked Export. I immediately get a Failed - Server problem error and nothing is downloaded. I don't see any obvious errors in the console. Exporting a filtered set of reports was working previously, so this is a regression.

Secondly, when I click the Export button without a filter in place, it's not clear to the user that something is happening between the time I hit Export and when the file starts downloading. Can we try graying out the Export button so that users don't press it multiple times? Even better would be to have a notification somewhere, but that is probably introducing more complexity than I'd like to at this point.

SCdF commented 6 years ago

Secondly, when I click the Export button without a filter in place, it's not clear to the user that something is happening between the time I hit Export and when the file starts downloading

Yeah, I noticed that. It existed before this change, and I hoped that because this implementation should return sooner (because it only needs to calculate the header before sending you some data back) it would be less awkward.

I can look at some kind of indicator to make it clearer.

SCdF commented 6 years ago

On alpha.dev, I went to the Reports tab, selected 2 form types, and then clicked Export.

This is due to how we've configured nginx, or inherent limits in nginx, which is why we didn't see it locally! There is a URL length limit, and we're going over it. Luckily this is in large part because by default we send exactly what we send to the search service, which has a bunch of crap we don't need, so I've removed that stuff from the export link. Links should now be smaller and work more reliably.

In terms of having loading look nice I'm trying to get it looking nice but… CSS is not my strong suit. @garethbowen let's pair on this!

garethbowen commented 6 years ago

I think the main problem we're going to have is due to the way we download exports we won't know when the download has started or is complete, so if we showed a spinner or disabled the button it would never get unset. We basically rely on the browser UI to communicate the download progress. @sglangevin Is the browser UI insufficient? What do you actually see?

I don't think there's any other way to handle the download without storing everything in JS thereby breaking streaming.

I'm wondering if there's a long pause before the first line of data is sent. We could mitigate this by sending something (maybe an empty line?) immediately after basic validation to get the browser to prompt the user to save the file and give some idea of progress.

sglangevin commented 6 years ago

In my test, it was a long enough delay from when I clicked Export to when the download started in Chrome (downloads bar popped up at the bottom) that I wasn't sure if it was working, so I'd assume this will happen to a lot of users. I'm just looking for a way to indicate to the user that the Export button was pressed and the download is going to start. If we can get the browser to communicate the download progress faster, that would also be sufficient. Or we could have a toast or other message that pops up when the user clicks Export that says "Export requested" or something and disappears after a few seconds, but that might not be necessary. Let's try speeding up the browser communicating the download faster and see how that goes.

garethbowen commented 6 years ago

We could definitely improve the feedback on the buttons by having them appear to depress or something as you click on them but it's probably not worth it as we're moving to material style FABs soon which do feedback well.

I'm sure forcing flushing of a \n would work but it would leave an empty line in the top of the CSV. We may be able to get away with flushing the headers only which would have the same effect without affecting the final file. @SCdF I'll leave it to you to investigate whether this approach works.