kobotoolbox / kpi

kpi is the server for KoboToolbox. It includes an API for users to access data and manage their forms, question library, sharing settings, create reports, and export data.
https://www.kobotoolbox.org
GNU Affero General Public License v3.0

Missing data and column order issues in XLS Export #1449

Closed maskoamas closed 7 years ago

maskoamas commented 7 years ago

A MIRA for IRMA has been ongoing, and when the data is exported to XLS, most of the records show the following problems:

  1. Data is missing from the output, i.e. fields are blank although there are submissions via KPI. When exporting via Legacy, the data is visible, but labels are not shown.
  2. The order of the columns is not the same as the order of the fields in the form.

The files are at https://www.dropbox.com/sh/p74d43tnc4ruha2/AAD3yGuMqMfLHGLavGSHYzKMa?dl=0

We need to process this data ASAP as it is crucial to the response teams currently in Haiti.

The account details are:

Username: Irma_mira

Survey link: https://ee.kobotoolbox.org/x/#Ym44

jnm commented 7 years ago

The URL for this form is https://kobo.humanitarianresponse.info/forms/#/forms/aVVxUTtT7apKEFSX7p3kxr/landing

jnm commented 7 years ago

All of the submissions list __version__ viF95XZ6p3fFMRPzqkckTo, but no version with that UID exists in the database.
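
A check of this kind can be sketched in a few lines of Python. This is only an illustration, not what KPI or formpack actually runs: the function name `unknown_versions`, the sample records, and the list of known version UIDs are all assumptions made up for the example. Only the `__version__` field name and the UID `viF95XZ6p3fFMRPzqkckTo` come from the data above.

```python
from collections import Counter


def unknown_versions(submissions, known_version_uids):
    """Count submissions whose __version__ is not among the known UIDs.

    Hypothetical helper for illustration; not part of KPI or formpack.
    """
    known = set(known_version_uids)
    counts = Counter(s.get("__version__") for s in submissions)
    return {uid: n for uid, n in counts.items() if uid not in known}


# Illustrative sample data (assumed, not pulled from the real form).
sample_submissions = [
    {"__version__": "viF95XZ6p3fFMRPzqkckTo", "_tags": []},
    {"__version__": "viF95XZ6p3fFMRPzqkckTo", "_tags": []},
]
sample_known_uids = ["v4zyz8dLHwaSBc7PCfupDe"]  # assumed for the example

print(unknown_versions(sample_submissions, sample_known_uids))
```

An empty result would mean every submission points at a version the server knows about. Any entry in the result names a version UID that the export cannot look up.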

jnm commented 7 years ago

Getting the data with remote_pack is arduous but ultimately fruitful.

john@scrappy:/tmp$ virtualenv e
john@scrappy:/tmp$ . e/bin/activate
(e)john@scrappy:/tmp$ git clone https://github.com/kobotoolbox/formpack.git
(e)john@scrappy:/tmp$ cd formpack/
(e)john@scrappy:/tmp/formpack$ git merge origin/remote-pack
(e)john@scrappy:/tmp/formpack$ python setup.py develop
<snip>
ImportError: <module 'setuptools.dist' from '/tmp/e/local/lib/python2.7/site-packages/setuptools/dist.pyc'> has no 'check_specifier' attribute
(e)john@scrappy:/tmp/formpack$ pip install -r requirements.txt -r dev-requirements.txt 
Successfully installed coverage nose tox flake8 funcsigs pluggy virtualenv py pyflakes pep8 mccabe MarkupSafe
(e)john@scrappy:/tmp/formpack$ python setup.py develop
Finished processing dependencies for formpack==1.4
(e)john@scrappy:/tmp/formpack$ mkdir ~/.formpack
(e)john@scrappy:/tmp/formpack/data$ (cat <<EOF
{
  "ocha:irma_mira": {
    "api_url": "https://kobo.humanitarianresponse.info/forms/assets/",
    "token": <redacted>
  }
}
EOF
) > ~/.formpack/accounts.json
(e)john@scrappy:/tmp/formpack$ python pull.py --account ocha:irma_mira aVVxUTtT7apKEFSX7p3kxr
  File "/tmp/formpack/src/formpack/remote_pack.py", line 89, in _query_kcform
    ctx['kc_formid'] = r2[0]['formid']
IndexError: list index out of range
(e)john@scrappy:/tmp/formpack$ git apply <<EOF
diff --git a/src/formpack/remote_pack.py b/src/formpack/remote_pack.py
index 287f6b0..e97d346 100644
--- a/src/formpack/remote_pack.py
+++ b/src/formpack/remote_pack.py
@@ -83,8 +83,10 @@ class RemoteFormPack:
                 'kc_api_url': '{}://{}/api/v1'.format(_deployment.scheme,
                                                       _deployment.netloc),
             }
+            import posixpath
+            kc_id_string = posixpath.split(_deployment_identifier)[1]
             _url = '{}/forms?id_string={}'.format(ctx['kc_api_url'],
-                                                  self.uid)
+                                                  kc_id_string)
             r2 = requests.get(_url, headers=self._headers()).json()
             ctx['kc_formid'] = r2[0]['formid']
             with open(self.path('context.json'), 'w') as ff:
EOF
(e)john@scrappy:/tmp/formpack$ python pull.py --account ocha:irma_mira aVVxUTtT7apKEFSX7p3kxr
  File "/tmp/formpack/src/formpack/remote_pack.py", line 123, in load_version
    raise Exception('Version not found')
(e)john@scrappy:/tmp/formpack$ git apply <<EOF
diff --git a/src/formpack/remote_pack.py b/src/formpack/remote_pack.py
index 287f6b0..e97d346 100644
--- a/src/formpack/remote_pack.py
+++ b/src/formpack/remote_pack.py
@@ -108,6 +110,8 @@ class RemoteFormPack:
                 self.load_version(version_id)

     def load_version(self, version_id):
+        version_id = 'v4zyz8dLHwaSBc7PCfupDe'
+        print('!!! FORCING VERSION TO', version_id)
         _version_path = path.join(self.path('versions'),
                                   '{}.json'.format(version_id)
                                   )
EOF
(e)john@scrappy:/tmp/formpack$ python pull.py --account ocha:irma_mira aVVxUTtT7apKEFSX7p3kxr
  File "/tmp/formpack/src/formpack/remote_pack.py", line 150, in create_pack
    title=self.asset.name, ellipsize_title=False,
TypeError: __init__() got an unexpected keyword argument 'ellipsize_title'
(e)john@scrappy:/tmp/formpack$ git apply <<EOF
diff --git a/src/formpack/remote_pack.py b/src/formpack/remote_pack.py
index 287f6b0..e97d346 100644
--- a/src/formpack/remote_pack.py
+++ b/src/formpack/remote_pack.py
@@ -143,7 +147,7 @@ class RemoteFormPack:
             _v['date_deployed'] = _v.pop('date_deployed', None)
             self.versions.append(_v)
         return FormPack(versions=self.versions, id_string=self.uid,
-                        title=self.asset.name, ellipsize_title=False,
+                        title=self.asset.name
                         )

     def stats(self):
EOF
(e)john@scrappy:/tmp/formpack$ python pull.py --account ocha:irma_mira aVVxUTtT7apKEFSX7p3kxr
!!! FORCING VERSION TO v4zyz8dLHwaSBc7PCfupDe
(e)john@scrappy:/tmp/formpack$ head ~/.formpack/aVVxUTtT7apKEFSX7p3kxr/data.json 
[
  {
    "B_Demographie/B_2Lg==3a": "134", 
    "A_Localisation/A_1Lg==5Lg==_Quartier": "Dilaire", 
    "D_Secal/D_4Lg==5aaa/D_4Lg==5h": "peu", 
    "D_Secal/D_4Lg==1/D_4Lg==1g": "NON", 
    "J_ER/J_activites/J_10Lg==4": "Partiellement", 
    "F_Wash/H_SanteNut/H_8Lg==4/H_8Lg==4a": "NSP", 
    "_tags": [], 
    "I_Prot/I_Vuln/I_9Lg==1b": "famille", 
(e)john@scrappy:/tmp/formpack$

jnm commented 7 years ago

The missing data in the non-legacy export is likely due to https://github.com/kobotoolbox/kpi/issues/1348. A key like A_Localisation/A_1Lg==5Lg==_Quartier in the submission data corresponds to the question named A_1.5._Quartier in the group A_Localisation, and formpack is not base64-encoding the dots in the question names, so they never match the Lg== sequences in the submission keys:

john@scrappy:~$ echo -n . | base64
Lg==
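
The same mapping can be sketched in Python. This is a minimal illustration of the mismatch described above, not the code from the hotfix or from formpack. The helper names `encode_dots` and `submission_key` are hypothetical, and the assumption that group and question names are joined with "/" is inferred from the keys in data.json.

```python
import base64


def encode_dots(name: str) -> str:
    """Replace each "." in a name with its base64 encoding ("Lg==")."""
    encoded_dot = base64.b64encode(b".").decode("ascii")
    return name.replace(".", encoded_dot)


def submission_key(group: str, question: str) -> str:
    """Build the key under which a question's value would appear in the
    submission data, assuming "/" joins group and question names.

    Hypothetical helper for illustration only.
    """
    return "/".join(encode_dots(part) for part in (group, question))


print(submission_key("A_Localisation", "A_1.5._Quartier"))
# prints: A_Localisation/A_1Lg==5Lg==_Quartier
```

Without an encoding step like this, a lookup by the plain question name finds nothing in the submission, which would explain the blank cells in the export.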

The column ordering problem is a separate issue that I haven't investigated yet.

jnm commented 7 years ago

I've worked around this by deploying the hotfix https://github.com/kobotoolbox/formpack/commit/83733de0fb764d9335454aa3217a2bd04d9fcbb0 to OCHA's servers. I'll leave https://github.com/kobotoolbox/kpi/issues/1348 open until a durable fix for KPI-based exports is in place.

@maskoamas, I think that the column order problem is covered by https://github.com/kobotoolbox/formpack/issues/60, but please reopen this report if it's a different problem.