ihsn / nada

National Data Archive (NADA) is an open source data cataloging system that serves as a portal for researchers to browse, search, compare, apply for access, and download relevant census or survey information. It was originally developed to support the establishment of national survey data archives.
http://nada.ihsn.org
MIT License

Missing data display in 5.2 #34

Closed Alhrath closed 2 years ago

Alhrath commented 2 years ago

Hello! We are upgrading from 5.0.5 to 5.2 and noticed that when browsing the data in the new version, the survey results are no longer displayed. 5.0.5: image 5.2: image Is this behaviour intended? Is this data available elsewhere? I didn't find anything about it in the changelog. Thanks!

mah0001 commented 2 years ago

It does not look right. Can you try to reload the DDI to see if that solves the issue?

Alhrath commented 2 years ago

Hi! Thanks for your response and sorry for the delay I was busy elsewhere.

I started from scratch again, copying everything from the prod server and applying the upgrade again. (By the way, I hit two errors following the process indicated here: the "survey_locations" table seems to already exist (I think I was on 5.0.5 before the upgrade), and I think "DROP TABLE variables;" in option 2 is not necessary since the table is renamed on the line above.)

Anyway, the bug was still there. I tried a DDI refresh but it didn't solve it.

So I dug a little deeper into the code. It seems that in application/views/metadata_templates/fields/field_var_category.php, line 25, $item['stats'] is not an array (it is set, but as an integer). Therefore $stats_col_wgtd is not set, and neither is $sum_cases_wgtd, so at line 98 the stats are skipped.

mah0001 commented 2 years ago

Can you try loading this test DDI file and see if you get the same results? https://github.com/ihsn/ddi-examples/tree/main/demo-popstan-2006

This is what it looks like on my machine:

image

Also could you try to paste the JSON for the variable to see if the values are correctly being imported from DDI. To get the variable JSON, you can use the API endpoint http://your-catalog-url/index.php/api/catalog/study-numeric-id/variable/variable-id/?id_format=id

For example:
Web: http://nada-demo.ihsn.org/index.php/catalog/98/variable/F7/V1023?name=BR2
JSON (API): https://nada-demo.ihsn.org/index.php/api/catalog/98/variable/V1023?id_format=id
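The endpoint pattern above can be scripted when many variables need checking. This is a minimal Python sketch, assuming the catalog API is reachable without authentication. `variable_api_url` and `fetch_variable` are hypothetical helper names for illustration, not part of NADA.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def variable_api_url(base_url, study_id, variable_id):
    """Build the variable endpoint URL following the pattern quoted above."""
    path = "/index.php/api/catalog/{}/variable/{}".format(
        quote(str(study_id), safe=""), quote(str(variable_id), safe="")
    )
    # id_format=id asks for the study to be addressed by its numeric ID.
    return base_url.rstrip("/") + path + "?" + urlencode({"id_format": "id"})


def fetch_variable(base_url, study_id, variable_id, timeout=30):
    """Download and decode the variable JSON (performs a network request)."""
    url = variable_api_url(base_url, study_id, variable_id)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Reproduces the demo URL from the example above.
print(variable_api_url("https://nada-demo.ihsn.org", 98, "V1023"))
```

Calling `fetch_variable("https://nada-demo.ihsn.org", 98, "V1023")` should return the decoded JSON for that variable, provided the demo site is online.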

The variable metadata should look like this:

{
    "variable": {
        "uid": "179728",
        "sid": "322",
        "fid": "F5",
        "vid": "V422",
        "name": "b01",
        "labl": "Children living elswhere",
        "qstn": null,
        "catgry": "Yes No",
        "metadata": {
            "file_id": "F5",
            "vid": "V422",
            "name": "b01",
            "var_intrvl": "contin",
            "var_dcml": "0",
            "var_wgt": null,
            "var_is_wgt": null,
            "loc_start_pos": null,
            "loc_end_pos": null,
            "loc_width": "1",
            "loc_rec_seg_no": null,
            "labl": "Children living elswhere",
            "var_imputation": null,
            "var_security": null,
            "var_resp_unit": null,
            "var_analysis_unit": null,
            "var_qstn_preqtxt": null,
            "var_qstn_qstnlit": null,
            "var_qstn_postqtxt": null,
            "var_qstn_ivuinstr": null,
            "var_universe": null,
            "var_universe_clusion": null,
            "var_sumstat": [{
                "value": "511",
                "type": "vald",
                "wgtd": null
            }, {
                "value": "2005",
                "type": "invd",
                "wgtd": null
            }, {
                "value": "1",
                "type": "min",
                "wgtd": null
            }, {
                "value": "2",
                "type": "max",
                "wgtd": null
            }],
            "var_txt": null,
            "var_catgry": [{
                "value": "1",
                "labl": "Yes",
                "is_missing": null,
                "stats": [{
                    "value": "40",
                    "type": "freq",
                    "wgtd": null
                }]
            }, {
                "value": "2",
                "labl": "No",
                "is_missing": null,
                "stats": [{
                    "value": "471",
                    "type": "freq",
                    "wgtd": null
                }]
            }],
            "var_codinstr": null,
            "var_concept": [],
            "var_format": {
                "type": "numeric",
                "schema": "other",
                "category": null,
                "name": null
            },
            "var_notes": null,
            "var_val_range": {
                "min": "1",
                "max": "2"
            },
            "fid": "F5"
        },
        "keywords": null
    }
}
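A quick way to confirm that frequencies were imported is to compare the per-category counts with the `vald` summary statistic. The sketch below uses a trimmed copy of the sample response. It assumes that `vald` equals the sum of the unweighted category frequencies, which holds for this sample (40 + 471 = 511) but is not guaranteed for every variable. The function names are hypothetical.

```python
def category_frequencies(variable):
    """Return {category value: unweighted frequency} from a variable's JSON."""
    freqs = {}
    for cat in variable["metadata"].get("var_catgry") or []:
        stats = cat.get("stats")
        if not isinstance(stats, list):
            continue  # frequencies missing or not in the expected list form
        for stat in stats:
            if stat.get("type") == "freq" and not stat.get("wgtd"):
                freqs[cat["value"]] = int(stat["value"])
    return freqs


def valid_count(variable):
    """Return the 'vald' summary statistic as an int, or None if absent."""
    for stat in variable["metadata"].get("var_sumstat") or []:
        if stat.get("type") == "vald":
            return int(stat["value"])
    return None


# Trimmed copy of the sample response above.
sample = {"variable": {"metadata": {
    "var_sumstat": [{"value": "511", "type": "vald", "wgtd": None}],
    "var_catgry": [
        {"value": "1", "labl": "Yes",
         "stats": [{"value": "40", "type": "freq", "wgtd": None}]},
        {"value": "2", "labl": "No",
         "stats": [{"value": "471", "type": "freq", "wgtd": None}]},
    ],
}}}

freqs = category_frequencies(sample["variable"])
print(freqs, sum(freqs.values()) == valid_count(sample["variable"]))
```

An empty result from `category_frequencies` for a variable that has categories suggests the frequencies were never imported or are stored in a different shape.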

Alhrath commented 2 years ago

I imported the test DDI and the graphs for the variables do appear. It's probably an encoding problem in our DDI then...

Here is the JSON of a variable that appears in 5.0.5 but not in 5.2: r02_V16.json.txt And, if it helps, the DDI of the corresponding study: ch.sidos.ddi.339.7245_de.xml.txt

I remember we already had an encoding problem a few years ago, and when helping us you asked where our DDI files were coming from: before NADA, we were using Nesstar.

Thanks for the help.

mah0001 commented 2 years ago

Your DDI file does not have any frequencies included. The variable element should have the catgry element as below:

<catgry>
  <catValu>
    -2
  </catValu>
  <labl>
    Keine Angabe
  </labl>
  <catStat type="freq">
    146
  </catStat>
</catgry>
<catgry>
  <catValu>
    1
  </catValu>
  <catStat type="freq">
    232
  </catStat>
</catgry>
<catgry>
  <catValu>
    2
  </catValu>
  <catStat type="freq">
    340
  </catStat>
</catgry>

I noticed that if I download the same DDI from your pre NADA 5.2 catalog, it contains the frequencies and works/displays correctly. Here is the downloaded DDI: ch.sidos.ddi.339.7245_de.xml.txt
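With around a thousand studies, it may help to scan the DDI files for variables that have categories but no frequency `catStat`. This is a rough sketch using Python's standard `xml.etree.ElementTree`. It assumes the usual DDI Codebook layout, where `var` elements carry `name` and `ID` attributes, and it ignores XML namespaces so it works whether or not the file declares one. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{ddi:codebook:2_5}var'."""
    return tag.rsplit("}", 1)[-1]


def variables_without_frequencies(ddi_xml):
    """Return names of <var> elements with categories but no freq catStat."""
    root = ET.fromstring(ddi_xml)
    missing = []
    for var in root.iter():
        if local_name(var.tag) != "var":
            continue
        categories = [el for el in var if local_name(el.tag) == "catgry"]
        has_freq = any(
            local_name(child.tag) == "catStat" and child.get("type") == "freq"
            for cat in categories
            for child in cat
        )
        if categories and not has_freq:
            missing.append(var.get("name") or var.get("ID"))
    return missing


# Tiny made-up document: one variable with frequencies, one without.
sample = """<codeBook><dataDscr>
  <var ID="V1" name="with_freq">
    <catgry><catValu>1</catValu><catStat type="freq">232</catStat></catgry>
  </var>
  <var ID="V2" name="no_freq">
    <catgry><catValu>1</catValu><labl>Deutsch</labl></catgry>
  </var>
</dataDscr></codeBook>"""

print(variables_without_frequencies(sample))
```

For a real file, read it in binary mode and pass the bytes, for example `variables_without_frequencies(open(path, "rb").read())`, so the parser can honour the encoding declared in the XML header.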

Alhrath commented 2 years ago

Yep, I checked and you are right: I can download the DDI files from the 5.0.5 version and re-upload them into 5.2, which makes them display correctly there. It would be some work to do this by hand for around 1000 surveys, but luckily we should have the original XML files stored somewhere, so we could do a bulk import of all the original data, and that should not take too long.

But before falling back on this workaround, I would like to try to understand what's going on... I don't understand why I have a correct DDI on 5.0.5 and the data somehow gets lost in the transition to 5.2 :p It's mostly for my own sanity, partly to be sure not to lose data in the process, and also in the hope that I can help somebody who has the same problem someday and does not have the data stored in a handy way...

But there is one thing I don't get, and maybe you could help me: where is the "catStat" frequency information stored? I couldn't find it anywhere, but it surely must be stored somewhere, right?

Bonus question: if we do end up re-uploading the data from 5.0.5, is there a way to export all the XML files at once?

mah0001 commented 2 years ago

Before I suggest anything, it would be good to have a call to see what happened. Please message me at mah0001@gmail.com and I will set up a meeting.

Alhrath commented 2 years ago

For anyone who has a similar issue: with the help of @mah0001 I managed to solve this.

It's unclear why, but the variable metadata stored in the DB was in the wrong format. All the arrays were stored as

    [var_catgry] => Array
        (
            [0] => Array
                (
                    [value] => 1
                    [labl] => Deutsch
                    [stats] => 613
                    [type] => freq
                )

                [...]
        )

Changing them to

    [var_catgry] => Array
        (
            [0] => Array
                (
                    [value] => 1
                    [labl] => Deutsch
                    [is_missing] =>
                    [stats] => Array
                        (
                            [0] => Array
                                (
                                    [value] => 613
                                    [type] => freq
                                    [wgtd] =>
                                )

                        )

                )

         [...]

        )

Seems to have solved the issue.
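
For illustration, the reshaping shown above can be expressed as a small function. This is a sketch in Python that works on already-decoded metadata. It does not show how the metadata is read from or written back to the NADA database, which is not described in this thread. `normalize_category` is a hypothetical name, and defaulting a missing `type` to "freq" is an assumption.

```python
def normalize_category(cat):
    """Return a copy of one var_catgry entry with `stats` in the nested form."""
    stats = cat.get("stats")
    if stats is None or isinstance(stats, list):
        return dict(cat)  # nothing to convert, or already in the expected shape
    # Drop the flat `stats`/`type` keys and rebuild them as a one-item list.
    fixed = {k: v for k, v in cat.items() if k not in ("stats", "type")}
    fixed.setdefault("is_missing", None)
    fixed["stats"] = [{
        "value": stats,
        "type": cat.get("type", "freq"),  # assumption: flat stats are frequencies
        "wgtd": None,
    }]
    return fixed


legacy = {"value": 1, "labl": "Deutsch", "stats": 613, "type": "freq"}
print(normalize_category(legacy))
```

The function leaves entries that are already in the nested form unchanged, so it can be run over a mix of old and new records without double-wrapping `stats`.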