wsharba / opendatakit

Automatically exported from code.google.com/p/opendatakit

Better export for statistics tools #495

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I am also sorely missing a way to export data to statistical programmes such
as Stata or R, including variable labels and value labels, i.e. essentially
the underlying logic of the questionnaire. Before, we used pencil-and-paper
questionnaires and the free (but not open source) CSPro as data entry tool. It
was possible to export directly to Stata, which was extremely handy. Any change
in the questionnaire was immediately and correctly reflected in the Stata
export. Any digital questionnaire includes (a) the question, (b) the variable
name, (c) the answer categories in words and (d) the associated values in
numbers. As far as I understand, (a)-(c) are lost in the ODK .csv export and
only (d) is kept. Is that correct? IMHO, this would mean much (error-prone)
duplication of work.
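
To make the (a)-(d) idea concrete, here is a minimal Python sketch of how
questionnaire metadata could be turned into Stata labelling commands for a
.do file that is run after importing the .csv. It is not part of ODK. The
function name, the dictionary layout, and the sample questions are
hypothetical, and it assumes label texts contain no double quotes. Only the
Stata commands `label variable`, `label define` and `label values` are real.

```python
def stata_label_commands(questions):
    """Build Stata do-file lines that attach variable and value labels.

    Each question is a dict with (hypothetical) keys:
      name     -> (b) variable name
      question -> (a) question text
      choices  -> optional {(d) numeric value: (c) answer text}
    Assumes label texts contain no double quotes.
    """
    lines = []
    for q in questions:
        name = q["name"]
        # (a) the question text becomes the variable label
        lines.append(f'label variable {name} "{q["question"]}"')
        choices = q.get("choices")
        if choices:
            # (c) + (d): map each numeric value to its answer text
            pairs = " ".join(f'{value} "{text}"' for value, text in choices.items())
            lines.append(f"label define {name}_lbl {pairs}")
            lines.append(f"label values {name} {name}_lbl")
    return lines


# Made-up sample questionnaire, for illustration only
questions = [
    {"name": "sex", "question": "Sex of respondent",
     "choices": {1: "Male", 2: "Female"}},
    {"name": "age", "question": "Age in years"},
]

do_lines = stata_label_commands(questions)
print("\n".join(do_lines))
```

Running the printed lines in Stata after loading the exported data would
restore the labels without retyping them by hand.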

How difficult is it to include (a)-(c) and to develop an export to statistical
programmes? Has anyone already started to write an export filter for one of
these statistical programmes?

--Gerry

Original issue reported on code.google.com by yanokwa on 6 Jan 2012 at 12:01

GoogleCodeExporter commented 9 years ago
For us to implement this, we would have to research the proper formats. You
could help us by filing an issue with links to the format specification and a
couple of examples (they can contain fake data) that show the original xform
used to gather data and the original csv export, along with a file that shows
the desired output format combining them. The issue tracker allows you to
attach files.

-- Waylon

Original comment by yanokwa on 6 Jan 2012 at 12:02

GoogleCodeExporter commented 9 years ago
Thank you very much for opening this issue in order to have an export of data 
to statistical programmes, such as Stata or R, including variable and value 
labels.

I contacted the Stata developers and they sent me links to the specifications
of the Stata .dta file format, by Stata version:
* Stata 8/9:   http://www.stata.com/help.cgi?dta_113
* Stata 10/11: http://www.stata.com/help.cgi?dta_114
* Stata 12:    http://www.stata.com/help.cgi?dta

Thanks for looking into it.
--Gerry

Original comment by gerry.tr...@googlemail.com on 8 Jan 2012 at 12:57

GoogleCodeExporter commented 9 years ago
Some versions ago, Stata introduced an improved mechanism to import/export
Stata data via an XML-based format. They call it "Stata dta XML". Since ODK
makes extensive use of XML, this seems to me the best and easiest method to
create import/export filters for Stata.

I asked Stata support whether the Stata dta XML format has any limitations
compared to the binary Stata dta file format. Here is their reply:
"The Stata XML format shares the same limitations as the Stata DTA format.
There is not much documentation for the Stata XML format because it is really
just the DTA format written out into XML." This means the descriptions given
in comment 2 are also helpful for an export to Stata XML.

Original comment by gerry.tr...@googlemail.com on 17 Jan 2012 at 10:06

GoogleCodeExporter commented 9 years ago

Original comment by mitchellsundt@gmail.com on 30 Jan 2012 at 11:59

GoogleCodeExporter commented 9 years ago
Our team is still very interested in having such a feature.

As to Yaw's request in the 2nd comment, I will attach an xform questionnaire
(questionnaire.xml), a filled-out instance (instance.xml), the exported results
(result.csv) and a Stata-compliant XML file that contains value labels and the
original questions as variable labels (labels.xml). What we are looking for is
something in the direction of the last one (I am no expert; there may be many
more appropriate ways to do this).
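
As an illustration of where the labels would come from, here is a minimal
Python sketch that reads question texts and answer categories out of an
XForm. The attached files are not reproduced here, so the embedded form is a
made-up, simplified stand-in (no model section, labels written inline rather
than via itext), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of unprefixed XForms elements such as <input> and <select1>
XF = "{http://www.w3.org/2002/xforms}"

# Made-up, simplified form body; a real XForm also has a <model> section
SAMPLE_FORM = """<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <h:head><h:title>Demo</h:title></h:head>
  <h:body>
    <input ref="/data/age"><label>Age in years</label></input>
    <select1 ref="/data/sex">
      <label>Sex of respondent</label>
      <item><label>Male</label><value>1</value></item>
      <item><label>Female</label><value>2</value></item>
    </select1>
  </h:body>
</h:html>"""


def extract_labels(xform_text):
    """Return {variable name: {"question": text, "choices": {value: text}}}."""
    root = ET.fromstring(xform_text)
    result = {}
    for control in root.iter():
        ref = control.get("ref")
        if ref is None:
            continue  # not a form control bound to a data node
        name = ref.rsplit("/", 1)[-1]  # last path segment is the variable name
        question = control.findtext(f"{XF}label", default="")
        choices = {
            item.findtext(f"{XF}value"): item.findtext(f"{XF}label")
            for item in control.findall(f"{XF}item")
        }
        result[name] = {"question": question, "choices": choices}
    return result


labels = extract_labels(SAMPLE_FORM)
print(labels)
```

Note that the values come back as strings, exactly as written in the form; an
exporter would still have to decide how to type them for the target format.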

I did not fill out the form in Collect; I just wrote the files, so there might
be mistakes.

If this, or something in this direction, will never be an official feature of
Aggregate, our team would consider commissioning a customized version.

Thanks

Olivier

Original comment by o.kal...@gmail.com on 15 Feb 2012 at 4:50


GoogleCodeExporter commented 9 years ago

Original comment by yanokwa on 25 May 2012 at 3:46

GoogleCodeExporter commented 9 years ago

Original comment by yanokwa on 26 May 2012 at 1:10

GoogleCodeExporter commented 9 years ago
Hello! This would be a fantastic feature.  I wanted to know if this will be 
implemented.

Original comment by patelm...@gmail.com on 12 Aug 2013 at 10:09

GoogleCodeExporter commented 9 years ago
The core team does not have time to implement this.  

I am unclear whether this should be part of an automated publisher (Aggregate) 
for a specific statistics server (?), or part of the XLSForm tool (i.e., 
XLSForm would generate multiple output files).

If I understand these packages, I believe it would be a change to XLSForm?

If someone commissions the writing of these features, we can fold those changes 
into the main tree.

Original comment by mitchellsundt@gmail.com on 13 Aug 2013 at 12:04

GoogleCodeExporter commented 9 years ago
As I see it, it would be part of Aggregate for a specific statistics server.  
The file that Aggregate would output combines the data & syntax / structure 
into one file.

Original comment by patelm...@gmail.com on 17 Sep 2013 at 4:36