ualbertalib / dataverse

A data repository framework to share and publish research data.
http://dataverse.org

Check which notes fields are supported in dv4 #49

Closed johnhuck closed 7 years ago

johnhuck commented 7 years ago

Issue ua2

dv3 DDI output includes values in the following notes elements:

//codeBook/docDscr/citation/verStmt/notes (See also #48) --> In UAL dv3, a small number of these values are user supplied, but the majority are auto-generated by Dataverse. We think this data is not migrated because of the new way that dv4 handles versions. There is no obvious place to capture this data, Dataverse 4 allows users to compare specific differences between versions, and the number of studies affected is small. John therefore recommends that we accept this loss but mention it when we communicate with users. No further action required.

//codeBook/stdyDscr/citation/notes --> In UAL dv3, all values appear to be UNF codes, which I presume are automatically generated by Dataverse for internal purposes. No further action required.

//codeBook/stdyDscr/dataAccs/notes --> In UAL dv3, the majority of the values are "UAL Dataverse Network Data Use Terms" statements. However, a number are user-supplied terms of use statements. We should verify whether these statements are coming over and try to preserve them if necessary. In the public sets these include datasets from VOICE-Canada (e.g., 10440), the Shanghai Spoken Corpus (e.g., 10493) and doi:10.7939/DVN/10866. --> John Update: Spot checking these examples, the user-supplied terms of use are visible in the public GUI of dv4. So we may conclude that the data is being migrated, but that it is absent from the dv4 DDI output because of how the XML output is configured. No further action required. Fix dv4 output mapping later.

//codeBook/fileDscr/notes --> In UAL dv3 public studies, half of these are UNF values (60), presumably system generated. The user-supplied values (60) are nondescript (the vast majority are the word "Data"). It's unclear why the user-supplied values didn't migrate. We could try to investigate, but the information value is quite low. --> Upon further investigation, it appears that notes matching //codeBook/fileDscr/notes[@type='vdc:category'] are user-supplied headings that dv3 allows users to specify for organizing files on the Data & Analysis page. This is equivalent to the //codeBook/otherMat/notes[@type='vdc:category'] discussed below. It appears that tabular data files are recorded under /fileDscr and other files under /otherMat. --> These values are captured in dv4 as "tags," but the tags don't appear to be visible to a user, so they can't play their role in organizing the files. --> For the purposes of a quick migration, John recommends no further action for this element, on the basis that no data is lost. No further action required.

//codeBook/dataDscr/var/notes --> In UAL dv3, these are all UNF values, presumably system generated for internal purposes, and the number of element instances matches the dv4 DDI output. Therefore this data is migrating. No further action required.

//codeBook/otherMat/notes --> In UAL dv3, there are various kinds of user-supplied values for otherMat notes, often a keyword like "images" to characterize the material. But there is a discrepancy between the number of instances of the element found in the public studies dv3 DDI output (2911) and the dv4 output (2716). We should try to figure out why. John will dig further. --> John has determined that the dv3 and dv4 output for this element do not contain the same values. The dv4 output seems to be automatic file characterizations, while the dv3 output seems to be user-supplied values. Comparison in this table --> Further observations: Notes elements with attribute type="LOCKSS:CRAWLING" and the value "restricted" control file-level access (public or restricted). Spot checking some examples, the restricted access controls remain in place in dv4. --> Notes elements with attribute type="vdc:category" are in fact user-supplied headings that dv3 allowed users to specify for organizing files on the Data & Analysis page. These headings are gone in dv4, but the values are preserved as "tags" on the files. However, these tags don't appear to be visible to a user, so they can't play their role in organizing the files. --> For the purposes of a quick migration, John recommends no further action for this element, on the basis that no data is lost. However, users should be made aware of the functional loss of the display headers (via their conversion to tags). No further action required.

dv4 DDI output appears to include values only in the last three elements, but it is not clear whether the values in those elements represent migrated metadata values, or system-generated (dv4) values.

Which of these notes elements have corresponding dv4 db fields? And which of the dv4 db fields hold values originating from the DDI ingest process?
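The per-element counts discussed above can be reproduced with a short script. The following is a minimal sketch in Python, assuming the DDI export is available as an XML string; it ignores XML namespaces so the same code can be run against dv3 and dv4 output. The helper names (`local_name`, `count_notes`) and the sample document are invented for illustration and are not part of Dataverse.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_notes(xml_text):
    """Count <notes> elements, keyed by (element path, @type value)."""
    counts = Counter()

    def walk(elem, path):
        here = path + "/" + local_name(elem.tag)
        if local_name(elem.tag) == "notes":
            # elem.get("type") is None when the attribute is absent
            counts[(here, elem.get("type"))] += 1
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return counts


# Invented sample; a real run would pass the text of a DDI export instead.
SAMPLE = """<codeBook xmlns="urn:example:ddi">
  <stdyDscr><dataAccs><notes>Custom terms of use</notes></dataAccs></stdyDscr>
  <fileDscr><notes type="vdc:category">Data</notes></fileDscr>
  <otherMat><notes type="vdc:category">images</notes></otherMat>
  <otherMat><notes type="vdc:category">images</notes></otherMat>
</codeBook>"""

for (path, note_type), n in sorted(count_notes(SAMPLE).items(), key=str):
    print(path, note_type, n)
```

Running this over the dv3 and dv4 exports of the same set of studies gives one count per notes path and type, which makes discrepancies such as the otherMat one easier to localize.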

johnhuck commented 7 years ago

Example of //codeBook/fileDscr/notes: 10011_dv3_export_ddi.xml, 10011_dv4_export_ddi.xml
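A pair of exports like this can be compared mechanically. The sketch below, in Python, lists the notes found under a given parent element in the dv3 export that have no matching note (same type attribute and same text) in the dv4 export. The function names and both sample documents are invented for illustration; a real comparison would read the text of the two attached files instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def notes_under(xml_text, parent):
    """Count (type, text) pairs for <notes> that are direct children of <parent>."""
    found = Counter()
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != parent:
            continue
        for child in elem:
            if local_name(child.tag) == "notes":
                found[(child.get("type"), (child.text or "").strip())] += 1
    return found


def lost_notes(dv3_xml, dv4_xml, parent):
    """Notes present in the dv3 export with no matching note in the dv4 export."""
    # Counter subtraction keeps only positive counts
    return notes_under(dv3_xml, parent) - notes_under(dv4_xml, parent)


# Invented samples: the category heading is missing from the dv4 side.
DV3_SAMPLE = """<codeBook>
  <fileDscr>
    <notes type="VDC:UNF">UNF:placeholder</notes>
    <notes type="vdc:category">Data</notes>
  </fileDscr>
</codeBook>"""

DV4_SAMPLE = """<codeBook>
  <fileDscr>
    <notes type="VDC:UNF">UNF:placeholder</notes>
  </fileDscr>
</codeBook>"""

for (note_type, text), n in lost_notes(DV3_SAMPLE, DV4_SAMPLE, "fileDscr").items():
    print(note_type, repr(text), n)
```

Matching on exact text is a deliberate simplification: values that dv4 regenerates with different content will be reported as lost even when an equivalent note exists.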

johnhuck commented 7 years ago

I have resolved all the outstanding questions and no further action is required.

piyapongch commented 7 years ago

//codeBook/fileDscr/notes and //codeBook/otherMat/notes: It is possible to add notes from tags to the dv4 DDI output. This will need more investigation of the dv4 DDI exporter.

//codeBook/stdyDscr/dataAccs/notes: The terms of use could also be added to the dv4 DDI exporter.

piyapongch commented 7 years ago

The Study version notes have been dropped in dv4.