Closed AmandaPDSC closed 11 years ago
The data type is on the individual files in the item. Since most items have more than one file associated with them, I don't think it's possible to give them a data type field. Or am I misunderstanding? What is the controlled vocab that you're referring to?
This is a separate field at item level in the old catalogue - the vocab includes historical_text, instrumental_music, language_description, lexicon etc.
@LindaBarwick I was under the impression that we wanted to drop that field. That's what I got from early discussions. Can you clarify please?
I don't recall any previous discussion of this. Obviously the DCMI datatype (sound, moving image etc) does belong at file level. There was some history behind why the DCMI datatype was mixed up with the linguistic datatype @nthieberger may remember.
We mentioned this in a message of Jan 12, it is 'linguistic-type' (http://www.language-archives.org/OLAC/1.1/olac-linguistic-type.xsd)
---------- Forwarded message ----------
Hi Silvia and John,
This is info for when you get to the OLAC harvest. OLAC has a limited metadata set that we map our current metadata to using the attached PHP script. The following are OLAC terms, and we also can use DCMI terms as per the standard: http://www.language-archives.org/OLAC/metadata.html
Their terms, below, are not all useful in our catalog, so we have used those marked with an asterisk. Those marked with a % are mapped to from the term following it in the list below (so, 'lexicography' is one of our 'olac-linguistic-field' terms because it is mapped from 'lexicon' in the data type selection.)
I'm sure we'll need to discuss how this all works.
Thanks,
Nick
So, how do we resolve this?
Right now we have the field "Discourse Type" on each item, which can take these values:
+----+-----------------------+
| id | name                  |
+----+-----------------------+
|  1 | drama                 |
|  2 | formulaic_discourse   |
|  3 | interactive_discourse |
|  4 | language_play         |
|  6 | narrative             |
|  5 | oratory               |
|  7 | procedural_discourse  |
|  8 | report                |
|  9 | singing               |
| 10 | unintelligible_speech |
+----+-----------------------+
That covers what Nick mentions, IMO.
Also, we have roles on each contributor for items, which can take on these values:
+----+---------------+
| id | name          |
+----+---------------+
|  1 | author        |
|  2 | compiler      |
|  3 | consultant    |
|  4 | data_inputter |
|  5 | depositor     |
|  6 | editor        |
|  7 | interviewer   |
|  8 | participant   |
|  9 | performer     |
| 10 | photographer  |
| 11 | recorder      |
| 12 | researcher    |
| 13 | speaker       |
| 14 | translator    |
| 15 | singer        |
+----+---------------+
So, I don't know what is missing.
I did just now notice that the OLAC feed is incomplete, so will have to fix that. I'll open a separate bug for that.
Silvia.
On Fri, Sep 21, 2012 at 8:42 AM, Linda Barwick notifications@github.com wrote:
I don't recall any previous discussion of this. Obviously the DCMI datatype (sound, moving image etc) does belong at file level. There was some history behind why the DCMI datatype was mixed up with the linguistic datatype @nthieberger may remember.
At file level we have the data type as a mime type, e.g. image/jpeg, video/mpeg etc. Is this where we should add an additional field that would be provided by hand? All of the other fields are currently imported from the files themselves.
Just checking the old system and I can see another field (as Amanda says).
Do we want to restrict the data types to just what Nick lists: language_description lexicon *primary_text
Or do we want to import all the values from the old system:
+---------+---------------------------+
| type_id | type_name                 |
+---------+---------------------------+
|       1 | Historical Reconstruction |
|       2 | historical_text           |
|       3 | instrumental_music        |
|       4 | language_description      |
|       5 | lexicon                   |
|       6 | photo                     |
|       7 | primary_text              |
|       8 | song                      |
|       9 | Typological Analysis      |
|      10 | Sound                     |
|      11 | Movingimage               |
+---------+---------------------------+
We can add that to the item.
Given that we have records that will use any of the values in that list, we should import all the values from the previous system
I think we actually need two extra fields. One for the OLAC linguistic data type:
| 4 | language_description |
| 5 | lexicon |
| 7 | primary_text |
and another one for the OLAC Linguistic subject vocabulary:
% historical_linguistics (mapped from data type = 'historical reconstruction')
| 1 | Historical Reconstruction |
| 2 | historical_text |
% language_documentation (all catalog items are currently assigned this term automatically)
% lexicography (data type = lexicon)
| 5 | lexicon |
we also need % typology
| 9 | Typological Analysis |
In other words the contents of that field get dealt with as follows:
| 1 | Historical Reconstruction | > OLAC Linguistic subject: historical_linguistics
| 2 | historical_text | > OLAC Linguistic subject: historical_linguistics
| 3 | instrumental_music | IGNORE [there is no OLAC field to map this to, it is a musicological subject field]
| 4 | language_description | > OLAC linguistic_type: language_description, also > OLAC Linguistic subject: language_documentation
| 5 | lexicon | > OLAC linguistic_type: lexicon, also > OLAC Linguistic subject: lexicography
| 6 | photo | IGNORE - belongs at file level
| 7 | primary_text | > OLAC linguistic_type: primary_text
| 8 | song | IGNORE [this is covered in a separate field having the OLAC discourse type singing]
| 9 | Typological Analysis | > OLAC Linguistic subject: typology
| 10 | Sound | IGNORE - belongs at file level
| 11 | Movingimage | IGNORE - belongs at file level
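The mapping proposed above can be sketched as a small lookup table. This is only an illustration in Python, not code from Nabu or the old catalog: the names `OLAC_MAPPING`, `IGNORED` and `map_data_types` are hypothetical, and the values are transcribed from the rows above.

```python
# Sketch only: names below are hypothetical, not taken from the Nabu codebase.
# Old data_type name -> (OLAC linguistic types, OLAC linguistic subjects),
# transcribed from the mapping proposed above.
OLAC_MAPPING = {
    "Historical Reconstruction": ([], ["historical_linguistics"]),
    "historical_text": ([], ["historical_linguistics"]),
    "language_description": (["language_description"], ["language_documentation"]),
    "lexicon": (["lexicon"], ["lexicography"]),
    "primary_text": (["primary_text"], []),
    "Typological Analysis": ([], ["typology"]),
}

# Values the proposal marks as IGNORE for the OLAC export.
IGNORED = {"instrumental_music", "photo", "song", "Sound", "Movingimage"}


def map_data_types(names):
    """Return (linguistic_types, linguistic_subjects) for an item's old data types."""
    types, subjects = [], []
    for name in names:
        if name in IGNORED:
            continue
        new_types, new_subjects = OLAC_MAPPING[name]  # KeyError on unknown values
        for value in new_types:
            if value not in types:
                types.append(value)
        for value in new_subjects:
            if value not in subjects:
                subjects.append(value)
    return types, subjects
```

An item tagged with both `lexicon` and `song` would then export one linguistic type and one linguistic subject, with `song` silently skipped.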
@nthieberger @AmandaPDSC does this make sense?
Regarding
| 8 | song | IGNORE [this is covered in a separate field having the OLAC discourse type "singing"]
I just tried to update in old catalog so that all 540 items with "song" in their data type field would also have the discourse type "singing" but I discovered that the discourse type field is missing in the "update items" tab of the old catalog.
@silviapfeiffer how do you think we could deal with this? Some options:
- temporarily leave "song" as an OLAC linguistic data type for import, then after going live do a bulk update to match all these items with discourse type "singing", then later get you to delete "song" from the OLAC linguistic data type table
- or else prepare a spreadsheet for you to import with the item ID | discourse type | data type columns
- or else edit the update area of the current catalog to make the discourse type field available for bulk updates there
Yes, this looks good. I'm sorry that I missed the fact of this all being left out until this point of the development
I'll try and fix this by next week, so we can actually move over. This is definitely a blocker.
@LindaBarwick I can create the "singing"/"song" fix upon import.
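A fix of this kind at import time might look roughly like the Python sketch below. It rests on assumptions: the function name `apply_song_fix` is hypothetical, an item is modelled as a plain dict, and both `data_types` and `discourse_types` are assumed to be lists. Whether "song" is also removed from the data types is left as an option, since the thread does not settle that.

```python
def apply_song_fix(item, drop_song=False):
    """Return a copy of item with discourse type 'singing' added if it has data type 'song'.

    item is a hypothetical dict with list-valued 'data_types' and 'discourse_types'.
    """
    data_types = list(item["data_types"])
    discourse_types = list(item["discourse_types"])
    if "song" in data_types:
        if "singing" not in discourse_types:
            discourse_types.append("singing")
        if drop_song:
            # Optional second step: retire the legacy 'song' data type.
            data_types = [d for d in data_types if d != "song"]
    return {**item, "data_types": data_types, "discourse_types": discourse_types}
```

The input dict is not modified, so the same function could be run over a dry-run export before touching real records.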
ok please let me know if you need more from me.
@LindaBarwick Can we just continue to have all these values in one table and one field in the interface? I wouldn't want to include OLAC knowledge into the user interface. The mapping to OLAC fields will only be done on the backend when the OLAC feed is created (just as it happens in the old system).
Also, can I rename the table to Data Category? I don't like "Data Type" - it's too generic and frequently mixed up with the data type of files (image/video/audio).
One belongs in DC:subject and one in DC:type and I think ExSite9 may separate them.
I think we need to hear from @jangari and @nthieberger on this one
It's ok that they go into different places in the OLAC feed, and even come in through different fields in the ExSite9 feed. But since there are only a small number of values, it seems to be overkill to do more than what the old system did.
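To illustrate how a single interface field can still land in two places in the feed, here is a minimal Python sketch that writes OLAC-style elements. The function name `olac_elements` is hypothetical, and the sketch assumes the `dc`, `olac` and `xsi` namespace prefixes are declared on the enclosing record element.

```python
from xml.sax.saxutils import quoteattr


def olac_elements(linguistic_types, linguistic_subjects):
    """Build dc:type / dc:subject element strings for one item (sketch only)."""
    lines = []
    for code in linguistic_types:
        # Linguistic data types are refinements of dc:type.
        lines.append(
            f'<dc:type xsi:type="olac:linguistic-type" olac:code={quoteattr(code)}/>'
        )
    for code in linguistic_subjects:
        # Linguistic subjects (fields) are refinements of dc:subject.
        lines.append(
            f'<dc:subject xsi:type="olac:linguistic-field" olac:code={quoteattr(code)}/>'
        )
    return lines
```

Under this approach the split between DC:type and DC:subject lives only in the export code, and the interface keeps a single list of values.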
Also, we have a problem if we create two tables with the contents that you propose: where do we put the other fields that are not exported to OLAC, in particular instrumental_music, photo, song, sound, MovingImage ?
I have suggested that we just ignore those values.
Linda
Ignore them in nabu completely or just when exporting to OLAC?
They can be ignored in Nabu completely, I think. I've added the 'instrumental music' to the descriptions of all those items, so it's redundant. And the other values (photo, sound, MovingImage) do not belong at item level.
Ah right. That changes things. I'm going to wait for confirmation tomorrow if Nick and Aidan also think that's the way to go. I've right now implemented just a full import of the state of the old data_type table.
Not sure if I follow completely. It is fine to have the elements listed above in one drop down with multiple possible selections and to sort out export differences later. We do need 'photo, sound, MovingImage' at the item level as well, I'm not sure why they would only be at collection level Linda? Maybe we can talk about this on Monday to clarify?
The discussion only refers to item level.
I think Linda is suggesting that we introduce 2 tables as a replacement for the one that used to be data_type.
One called: linguistic data type with the choice of
and a second one called: Linguistic subject vocabulary with the choice of
She has provided a mapping for how to take the existing values to these.
Also, she is suggesting to drop:
- photo
- sound
- Moving Image
since the data type of the file already includes this information.
Finally:
- instrumental_music
- song
will be part of discourse_type and not needed here.
I have for now just imported the data from the old system as is.
I think we need to be explicit about 'movingimage' as an element as it then can turn up in OLAC export (e.g. in http://www.language-archives.org/item/oai:paradisec.org.au:NT5-StringBand). Are you saying it can be generated because the file type is video and so does not need to be provided in the textual metadata? And the same for 'photo' and for 'sound'? But 'photo' is not predictable (given that a pdf could be a photo for example).
On 23 September 2012 19:54, Silvia Pfeiffer notifications@github.com wrote:
Also, she is suggesting to drop:
- photo
- sound
- Moving Image
since the data type of the file already includes this information.
Finally:
- instrumental_music
- song
will be part of discourse_type and not needed here.
Reply to this email directly or view it on GitHub: https://github.com/nabu-catalog/nabu/issues/205#issuecomment-8796956.
Not at collection or item level, at file level
Our items can include files of multiple types, e.g. sound, moving image, text, XML, so this info properly belongs at file level. Unless you want item level to default to the main data type?
We would need to add another field for this DCMI type.
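One way to picture the file-level option is to derive DCMI types from the MIME types already stored per file. The Python sketch below is an assumption-laden illustration: `MIME_PREFIX_TO_DCMI` and `dcmi_types_for_item` are hypothetical names, and mapping `image/*` to `StillImage` is a guess. It also shows the limit raised earlier in the thread: a PDF yields nothing, so a "photo" stored as PDF cannot be predicted this way.

```python
# Assumed prefix -> DCMI Type mapping; image/* -> StillImage is a guess.
MIME_PREFIX_TO_DCMI = (
    ("audio/", "Sound"),
    ("video/", "MovingImage"),
    ("image/", "StillImage"),
    ("text/", "Text"),
)


def dcmi_types_for_item(mime_types):
    """Collect the distinct DCMI types implied by an item's file MIME types."""
    found = []
    for mime in mime_types:
        for prefix, dcmi in MIME_PREFIX_TO_DCMI:
            if mime.lower().startswith(prefix) and dcmi not in found:
                found.append(dcmi)
    return found
```

Because the result is computed from existing file metadata, it would not add a manual data-entry step at file level.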
We don't want to have to manually add information at the file level. That requires an additional level of data entry that is right now fully automated. So, let's stick with it at item level.
At this stage - unless we really want to delay the rollout of Nabu further - I suggest we just go with the same approach that we had in the old system. Since I have already implemented that, I'm going to close this now.
If you have a discussion and come to a different conclusion and want this worked on further, please re-open.
I'm planning to push out the latest state later tonight, so you should be able to test tomorrow.
Oops, update will be pushed by this arvo.
Can I be pedantic and suggest that all items in the list should be lower case - at the moment Sound, Movingimage and Historical reconstruction begin uppercase but all others are lower case. Also there is a space in Historical Reconstruction but not in Movingimage and all other two-word items use _ between the words. @nthieberger is there any good reason for this?
You are right, they should be lowercase, and the space has to be filled by an underscore. Unfortunately, even though we are talking about standards, Dublin Core has 'MovingImage' (http://dublincore.org/documents/dcmi-terms/) so I am not sure if we need to keep that or can use 'moving_image'?
I didn't like that either, but that's how I imported it from the old DB. How about making it readable in our interface and then just writing the right values to the XML files and harvesting interfaces? I can fix the import for that.
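The idea of keeping readable values in the interface and writing the standard spellings only on export could be sketched like this in Python. All names here (`LEGACY_FIXES`, `DCMI_SPELLING`, `normalise`, `display_label`, `export_value`) are hypothetical, and the choice of `moving_image` as the internal form is an assumption, since the thread leaves that question open.

```python
import re

# Legacy spellings that simple lowercasing cannot repair (from the old table).
LEGACY_FIXES = {"movingimage": "moving_image"}

# Internal values whose exported spelling is fixed by the DCMI Type Vocabulary.
DCMI_SPELLING = {"sound": "Sound", "moving_image": "MovingImage"}


def normalise(name):
    """Convert an imported name to lowercase with underscores between words."""
    key = re.sub(r"\s+", "_", name.strip()).lower()
    return LEGACY_FIXES.get(key, key)


def display_label(value):
    """Readable label for the interface, e.g. 'moving_image' -> 'Moving image'."""
    return value.replace("_", " ").capitalize()


def export_value(value):
    """Spelling written to XML and harvesting interfaces."""
    return DCMI_SPELLING.get(value, value)
```

With this split, the stored values stay consistent while the feed still emits `MovingImage` as Dublin Core spells it.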
The datatype field is missing from items in Nabu - it is a repeating field with a controlled vocab that should be included in each item. It could be placed near "discourse type" but is a separate field