Change default display names of file types and types of materials

DominicBM commented 9 years ago

The filter options for type of material and file type have a lot of clunky language, because we are just using the exact source wordings. Some of these could be clarified/shortened for the general user.

For example, "Textual Records", "Moving Images", "Sound Recordings", and "Architectural and Engineering Drawings" could be just "Textual", "Video", "Sound", "Drawings", and so on.

Similarly, there is no need for file types to display with names as clunky as "Portable Document File" or "application/zip". Perhaps we could just convert these to display only the file extension, like ZIP, PDF, etc. (with some merged together for clarity, like JPG and JPEG, TIF and TIFF, etc.).
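
A minimal sketch of the "extension only, aliases merged" idea, assuming the raw value available is a file extension. The alias table and function names are hypothetical and not part of the catalog or DAS:

```python
from collections import Counter

# Hypothetical alias table: spellings that should share one facet entry.
EXTENSION_ALIASES = {"JPG": "JPEG", "TIF": "TIFF"}


def display_extension(ext: str) -> str:
    """Normalize an extension such as '.jpg' or 'Tiff' to a display label."""
    label = ext.strip().lstrip(".").upper()
    return EXTENSION_ALIASES.get(label, label)


def extension_facets(extensions: list[str]) -> Counter:
    """Count files per display label, so aliases collapse into one entry."""
    return Counter(display_extension(e) for e in extensions)


print(extension_facets([".jpg", "JPEG", "tif", ".TIFF", "pdf", "zip"]))
```

Which aliases to merge is a content decision; the code only applies whatever table is agreed on.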

We'd need to discuss exactly what changes to make.

[Screenshot, 2014-12-09 2:52:40 PM: current filter options]

DeniseHenderson commented 9 years ago

These values are pulled from the LCDRG authority values and from the archival metadata in DAS. Unless the catalog can "switch" the terms on the fly for the public-facing system, we might be stuck with them. Also, we need to be mindful of archival terminology. "Moving Images" does not equate to "Video"; this has already been a sore subject with people in the tabs. "Architectural and Engineering Drawings" is not the same as "Drawings."

DominicBM commented 9 years ago

The system can be designed to map the DAS fields correctly while changing only the display names of the fields.
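
One way to picture this, assuming each refinement option carries both a query value and a label: the query keeps the exact DAS value, and only the label goes through a hypothetical override table, falling back to the source wording for anything not listed.

```python
# Hypothetical override table: source value from DAS -> label shown in the UI.
DISPLAY_NAMES = {
    "Portable Document File": "PDF",
    "application/zip": "ZIP",
}


def facet_option(source_value: str) -> dict[str, str]:
    """Build one refinement option; the search query still uses the source value."""
    return {
        "value": source_value,  # sent back to the search unchanged
        "label": DISPLAY_NAMES.get(source_value, source_value),  # shown to users
    }


print(facet_option("Portable Document File"))
print(facet_option("Textual Records"))  # no override, so the archival term is shown
```

Because unlisted values fall through unchanged, archival terms such as "Moving Images" stay as they are unless an override is deliberately added.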

I think it's an issue that archival terminology is not the same as user speak, and that it is verbose even when it is clear (like "sound recordings" instead of "sound"). I actually think "moving images" is one of the more egregious examples. Normal humans don't use that term; it might have an archival meaning that is more precise than "video", but it is actually less understandable, not more, to most people. Having said that, some of these are potentially controversial, but many, especially the file formats, probably aren't. We just need to decide what can be changed.

Perhaps, in the long term, the LCDRG could specify a "common name" value for each of these fields, so that display names understandable to the public are codified and part of the DAS schema/export, not just made up by us. CC @WaxCylinderRevival

clingerman commented 9 years ago

I don't think saying these terms are "made up by us" is fair. These are pretty standard terms in the archival community. See, for example, the use of both "moving image materials" and "sound recordings" in DACS (http://files.archivists.org/pubs/DACS2E-2013.pdf).

I'm going to have to disagree with you, Dominic, and say that we should be using the terms that are both accepted and most common across the archival community.

Not all moving images can be considered videos, so to use the term videos would be a false representation. See the Wikipedia article on moving image formats (http://en.wikipedia.org/wiki/Moving_image_formats). From that article: "Moving image material, based on this, is sometimes roughly divided into 2 groups: the so-called film-based material, where the image of the scene is captured by camera 24 times a second (24 Hz), and the video-based material, where the image is captured 50 or ~60 times a second." We have plenty of moving images that could not correctly be classified as video.

We are the National Archives, and so, despite our desire to reach out to the larger public, we should still adhere to archival principles in the process.

DominicBM commented 9 years ago

@clingerman, for what it's worth, when I said "made up by us", I meant the "us" as the catalog team. My point was that a compromise could be to have acceptable common names specified by the LCDRG instead of just changing the archival terms in the front-end without subjecting them to review.

We can argue about the term "video", but I'd rather we not get bogged down in that and end up still using abominations like "Portable Document File" in our UI. Can we at least agree to the idea that we should adopt some reasonable display names? That's the more important question than the unintentionally controversial example I gave.

clingerman commented 9 years ago

I know what you meant. Sure, let's abbreviate the file formats (PDF, etc.). But let's leave the types of materials alone. I don't think we are going to agree on common names. For example - if not video, what is a better common name for moving images? I can't think of one.

DominicBM commented 9 years ago

Well, considering that you have to quote a definition before I have any idea what it is supposed to mean, I think that using the two terms in the definition, like "Film/video", is still an improvement. I don't think it really needs to be so difficult.


clingerman commented 9 years ago

Technically, a flip book would be considered a moving image which is neither film nor video. Example: http://vlp.mpiwg-berlin.mpg.de/essays/data/art31

I don't want to delve into minutiae either; my point is that we shouldn't change a term that works and is accepted in the archival community. If people are confused, maybe instead we should have a hover-over or a page explaining what each type of material entails.

mereastew commented 9 years ago

I would encourage that we have a productive conversation on both sides and try to understand everyone's point of view without minimizing the other's point of view. Let's work together on this!

There's a tension here that we're hitting and I think it's really important that we examine both sides of the issue - because there are two sides here. One is our wish to be specific in terms of the archival terminology that we use and represent the archival units and the records to the best of our ability. The other is to be as easy as possible for folks who don't have an understanding of the terminology to find what they are looking for online.

So many times before we've been compared to the Library of Congress and folks say they can find what they are looking for there, but they can't on our website. I think it would be good for us to take a look to see what the other institutions are doing online so that our terminology is fitting with what's best practice for institutions like us, on the web.

I don't think this conversation is trivial, so I will be removing the label that was added to it. I also want to caution everyone that we're on the same team and it's really important that we work together in a collegial manner. Let's not take offense where none was intended and let's not say things to minimize the other point of view.

WaxCylinderRevival commented 9 years ago

I agree that this issue is not trivial, as it raises several large areas of needed re-alignment both in input and content standards and display. In fact, it's much more than a label.

To break down the initial question more neutrally:

General Category I

Type A
- Medium 1
Type B
- Medium 2
Type C
- Medium 2
- Medium 3

The initial suggestion essentially takes Medium 3 and applies it to everything under General Category I. As you can see, this would be problematic, because it uses a narrower term as an umbrella term.

This discussion illuminates quite a few larger problems:

In our filter options, in my opinion, we aren't clearly distinguishing between the general category and specific genre/form of the original object and the file type and file format of the digital object. These attributes answer very different research and recall needs.
- Just one example: for original genre/forms of oil paintings, watercolors, artifacts, digital photographs, daguerreotypes, maps, and data graphs, the digital objects would all be image/jpeg and image/tiff. Drilling down to JPEG while still not offering a way to drill down to specific types of materials hampers the utility of our catalog.
We need to look at both our terminology and data structure to allow for better search and filter options. I agree with Meredith's statement, "So many times before we've been compared to the Library of Congress and folks say they can find what they are looking for there, but they can't on our website. I think it would be good for us to take a look to see what the other institutions are doing online so that our terminology is fitting with what's best practice for institutions like us, on the web."
- I've considered analyzing these vocabularies and structures as part of our content standards investigation and this conversation underscores, for me, the need to do so.
- This need and related controlled vocabulary issues will grow if we expect to ingest the Presidential Libraries' artifact metadata from TMS. They are currently using the classification system from Chenhall's Nomenclature for Museum Cataloguing.
For digital object file format, we should be using MIME media types in order to better align with other digital libraries and to be of more use to developers: type + / + subtype https://www.iana.org/assignments/media-types/media-types.xhtml | http://en.wikipedia.org/wiki/Internet_media_type#List_of_common_media_types
- Types:
- application
- audio
- image
- message
- model
- multipart
- text
- video
- Examples:
- image/jpeg
- image/tiff
- audio/mp3
- audio/ogg
- text/csv
- text/rtf
- text/xml
- video/avi
- video/mp4

Using the MIME types, we should be able to easily script our filters to allow for faceted browsing/retrieval:

Refine By: Digital File Format

Audio File
- MP3
- OGG
Image File
- JPEG
- TIFF
Text File
- RTF
- XML
Video File
- AVI
- MP4
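
A minimal sketch of that scripting, assuming the stored values are well-formed "type/subtype" strings like the examples above. The heading rule ("Image File", etc.) and the function name are hypothetical, not existing catalog code:

```python
from collections import defaultdict


def group_by_mime(mime_types: list[str]) -> dict[str, list[str]]:
    """Group 'type/subtype' strings under headings like 'Image File'."""
    facets: dict[str, set[str]] = defaultdict(set)
    for mime in mime_types:
        # Drop any parameters such as '; charset=utf-8' before splitting.
        essence = mime.split(";", 1)[0].strip().lower()
        top, _, subtype = essence.partition("/")
        facets[top.capitalize() + " File"].add(subtype.upper())
    # Sort headings and entries so the facet list is stable between searches.
    return {heading: sorted(subs) for heading, subs in sorted(facets.items())}


examples = ["image/jpeg", "image/tiff", "audio/mp3", "audio/ogg",
            "text/rtf", "text/xml", "video/avi", "video/mp4"]
for heading, entries in group_by_mime(examples).items():
    print(heading, entries)
```

New formats land under the right heading without code changes. The catch is that some subtypes make poor labels (application/msword would show as MSWORD under "Application File"), so a small override table may still be needed.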

A lot of times, I want to get to a digital image, but I don't necessarily want to exclude TIFFs or JPGs from my search.

I have other thoughts kicking around, but I'll sign off for now!

DeniseHenderson commented 9 years ago

John and I discussed this briefly, particularly after last week's OPA IPT. I think we can all agree that a larger discussion is warranted. I'll work on setting up that meeting.

DominicBM commented 9 years ago

I think we should be careful not to conflate the technical and the content discussions/decisions. Do we want to be able to configure the text for refinement categories/breadcrumbs? I think the answer is yes, especially because of the clunky file format names currently displayed, right?

So, what changes should we actually make? That is worthy of the larger discussion, but it is a separate question. Writing the requirements and having ST complete the work on the configurability functionality do not actually depend on the outcome of that discussion, so they can happen in parallel.

WaxCylinderRevival commented 9 years ago

In this particular issue, the data content and the technical decisions are very much linked. If you write rules for today's data without looking at what will be added in the future [partner data, museum objects, or otherwise], display labels that work in Sprint 1 will break when you import new or different data, unless you code for future data or have a fix session in the future (Sprint 2 or otherwise).

My suggestion to use MIME types, for instance, is to control our vocabulary for ease of transformation and scripting. Just glance at all the permutations in your screenshot:

Images (GIF)
Images (JPEG)
application/zip
Microsoft Word Document
image/tiff

They are structured differently, so you'd have to predict every possibility and write a matching display label for each conversion. Alternatively, we can flip to MIME and just have the developers work by pattern: type/subtype. Their script would then work without re-coding, no matter what file formats are entered in the future. My suggestion was meant to support your desire for better, more commonly accepted display labels in the file format area, Dominic.
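
To make the contrast concrete, here is a hypothetical one-time crosswalk for the values quoted above: each free-text label needs its own table entry, while values already in MIME form pass through with no entry at all. The table and function are illustrative only:

```python
from typing import Optional

# Hypothetical crosswalk: every free-text legacy label needs an explicit entry.
LEGACY_TO_MIME = {
    "Images (GIF)": "image/gif",
    "Images (JPEG)": "image/jpeg",
    "Microsoft Word Document": "application/msword",
}


def to_mime(value: str) -> Optional[str]:
    """Return a MIME type for a stored value, or None if it needs manual review."""
    value = value.strip()
    if value in LEGACY_TO_MIME:
        return LEGACY_TO_MIME[value]
    top, sep, subtype = value.partition("/")
    # Already shaped like 'type/subtype' (no spaces): keep it, lower-cased.
    if sep and top and subtype and " " not in value:
        return value.lower()
    return None


for v in ["Images (GIF)", "application/zip", "image/tiff", "Portable Document File"]:
    print(v, "->", to_mime(v))
```

Anything that comes back as None is exactly the case that would otherwise need a new hand-written display rule.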

For Sprint 1, I'd like to suggest a simple change of "Refine by: File Format" --> "Refine by: Digital File Format" as part of your display enhancements.
To possibly branch to another issue or sprint, I'd really like to see our "Refine By: Types of Materials" expand to checklists or another filter "Refine By: Original Format" (depending on ease/group debate), e.g.

(-) Photographs and Other Materials
- Aerial photographs
- Daguerreotypes
- Drawing
- Photographs
- Watercolor painting
- etc.

Just to explain my suggestion: the difference between a digital object file format and the original genre/form is rather important, and they should be different filters/refinements. A digital image file does not necessarily mean the original archival item was an image (analog or digital). If I want artifacts, I want to filter to narrower terms/types: guns or jewels from visits by heads of state. The related digital object, if any, would be an image (in various file formats: JPG, GIF, TIFF, JPEG 2000, etc.) or video (MP4, etc.). It would be preferable to allow people to filter by original object genre/form as well as by digital file format (especially with higher-level groupings like image and video, as people often aren't searching specifically for a JPEG vs. a TIFF but for what's available as a digital image, at least to start). Original object format is definitely important to researchers, more so than digital file format for analog materials, as the digital file is a representation of the original object.
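
A rough sketch of keeping the two attributes separate, assuming each description records its original genre/form and the MIME types of its digital objects. The record shape, field names, and sample values are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Description:
    title: str
    original_format: str  # genre/form of the original, e.g. "Daguerreotypes"
    digital_formats: list[str] = field(default_factory=list)  # MIME types


def refine(records: list[Description],
           original_format: Optional[str] = None,
           digital_type: Optional[str] = None) -> list[Description]:
    """Apply the two refinements independently; None means 'not filtered'."""
    result = []
    for r in records:
        if original_format is not None and r.original_format != original_format:
            continue
        # Compare only the top-level type, so JPEG and TIFF both count as "image".
        if digital_type is not None and not any(
                m.split("/", 1)[0] == digital_type for m in r.digital_formats):
            continue
        result.append(r)
    return result


records = [
    Description("Portrait", "Daguerreotypes", ["image/tiff", "image/jpeg"]),
    Description("Gift from a state visit", "Artifacts", ["image/jpeg"]),
    Description("Newsreel", "Moving Images", ["video/mp4"]),
]
print([r.title for r in refine(records, digital_type="image")])
print([r.title for r in refine(records, original_format="Daguerreotypes")])
```

Filtering by digital type alone returns both the daguerreotype and the artifact, which is the point being made: the image file says nothing about what the original item was.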

Anyhow, I certainly agree with your intentions! Let me know if you need my eyes on anything related to this issue or others.

usnationalarchives/OPAProd, issue #39