zooniverse / AnnoTate

Full text transcription app for the Tate Britain
http://anno.tate.org.uk

Knowledge base #148

Closed rogerhutchings closed 9 years ago

rogerhutchings commented 9 years ago

More info on how to classify various things like non-English text, images, multiple lines, etc. Should include help on how to do it, and screenshot examples - ideally from one or two known subjects which we can cut up (because copyright)

Pretty trivial in terms of code, but needs content. So far, I think we could do:

@VVH any other ideas?

VVH commented 9 years ago

Happy to do this. Can adapt tutorial text and take stills of the gifs or just screenshots.

VVH commented 9 years ago

Line by line transcription: To get good-quality data (transcriptions) out of our collective efforts, we all need to transcribe in the same way. One way to increase data quality and consistency is to break the task into smaller pieces. Instead of transcribing whole pages at a time, we ask that you transcribe only one line at a time, and only what you can read with confidence.

'What is a line?' you ask. It's not a sentence or other grammatical unit, but a line of text running left to right. If it's in the margin or a cramped space, a line might consist of just one or two words! The Zooniverse approach to transcription has multiple volunteers independently transcribe the manuscripts and then compares their individual lines of transcription using an algorithm, which we hope will reduce the need for editorial intervention later. For this to work, we need everyone to tackle small units of text and to use the interface as described below. Thanks for your time!
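
To illustrate why consistent, line-sized units matter, here is a minimal sketch of how independent transcriptions of the same line could be compared. It is not the project's actual algorithm, which this thread does not describe. It is a plain majority vote, and the function names and the `min_agreement` threshold are assumptions made up for the example.

```python
from collections import Counter


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not split the vote."""
    return " ".join(text.split()).casefold()


def consensus(transcriptions: list[str], min_agreement: float = 0.5):
    """Return the most common transcription of one line, or None if agreement is too low.

    min_agreement is a hypothetical threshold, not a value taken from the project.
    """
    if not transcriptions:
        return None
    counts = Counter(normalise(t) for t in transcriptions)
    winner, votes = counts.most_common(1)[0]
    # Require strictly more than min_agreement of the volunteers to agree.
    if votes / len(transcriptions) <= min_agreement:
        return None
    # Return the first volunteer's original spelling of the winning text.
    return next(t for t in transcriptions if normalise(t) == winner)
```

A line on which volunteers do not agree comes back as `None`, which is the kind of case that would still need an editor to look at it.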

Transcribing lines: Click on the start of a line, and then the end of the line. Dots will appear, followed by a transcription pane. Transcribe one line into each box, save your work, and find the next line you want to do.

Images: Select the 'Annotate Image' button, located above the page you are working on. Draw boxes over images by clicking at one corner of the image and dragging your cursor. This helps us to identify where they are on the page, but we are not collecting any further information about them at this time.

Blank pages: If the page is blank, select 'Next Document' and 'blank page'.

Multiple lines: [not sure what you mean by this, @rogerhutchings?]

Keyboard shortcuts: @rogerhutchings

The following tags should be used to surround insertions, deletions, illegible text and text in languages other than English. If an insertion, deletion, illegible passage or non-English passage runs over multiple lines, use the tag on each line in which it occurs.

Insertion: To capture text inserted into a sentence over a line or off to the side, click the 'Insertion' button in the transcription pane, and type the inserted words between the tags that pop up. [insertion]Type the inserted words as they appear[/insertion]. Do not alter the tags in any way.

Deletion: To capture deleted text, click the 'Deletion' button in the transcription pane, and type the deleted words between the tags that pop up. [deletion]Type the deleted words as they appear[/deletion]. Do not alter the tags in any way.

Illegible: Some text is genuinely illegible as opposed to just hard to read. This can be because the text has been crossed out or the surface on which it was written has been damaged. In these cases, just click 'Illegible'. If a whole line is illegible, place points at the start and end of the line as you would to transcribe it, and click the 'Illegible' button before moving on to transcribe something else.

Not English: The same principles apply! To transcribe text in a language other than English, click 'Not English' and transcribe the text, line by line, between the tags.
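
Because tags have to stay unaltered and be repeated on every line they apply to, each line can be checked on its own. The sketch below shows one way that check could look. It is an illustration only, not code from the app. It knows only the `[insertion]` and `[deletion]` tags quoted above, since the exact tags for illegible and non-English text are not spelled out here, and `check_tags` and `KNOWN_TAGS` are hypothetical names. It also treats any bracketed word it does not recognise as an altered tag.

```python
import re

# Only these two tag names appear in the guidance above; others are unknown here.
KNOWN_TAGS = {"insertion", "deletion"}

# Matches [name] or [/name]; letters of either case, so altered tags are caught too.
TAG_PATTERN = re.compile(r"\[(/?)([A-Za-z]+)\]")


def check_tags(line: str, known_tags=KNOWN_TAGS) -> list[str]:
    """Return a list of problems found in one transcribed line (empty if none)."""
    problems = []
    open_tags = []  # stack of tags opened but not yet closed
    for match in TAG_PATTERN.finditer(line):
        closing, name = match.group(1) == "/", match.group(2)
        if name not in known_tags:
            problems.append(f"unknown tag: {match.group(0)}")
        elif not closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            problems.append(f"unexpected closing tag: {match.group(0)}")
    # Anything still on the stack was opened on this line but never closed.
    problems.extend(f"unclosed tag: [{name}]" for name in open_tags)
    return problems
```

For example, `check_tags("[deletion]old")` reports an unclosed tag, which is what happens if a deletion runs onto the next line and the closing tag is only typed there.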