newtfire / textEncoding-Hub

shared repo for DIGIT 110: Text Encoding class at Penn State Erie, The Behrend College
https://newtfire.github.io/textEncoding-Hub/
Creative Commons Zero v1.0 Universal

Crowdsourcing Transcription: Examples, Issues, Reflection #47

Closed ebeshero closed 2 years ago

ebeshero commented 3 years ago

Here's an example of a screenshot: You can drag and drop OR copy and paste the screenshot image file into the Issue window as you are typing here:

Screen Shot 2021-09-22 at 10 05 50 AM
ebeshero commented 3 years ago

For those unfamiliar with taking screen captures, here is some guidance:

Tiny-Pickles commented 3 years ago

My transcript was a historical document about North Carolina African American schools. I think we could mark up the progression of civil rights with each series of these documents. For example, schools back in the early 1900s were a lot different than the schools we see today. Most schools were segregated at the time, so maybe these documents could be used to further research terms associated with race at the time and the progress of segregation. One issue I found with the transcription is reading handwritten cursive. Also, I think having a way to transcribe the location of the different headings and subheadings of the page would be helpful to represent the layout of the page.

Transcription-digit110

arrowarchive commented 3 years ago

When browsing through the transcriptions, the handwriting was nigh-unreadable to me. Thankfully, after enough time, I came across some journals and letters from World War II that were written on typewriters. The text was faint, but still legible.

From what I could gather here, it appears as if soldiers had a hard time returning home in the last years of the war. This soldier in particular mentions boarding a ship in San Diego and later mentions Okinawa, making me wonder if he was stationed there. He hasn't received any mail, hasn't received any new clothes, and is desperate to return home to his family. After some research, I found that he arrived nine months before Hiroshima and Nagasaki were bombed, and that the U.S. military was struggling to get people home, possibly due to the fallout, which had lasted nearly two months. It was interesting reading about a soldier's perspective around that time and being so close to Japan.

transcript1

transcribe2
austinmurry commented 3 years ago

My document was a set of diary entries from a soldier working in the barracks. As I was transcribing this typed diary, I was reading into what this man was writing about. This set of entries was about a soldier who was a little homesick and could not wait for his unit to be relieved come that November. He talks about what he did throughout those days and how he is using these journal entries as a type of therapy for the now and the future. He claims that he has a lot of great memories that he doesn't want to forget, and if he ever seems unhappy then all he has to do is read these journal entries and he will be jumping off the walls happy again. He also talks about his wife, whom he loves and misses very much. Transcribing this was interesting and kind of fun, as I got to take a look into what this soldier was thinking during the time he was writing these and I got the chance to turn these into more modern text.

Transcribe HW
Yuying-Jin commented 3 years ago

image

The document I am transcribing is a letter written for a map transaction. I find handwritten content difficult to transcribe, because most people prefer to write with swashes. I often cannot distinguish between b and l. When I meet the name of a person or place, I can guess what the whole word is from several letters. The printed content is not difficult to transcribe; I just need to be careful.

sveludandi commented 3 years ago

My transcription document was Iowa Seed Catalogs from 1883. I thought this was interesting to read, as this page shows comments from various customers. This page has already been transcribed by what looks to be some kind of software. This software picked up letters that were cut or miswritten and used symbols instead to look like the original. For example, "we" was written as w;; because the e was a little faded on the original. There were also many other mistakes in the transcription, mainly symbols that shouldn't be there and misspelling/misinterpretation.

Capture2

thammer12499 commented 3 years ago

The transcript document I contributed to was Celebrating 175: Louise Nevelson, Subject File, National Association of Women Artists, 1959-1965. The different transcripts I looked through were very clear and listed. Many of the pages I went through listed people I had never heard of before. I had an issue with previous contributors choosing to shorten the documented transcript rather than following its format as closely as possible before I was kicked off by another contributor. I thought it was incredibly strange how only a single person could contribute at a time.

Screenshot (308)

erinmooney commented 3 years ago

image

I think it is pretty cool how much collaboration is actually done on these! There are quite a few handwritten journals (like this one) that are almost completely transcribed.

I am working on the James Carroll Beckwith, Diaries, 1873-1878. I would actually like to read some of the linguistic differences back then versus now.

I think the issue that project organizers can have when reviewing and correcting transcriptions is that they, too, may have no idea what a {{?}} word is. What if a word or phrase is too much of a scribble to comprehend? Will they just keep passing it on until someone figures it out? Or will they just mark it as unknown (until later identified, if it does get identified)?

jbg5721 commented 3 years ago

As I was searching I found this document with Anglo-Saxons in the title and I thought that would be ancient and cool to try. I didn’t think about how that also means it wouldn’t be in any recognizable English. I found it interesting that I could identify letters. I eventually came across a Federal Census document from 1850. Pretty straightforward, with a long list of various names. One issue I feel like I see with the transcription process is that it takes a lot of time. A lot of people don’t really want to sit down for many hours and put in the needed effort for transcription. This is especially true for documents that are in a language that perhaps isn’t used anymore or in handwriting that is hard to read.

1C095296-8CB1-45D8-BF79-89F1778243D0 529AF7EF-A811-47AA-AC9E-5527EE8FDC7A

SCD5363 commented 3 years ago

Capture3

ZakMurphy191 commented 3 years ago
Screen Shot 2021-09-24 at 5 24 28 AM

I thought that this hw assignment was super interesting, and it was super cool to come across all the different papers from Alabama Department of Corrections records; Alabama Board of Inspectors of Convicts investigation and testimony, 1886. This was interesting because not only did I get to read about what people have done wrong in the past, but I thought it was super interesting comparing our age to back then. This really gave me an eye opener for how much times have changed. I see a possible project showing the comparison from today's age to back in the late 1800s and seeing firsthand how different things were. It would be really cool to create a super cool website showing the differences in the centuries, to show others the differences.

The issue I had with this project was that the transcription was super hard to read; this little paragraph took me well over 2 hours to transcribe. Other than problems that I can fix, like studying cursive, I thought that this was super cool. I especially like how others proofread your work and you can comment back and forth with them about the transcription to ensure the page is correct.

Screen Shot 2021-09-24 at 5 25 11 AM
Janman813 commented 3 years ago

transcription

This project was quite interesting. At first it was a little hard to tell what the document was, but it turned out to be numbers and months, which made it easy. The document that I worked on was, I think, someone's work log of either hours they worked or money they made. But then again, I do not know. I would have to look into this document more and find others that are similar and compare them to each other to find what it truly portrays.

acc5763 commented 3 years ago
Screen Shot 2021-09-24 at 10 18 02 AM

My passage is from the book Poems of Cabin and Field written by Paul Laurence Dunbar. It was really interesting seeing all the misspellings, being that it was written by someone who was not fortunate enough to go to school. This document was very easy to transcribe. However, a lot of the handwritten ones are really hard. I started to transcribe a few of them but had a lot of trouble doing so. Thankfully, I found a much easier one.

aidanvray commented 3 years ago

transcription

I started transcribing a page from a draft of "A Trip Around the World". The draft is from 1910-11 and seems to be formatted as dated journal entries. The page I worked on was about the author meeting officials of the Japanese government. For further research I would like to learn more about the book itself, since from the single page I worked on I'm still not even sure who the author is. I would also look into what else the book covered and try to find out if it ever actually got published. My only issue with the process was that even after looking through the expanded list of guidelines I wasn't quite sure how to transcribe text that was written over other text.

ericsandbloom commented 3 years ago

names names thats the game

So the list itself was a bit hard to work with because, aside from the column style, there wasn't a good numbering system, so it was often hard to keep my place. However, this could potentially be an example of the problems with people not learning cursive anymore. I could recognize names that were unique but because they were in cursive I could still read them. The list in the typing area was a bit messy. This kind of list would probably be best to go through with a spreadsheet; that's what it reminds me of the most.

bealse18 commented 3 years ago
Screen Shot 2021-09-24 at 11 55 14 AM

I was struggling to read the files with handwriting, mostly because of the lack of context in some of them. I found the typewriter files to be easier to transcribe, but hopefully with some practice I can work my way up to the handwritten ones.

NickyV1234 commented 3 years ago

transcribing

I haven't worked with cursive since I was 11 years old in 5th grade. I clearly need some practice, but for the most part this was rather interesting to transcribe; some words I could look at and tell right off the bat what they were. Also, it helps to understand the context of the sentence you are typing, because that can help you predict words that will be said, which can help with interpreting the writing.

Tiny-Pickles commented 3 years ago

I transcribed another collection from the North Carolina archives, the Maud Hayes Sticks letters/transcripts. As I was transcribing I noticed that the rules did not cover certain aspects of writing. For example, they did not have a rule for transcribing added handwritten words or letters. They also did not have a rule for letters or words handwritten on top of the typed text. I believe that the main focus for transcription is to get the main text content across, especially when it involves letters or drafts of novels. But I also believe that structure is important when it comes down to focusing on poems or government documents. My final thought is that showing discrepancies between revisions is important for the final transcription of the document, and the rules for transcription should reflect that.

2ndTranscription

transcription-rules

arrowarchive commented 3 years ago

I transcribed another entry from the Carolina Archives. This time, I challenged myself to find some handwriting that was somewhat legible and try to transcribe it. I believe this is part of a letter collection from the Civil War, and most of it was easy for me to understand. There were a few areas where I did not know what the words were, but it was easier to read than most of the letters I came across.

That said, I feel like a lot of older documents (especially letters) are written in cursive. It's bizarre to think about, since cursive has been all but dropped (with the exception of signatures) since I was in third grade. Was cursive widely encouraged over what we consider "print"? When did using "print" become more common? It raises some questions that make me wonder why older documents (unless written on a typewriter) are harder to transcribe.

transcribe 2 1 transcribe 2 2

erinmooney commented 3 years ago

image

I picked this one because I am minoring in French and studied Italian in Behrend's Italian Culture class, so I felt like I could make fairly educated guesses on what the words on the clipping were.

To me, the capitalization of some of the words was odd, but they seemed to be that way in the written text. While working on it, I was wondering if this kind of capitalization was common in Europe during this time. I thought it would be extremely important to follow exactly how they spelled different words because that might be an older way of spelling a word that I know (or think I know) to be different now. Especially because this is handwritten, I tried my best to copy the spelling, but as @ericsandbloom and @bealse18 mentioned, it is quite difficult to read these sometimes. So, again, I tried my best to reiterate the words and their exact written spelling.

I like that about this setup. Because it is being reviewed and passed around, you just need to give your best, educated guess at what is being said and then someone else with better resources on the topic will review it. This setup takes a lot of pressure off of the transcriber to be 100% correct! This kind of coincides with @Tiny-Pickles's comment where they said, "I believe that the main focus for transcription is to get the main text content across.."

austinmurry commented 3 years ago

For this one, I tried to work with handwritten text. But seeing as I couldn't read most of it and haven't really had to since the 3rd grade, I chose to move on and continue with another typed document.

As I was going through this text I was trying to follow the rules laid out by the creators of the page as best as I could, but sometimes it was harder than I had originally thought. They don't have a way to point out special characters or different stylizations of words or sentences. The document I transcribed had a dashed line through the middle to break the page, but due to the linear format of transcribing that the creators want, there was no way to record it; as @Tiny-Pickles pointed out, they just want the main text and contents of the document to be transcribed. My XML-worked brain doesn't like this though and wishes there was a way to point out these different characters and stylization choices throughout our transcriptions. This could be a problematic thing though, as those stylization characteristics are there for some sort of a purpose. So if we take those out of the document when we transcribe it, then overall we are in a way changing the document, which doesn't seem right...

Screen Shot 2021-09-26 at 6 19 41 PM
Yuying-Jin commented 3 years ago

Originally, I tried to transcribe a handwritten document, and then I gave up. The person who wrote the document never considered whether it could be understood. In the end, I chose a printed letter again. This letter is about arranging a student's residence. The receiver was a student, and the student was notified by the home camp that after the camp closed the student needed to go to Andover as planned, and if Mrs. Erving was not willing to take the student during this holiday, the student had to stay at the inn and bear huge expenses.

image

luh429 commented 3 years ago

FullyTranscribed

After a lot of searching I finally found a handwritten document that I felt like I could transcribe. I got the document from the North Carolina site, and even though it was short it was still difficult to transcribe because I struggled to read the handwriting, and it took me longer than expected. The document was a report from the city attorney.

jbg5721 commented 3 years ago

I wanted to go for a bigger challenge this time around. Despite the handwriting not being too awful, and still being legible, I still cannot read script well. As you’ll see in my screenshot, I had a cursive alphabet open right to the side. It was a hunt just to find the next letter for some of these words. The text is from a log book from the CSS Alabama, so of course there are a lot of sailing terms that I am not familiar with. After a certain point I stopped working on this, mostly because I just felt like I wasn’t even being helpful. Too many words were just becoming [???]. I added the first picture because I wanted to see what month any of you thought this was. To me it looks like Joniy, but we all know that’s not a month. Given that this is a logbook, I would think the most important information is up near the top. So, the bad look out at the mast. Looking at my screenshot while posting is making me realize that the writer wrote mast, not mass. But I do know that the mast is the tall center pole that holds all the sails. I would think that this could mean some sailor was not paying attention to the ropes and sails, or if they were up high, they might not have been paying attention to potential dangers.
36A4D542-7271-4205-A2A6-31D372DC7332 99283BD5-9BAB-4710-B644-1852AA423DA9

Janman813 commented 3 years ago

form Transcriptions part 2

When going through this again and finishing the first page, I realized that it's not a work log but a tenant log for those who live in that apartment. I could not tell what the one word at the top and the middle statement were until going through again.

acc5763 commented 3 years ago
Screen Shot 2021-09-27 at 10 14 13 AM

I transcribed CELEBRATING 175: JAIME DAVIDOVICH, BROCHURES AND BOOKLETS. This document pretty much promotes tourist attractions and means of transportation. The part I transcribed was in French and English. I initially tried to transcribe cursive handwriting. However, I was struggling and hadn't made much progress. It is difficult to read old cursive handwriting, so I switched to typed-out writing. With more time and patience I'll be able to focus on handwritten documents.

hjl5363 commented 3 years ago

For this assignment, I decided to analyze the Ordinance of Convention that is from the articles of the United States Constitution. While the handwriting is okay, I still had some trouble with analyzing the document (I personally do not have very good handwriting or cursive reading skills).

Some questions that I ask myself when transcribing this document are how the document, in particular, should be formatted, particularly with pages, and the proper capitalization of letters. In addition, there are some special characters, such as "^", for which I do not exactly understand the most appropriate course of action. Revisiting the handwriting, I do not believe that I could ever encode the handwriting on the bottom of the second page, particularly with the text that is crumbled and written on top of each other.

The other thing that I could see being an issue is that the way we speak, use grammar, and communicate evolves over the years, and it will continue to do so. Nobody alive today was around when this was written, so I think that there might be some difficulties that come up from there.

Screen Shot 2021-09-27 at 10 03 08 AM
aidanvray commented 3 years ago

transcriptionComplete

Here is my finished transcription from the "A Trip Around the World" draft. While working on it I encountered a couple more situations involving handwriting going over the typewriting, and the guidelines don't give a very definitive answer for how to capture that. I also didn't like that the document had underlined text and a mix of different types of writing, but the guidelines don't distinguish this.

hjl5363 commented 3 years ago

For this Crowdsourcing Transcription assignment, I decided to try to transcribe a Colonial Court record. First, I ran into a ton of issues with the handwriting of the document, and I found it very difficult to read what was written. As I mentioned in my previous post, I think my poor cursive reading skills are partially to blame for this. I have historically not had very much experience trying to read this type of writing. Another major issue that I can identify with this particular transcription is trying to understand what was written on burned paper and fully grasp what was written at the end of this document. I have personally never seen a document that has been formatted in this particular way, especially with something being written to the right of the signature. I also want to share that these two exercises have been very interesting for me to complete. They have really taught me the obstacles of encoding text, and some of the big ideas that come with it.

Screen Shot 2021-09-27 at 10 37 33 AM Screen Shot 2021-09-27 at 10 40 05 AM
luh429 commented 3 years ago

Document3WithCircles

After a lot of searching I finally found a handwritten document that I felt I could possibly transcribe. At the point of this picture I was not done and had skipped over a lot of words, but one thing I found very interesting was how the writer makes his capital letters. A lot of his letters (for example the letter C) have a big round loop after them that really threw me off. Luckily I looked at the title of the document and picked up the pattern in his handwriting from the "Cs" in "Common City Council", and then when I came across other letters with the loop I realized it was just his style and not a separate letter. From context I also realized he writes his lower case "e" a few different ways, which I found very strange. I went through and circled some of the capital letters in my screenshot as an example.

ZakMurphy191 commented 3 years ago
Screen Shot 2021-09-27 at 10 53 35 AM

This time I decided to work on a transcription for African American Education. When I am transcribing a page I read the page out loud to myself to make sure what I think makes sense. With the information I read, I run down the list of what? why? when? I do this because the information that is being interpreted needs to have the correct dates, slang, and what they were talking about. When transferring the important information to the text box it is really important to the data collectors that you write it exactly the way the person wrote it back in the day. Another big rule is to make sure that if anything is crossed out you say that in the page where they crossed it out, along with all the other things like that. In all, the most important rule for my transcription was duplicating the paper to look exactly like the original form. I had a lot of fun doing this. I can't wait to hear what you all have to say!

ericsandbloom commented 3 years ago

In the process of continuing on the same page I was transcribing before, it became locked, so that was a bit frustrating. But looking at my peers' work, it seems most people are having legibility problems. Finding a document they can fully read can definitely be challenging. Even as someone who learned cursive, I found a lot of the documents hard to read. Since most of everyone else's complaints are generally the same, I'd say that legibility is the most important factor when trying to transcribe these documents. With factors like time and aging, these already hard to read documents are becoming nothing more than chicken scratch. The most important info, especially for the lists, would definitely be the names of the people. If we don't know whose info we're recording, what is it more than just a bunch of useless numbers and words?

ZakMurphy191 commented 3 years ago
Screen Shot 2021-09-24 at 11 55 14 AM

I was struggling to read the files with handwriting, mostly because of the lack of context in some of them. I found the typewriter files to be easier to transcribe, but hopefully with some practice I can work my way up to the handwritten ones.

I totally agree with you; it was very difficult to read some of the handwriting that the author of the text wrote. I think this has to do with how formally they wrote depending on the time frame. I also believe that it is super hard to read cursive.

ZakMurphy191 commented 3 years ago

I wanted to go for a bigger challenge this time around. Despite the handwriting not being too awful, and still being legible, I still cannot read script well. As you’ll see in my screenshot, I had a cursive alphabet open right to the side. It was a hunt just to find the next letter for some of these words. The text is from a log book from the CSS Alabama, so of course there are a lot of sailing terms that I am not familiar with. After a certain point I stopped working on this, mostly because I just felt like I wasn’t even being helpful. Too many words were just becoming [???]. I added the first picture because I wanted to see what month any of you thought this was. To me it looks like Joniy, but we all know that’s not a month. Given that this is a logbook, I would think the most important information is up near the top. So, the bad look out at the mast. Looking at my screenshot while posting is making me realize that the writer wrote mast, not mass. But I do know that the mast is the tall center pole that holds all the sails. I would think that this could mean some sailor was not paying attention to the ropes and sails, or if they were up high, they might not have been paying attention to potential dangers.

36A4D542-7271-4205-A2A6-31D372DC7332 99283BD5-9BAB-4710-B644-1852AA423DA9

I think it is really interesting that you put the cursive handwriting on there. I did the same thing: when I was reading I had a cursive alphabet next to me so I could digest the different ways and styles that were being written. I think the more you transcribe the easier it gets, so keep trying. It's hard, but it will get easier to read. I also found out that reading the pages before can help you understand the following pages.

kmh6907 commented 3 years ago
transcription

I thought this homework assignment was really interesting but at the same time was a lot more complicated than I expected it to be. I wanted to challenge myself a bit so I chose a handwritten document. I was able to figure out a majority of the words but definitely not all of them. I found some of the words to be very unreadable because a few of them look like a bunch of jumbled lines. There are also some letters that resemble other letters and I wasn't able to figure out a word that would make sense for it. Some issues people might have with the reviewing process is the time it takes to go through and make sure all of the words, symbols, and transcriptions are correct. I feel like this is a very time-consuming process that not everyone has the patience or time for.

NickyV1234 commented 3 years ago

transcribing22 transcribing2

All that was really left was analyzing the audit logs. The transcribing of these was very weird because the format barely made any sense on the transcribing side. Some transcriptions didn't account for the fact that some pages were side by side; they typed out one page and then transcribed the other page and put that below the first. However, reading numbers was extremely easy. Overall I'd say this was just about as difficult as the first time around.

SCD5363 commented 3 years ago

Capture

This seemed to be a ledger with names and dates. It was very difficult to transcribe and understand what it said. I think that a couple of words were misspelled in the document.