Open trosesandler opened 8 years ago
I've already done some articles for this journal; I think I made a start by harvesting the BHL part metadata and using that. If the data lacks pagination, that will make things a bit more tedious, but it is still doable. The journal history looks complicated; there's an account of it here:
Depew, L., & Berghe, E. V. (1994, July). Journal History. Journal of East African Natural History. East African Natural History Society. http://doi.org/10.2982/0012-8317(1994)83[97:jh]2.0.co;2
@trosesandler I've had another look at the spreadsheets and they look good (the default iPad viewer mangled things so that I couldn't see the page numbers). I could certainly use these to add extra articles to BioStor.
@trosesandler BHL already has a lot of the metadata associated with the PDFs it has for this journal, so I can probably also just use that.
Hi Rod
Yes, the journal went through several name changes.
BHL has only digitized the first one, from 1910-1942. You said you already did some articles for this journal, but I don't see those showing up in BHL, so could you send me a link to what is complete?
The metadata I sent you was what the publisher sent to me several years ago when we uploaded the PDFs via Citebank. Since much of the volume and issue data was missing from the columns, I added it, but it can also be parsed from the filename. It sounds like you are able to grab the metadata from the PDFs, so you don't need me to send you spreadsheets, right? Just wanted to clarify.
Trish
@trosesandler Actually BHL has pretty much everything up until BioOne started publishing the journal, including Utafiti:
I'll take whatever metadata is available, so please send me whatever spreadsheets you have, and I'll also work with the PDF-related metadata.
Actually, you're right; I didn't realize BHL had digitized almost all of it. In that case I'm attaching the full spreadsheet that was given to me several years ago (JEANH_import_test.xlsx). At that time I did some normalization on the filename and other tweaks. More recently I filled in the volume and issue columns, since they were pretty empty. I added the values by looking at the filenames and the actual content online. If you are able to parse the volume and issue values from the filename, that might be faster. Otherwise I can add them manually if that saves you some time.
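As a rough illustration of the filename parsing mentioned above: the real filename convention is not shown in this thread, so the pattern below (`..._v<volume>_n<issue>_...`) and the function name `parse_volume_issue` are purely hypothetical and would need adapting to the actual files.

```python
import re

# HYPOTHETICAL pattern: assumes names such as "JEANH_v03_n12_p001.pdf".
# The actual filenames in the spreadsheet may follow a different scheme.
_VOL_ISSUE = re.compile(r"_v(\d+)_n(\d+)", re.IGNORECASE)


def parse_volume_issue(filename):
    """Return (volume, issue) as ints, or None if the pattern is absent."""
    match = _VOL_ISSUE.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Rows where this returns None would still need the volume and issue filled in by hand.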
Hi Rod, Just checking in to see how the spreadsheets were working for you. Were you able to make use of them for article-izing the content?
@trosesandler Making a slow start. I've imported the Excel spreadsheet into Google Docs and have extracted start and end pages from the column Pagination_in_host. There's going to be some manual work involved to find the articles :(
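For illustration only, the kind of start/end page extraction described above might look like the sketch below. The cell formats it handles ("97-104", "pp. 1–12", a single page number) are assumptions, since the actual contents of Pagination_in_host are not shown in this thread, and `parse_pagination` is a made-up name.

```python
import re

# A start page, optionally followed by a hyphen or en dash and an end page.
_PAGE_RANGE = re.compile(r"(\d+)\s*(?:[-\u2013]\s*(\d+))?")


def parse_pagination(value):
    """Return (start, end) as ints, or None if no page number is found."""
    match = _PAGE_RANGE.search(value or "")
    if match is None:
        return None
    start = int(match.group(1))
    # A single page number is treated as a one-page article.
    end = int(match.group(2)) if match.group(2) else start
    return start, end
```

Cells that come back as None are the ones that would need checking by hand.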
yep I figured because of the way things were bound. If I can help with any of the manual part let me know.
Trish
@trosesandler I've sent you an email with an editable link to the spreadsheet I'm using to do the mapping https://docs.google.com/spreadsheets/d/1czlyL-WAApnxBxZLX2kIq1OmqIQ5ICVKY5C-8wLjjf4/edit?usp=sharing
Progress (both automated and manual) is here: http://biostor.org/issn/0012-8317/year/1970
Rod
Thanks for sharing this with me. This does look like a lot of manual work! In order for me to help with some of it I need to understand a few things:
1) What is the relationship between the spreadsheet and the progress page? In some cases they are in sync but in others they are not, e.g. the article "A New four-toed mongoose from Kenya, Bdeogale Crassicauda Nigrwscens, ssp. nov." shows as completed on the progress page, but in the spreadsheet the BHL page id is blank.
2) Of the articles for which we have BHL page ids, how many were matched manually and how many automatically? I'm trying to understand why the matching succeeds sometimes and fails at other times.
3) At what point would it be useful for me to manually find the page ids? I wasn't sure which of the articles you've tried to match automatically so far and which failed.
thanks!
Hi Trish,
Sorry for the lack of documentation :O
Rod
OK, then I will wait until you've added the PageIDs to the spreadsheet, and then I'm happy to help with the manual work; just let me know. For the EABL project this is my primary role: to figure out how we can increase our article-ization of BHL content. Whether I share citations with you and you are able to automate it, or whether I do it manually, both get us towards that goal.
Hi Rod
Just checking to see how the article-ization is coming for this journal and where I can be of assistance.
@trosesandler I'm swamped at the moment so haven't made any more progress on this.
Rod,
BHL has recently digitized some items for the Journal of the East Africa and Uganda Natural History Society (see http://www.biodiversitylibrary.org/bibliography/14163#/summary) and we'd like to article-ize this content. The publisher, East Africa Natural History Society, gave us article citations for this content back in the Citebank days. I'm hoping we can pass the citations on to you to index via BioStor. One of the challenges with the journal is the inconsistent numbering of volumes and issues. Also, many of the volumes were bound together, which means there will be multiple page 1s in a single item. I recall you saying this makes it more challenging for BioStor. I have attached some sample data for your review, just to see if it contains enough data to do the matching. Let me know your thoughts. JEANH_1910_1918.xlsx
Trish
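To make the "multiple page 1s" problem concrete, here is a minimal sketch of one way to handle it. It is not how BioStor actually does its matching. It assumes each scanned page of an item carries either a printed page number or None (plates, covers), and it starts a new sequence whenever the numbering drops. The function names are invented for this example.

```python
def split_into_sequences(page_numbers):
    """Split scan positions into runs; a new run starts when numbering restarts.

    page_numbers: printed page number for each scan, in scan order, or None.
    Returns a list of runs, each a list of (scan_position, page_number).
    """
    sequences = [[]]
    previous = None
    for position, number in enumerate(page_numbers):
        if number is None:
            continue  # unnumbered scan, e.g. a plate or cover
        if previous is not None and number < previous:
            sequences.append([])  # numbering dropped: a new issue or volume
        sequences[-1].append((position, number))
        previous = number
    return [seq for seq in sequences if seq]


def find_candidates(page_numbers, start_page):
    """Return every scan position whose printed number equals start_page."""
    return [
        position
        for seq in split_into_sequences(page_numbers)
        for position, number in seq
        if number == start_page
    ]
```

When `find_candidates` returns more than one position for an article's start page, the volume, issue, or title has to be used to pick the right one, which is where the manual work comes in.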