rossmounce / NHM-specimens

Aggregating & linking-up mentions of Natural History Museum, London specimens in the literature to specimen identifiers
Creative Commons Zero v1.0 Universal

TandF almost done #3

Open AimeRankin opened 8 years ago

AimeRankin commented 8 years ago

@rossmounce I've now been through every row of TandF (yay!), but a few things still need tidying up.

1 - I need to go through the spreadsheet thoroughly to remove codes that appear more than once in a paper. Some papers, though, are over 100 rows long. Is there an easier way to remove duplicates than doing it by eye?

2 - There's something funny about the DOIs for rows 4756-4804.

3 - I think there is a whole block that has experienced a frame shift at the very end of the spreadsheet - can this be remedied?

rossmounce commented 8 years ago

@AimeRankin great news. Sorry I didn't answer this earlier.

1.) For the long papers with many rows, select all the rows belonging to that paper (make sure you select whole rows, not just individual columns) & sort them with the spreadsheet's sort function. Sorting puts identical codes next to each other, which makes duplicates much easier to spot. Preserving the original order within a paper is unimportant once the basic filtering has been done.

2.) & 3.) I'll go take a look now...
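For 1.), if sorting by hand gets tedious, the same idea can be scripted. This is only a minimal sketch, assuming the sheet has been exported to rows with one column identifying the paper and one holding the specimen code. The column names `doi` and `specimen_code`, and the sample values, are made up for illustration and are not taken from the actual spreadsheet.

```python
def drop_duplicate_codes(rows, paper_col="doi", code_col="specimen_code"):
    """Keep the first row for each (paper, code) pair, in original order.

    `paper_col` and `code_col` are hypothetical column names.
    """
    seen = set()
    kept = []
    for row in rows:
        # Normalise case and whitespace so "nhmuk  r1234 " matches "NHMUK R1234".
        paper = row[paper_col].strip().lower()
        code = " ".join(row[code_col].split()).upper()
        if (paper, code) not in seen:
            seen.add((paper, code))
            kept.append(row)
    return kept


# Made-up sample rows: the same code twice in one paper, once in another.
rows = [
    {"doi": "10.1080/example.one", "specimen_code": "NHMUK R1234"},
    {"doi": "10.1080/example.one", "specimen_code": "nhmuk  r1234 "},
    {"doi": "10.1080/example.one", "specimen_code": "NHMUK R5678"},
    {"doi": "10.1080/example.two", "specimen_code": "NHMUK R1234"},
]
deduped = drop_duplicate_codes(rows)
```

The same code cited in two different papers is kept in both, since only repeats within a single paper count as duplicates here.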

rossmounce commented 8 years ago

@AimeRankin re: 2.) what do you mean by 'funny'? I tried some of them and they seem to work. If one or two don't work, it's probably the prefix (the part before the slash) that is wrong. JVP changed publisher, so some DOIs start with 10.1080 whilst others start with 10.1671 (for the very same journal!).
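A quick way to check a block of DOIs like this is to pull out each prefix and count them. The sketch below is a rough check rather than a full DOI validator: the pattern only tests for a `10.<digits>/<suffix>` shape with no spaces, and the sample DOIs are invented for illustration.

```python
import re
from collections import Counter

# Rough shape check only: "10.", a 4-9 digit registrant code, a slash,
# then a suffix with no whitespace. This is an assumption, not the full DOI syntax.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/\S+$")


def doi_prefix(doi):
    """Return the prefix (e.g. '10.1080') or None if the string looks malformed."""
    match = DOI_PATTERN.match(doi.strip())
    return match.group(1) if match else None


# Invented examples: two well-formed DOIs and two that would need fixing.
dois = [
    "10.1080/example.one",
    "10.1671/example.two",
    "doi:10.1080/bad",
    "10.1080/ spaced",
]
prefix_counts = Counter(doi_prefix(d) for d in dois)
malformed = [d for d in dois if doi_prefix(d) is None]
```

Anything that ends up in `malformed`, or under an unexpected prefix in `prefix_counts`, is worth checking by hand.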

rossmounce commented 8 years ago

@AimeRankin I've just pushed a fix for 3.). The remaining 300 are all from an entomology journal, I think. You can just put 'examined' or 'deposited' in column A if there's clearly no individual specimen number given.

rossmounce commented 8 years ago

@AimeRankin btw, in case you want to learn more about git / GitHub, I recommend this Software Carpentry tutorial: http://swcarpentry.github.io/git-novice/