sul-dlss-labs / spoc

Species Occurrences (SpOc), documentation available at https://sul-dlss-labs.github.io/spoc/
Paper Metadata Dataframe #47

Open jermnelson opened 3 years ago

jermnelson commented 3 years ago

Moving this into its own ticket from #38: we'll want to normalize the metadata for the various institutions that are currently in the Google Docs spreadsheet.

From the converted TEI XML in SUL AI 2020-2021/Project - Species Occurrences/papers_tei and the Google metadata spreadsheet, create a papers DataFrame with the following columns:

  1. XML filename
  2. Title
  3. Institution
  4. Author
  5. Secondary Author
  6. Abstract
  7. Year
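
A minimal sketch of the TEI side of this, assuming the converted files use a GROBID-style TEI header layout (title under `titleStmt`, authors under `sourceDesc/biblStruct/analytic`, abstract under `profileDesc`). The helper names (`paper_row`, `paper_rows`, `COLUMNS`) are hypothetical. It treats the first author as Author, joins any remaining authors into Secondary Author, and leaves Institution and Year blank on the assumption that those come from the metadata spreadsheet.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# TEI P5 namespace (assumed; GROBID-style output uses it).
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Column order requested in this issue.
COLUMNS = ["XML filename", "Title", "Institution", "Author",
           "Secondary Author", "Abstract", "Year"]


def _text(elem):
    """Collapse all text inside an element (or None) into one string.

    Pieces are joined with spaces so separate <p> elements don't run together.
    """
    if elem is None:
        return ""
    return " ".join(" ".join(elem.itertext()).split())


def _person(author):
    """Join forename(s) and surname from a tei:author/tei:persName."""
    pers = author.find("tei:persName", TEI)
    if pers is None:
        return ""
    parts = [_text(n) for n in pers.findall("tei:forename", TEI)]
    parts.append(_text(pers.find("tei:surname", TEI)))
    return " ".join(p for p in parts if p)


def paper_row(xml_path):
    """Build one row of the papers table from a single TEI file.

    Institution and Year stay blank: assumed to come from the spreadsheet.
    """
    header = ET.parse(xml_path).getroot().find("tei:teiHeader", TEI)
    title = header.find("tei:fileDesc/tei:titleStmt/tei:title", TEI)
    authors = [
        _person(a)
        for a in header.findall(
            "tei:fileDesc/tei:sourceDesc/tei:biblStruct/tei:analytic/tei:author",
            TEI)
    ]
    authors = [a for a in authors if a]
    abstract = header.find("tei:profileDesc/tei:abstract", TEI)
    return {
        "XML filename": Path(xml_path).name,
        "Title": _text(title),
        "Institution": "",
        "Author": authors[0] if authors else "",
        "Secondary Author": "; ".join(authors[1:]),
        "Abstract": _text(abstract),
        "Year": "",
    }


def paper_rows(tei_dir):
    """One row per *.xml file in tei_dir, sorted by filename."""
    return [paper_row(p) for p in sorted(Path(tei_dir).glob("*.xml"))]
```

If pandas is available, `pandas.DataFrame(paper_rows(tei_dir), columns=COLUMNS)` would turn these rows into the papers DataFrame; the parsing itself only needs the standard library.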

amandawhitmire commented 3 years ago

@jermnelson - I can normalize metadata in the spreadsheet if it helps with your process to generate the dataframe. I was thinking of creating a standalone spreadsheet that pulls together all of the institutional records into one sheet. Is that useful?

jermnelson commented 3 years ago

Yes, @amandawhitmire, creating a standalone spreadsheet would be very helpful. There is a very good possibility that the one sheet could be read into a dataframe that we could use directly in the app.
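
If the consolidated sheet is exported as CSV, filling in Institution and Year could be a simple join on the XML filename. This is a sketch using only the standard library; the sheet's column names (`filename`, `institution`, `year`) and the `merge_sheet` helper are hypothetical, and each paper row is assumed to be a dict keyed by the column names listed in the issue.

```python
import csv
import io


def merge_sheet(rows, sheet_csv, key="filename"):
    """Return copies of the paper rows with Institution and Year filled in.

    sheet_csv is the text of the consolidated sheet exported as CSV.
    The column names 'filename', 'institution' and 'year' are hypothetical.
    Rows with no matching sheet record are returned unchanged.
    """
    # Index sheet records by filename for constant-time lookup.
    by_file = {
        (rec.get(key) or "").strip(): rec
        for rec in csv.DictReader(io.StringIO(sheet_csv))
    }
    merged = []
    for row in rows:
        new = dict(row)  # copy so the caller's rows are not mutated
        rec = by_file.get(row["XML filename"])
        if rec is not None:
            new["Institution"] = (rec.get("institution") or "").strip()
            new["Year"] = (rec.get("year") or "").strip()
        merged.append(new)
    return merged
```

With pandas, `pandas.read_csv` on the exported sheet followed by `DataFrame.merge` on the filename column would perform the same join on DataFrames directly.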