Open larnsce opened 1 year ago
Explore how to get the raw data hidden in the data source of each table in the PDF:
R packages needed: pdftools
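A minimal sketch of one way to start, assuming the Dropbox links appear as visible text on the pages (links stored only as PDF annotations would not show up in the output of `pdftools::pdf_text()`). The helper name `extract_dropbox_links` and the toy input are hypothetical; on the real file, `pages` would come from `pdftools::pdf_text()` applied to the downloaded PDF.

```r
# Hypothetical helper: pull Dropbox URLs out of text extracted from the PDF.
# `pages` is a character vector with one element per page, which is the
# shape returned by pdftools::pdf_text().
extract_dropbox_links <- function(pages) {
  # Collapse to one string so every page is searched in a single pass
  text <- paste(pages, collapse = "\n")
  # Assumption: links look like http(s)://(www.)dropbox.com/<anything but whitespace>
  pattern <- "https?://(www\\.)?dropbox\\.com/[^[:space:]]+"
  unique(regmatches(text, gregexpr(pattern, text))[[1]])
}

# Toy input standing in for the real page text (made-up links)
pages <- c(
  "Data source: https://www.dropbox.com/s/abc123/table1.xlsx?dl=0",
  "Data source: https://www.dropbox.com/s/def456/table2.xlsx?dl=0"
)
links <- extract_dropbox_links(pages)
print(links)
```

A URL that wraps across two lines in the PDF would be cut at the line break by this pattern, so the result should be checked against the document by hand.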
@larnsce For this data, each table has the following sections
I have already extracted general information, publications, and data source links. Shall we set up a package for this?
Thanks, @mianzg. Yes, please set up a package. For the data sources, please establish a Zotero library in our GHE group library. We can then use the .bib file to reference each data source.
@larnsce For this data, each table has the following sections:
- General information
- Feedstock
- Experiment procedure
- Publications
- Data source links
- Additional notes
- Description of data (including some figures)
I have already extracted general information, publications, and data source links. Shall we set up a package for this?
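For illustration, the section layout above could be used to split the text of one table into named parts. This is only a sketch: it assumes each section heading sits on its own line, spelled exactly as in the list, and that the input covers a single table. The function name `split_sections` and the toy lines are hypothetical, not the code used for the extraction so far.

```r
# Section headings as listed in this thread (assumed to match the PDF text exactly)
section_names <- c(
  "General information", "Feedstock", "Experiment procedure",
  "Publications", "Data source links", "Additional notes",
  "Description of data"
)

# Hypothetical helper: group the lines of one table under their section heading.
split_sections <- function(lines, headings = section_names) {
  lines <- trimws(lines)
  is_heading <- lines %in% headings
  # Running count of headings seen so far; 0 means "before the first heading"
  group <- cumsum(is_heading)
  keep <- group > 0 & !is_heading
  found <- lines[is_heading]
  # Label every body line with the heading it falls under
  labels <- found[group[keep]]
  split(lines[keep], factor(labels, levels = unique(found)))
}

# Toy input with placeholder content
lines <- c(
  "Table 1",
  "General information",
  "placeholder partner name",
  "Publications",
  "placeholder reference",
  "Data source links",
  "https://www.dropbox.com/s/abc123/table1.xlsx?dl=0"
)
sections <- split_sections(lines)
print(names(sections))
```

Lines before the first heading (here "Table 1") are dropped, and a heading with no body yields an empty character vector.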
Great! How about naming it as dryingfaecal? @larnsce
I forgot to mention that I want to exclude additional notes and description of data for the meta-data table. Sections feedstock and experiment procedure need more investigation to be extracted.
I forgot to mention that I want to exclude additional notes and description of data for the meta-data table.
Agreed. Not necessary.
Great! How about naming it as dryingfaecal?
I like it.
The metadata table and the package setup are ready: https://openwashdata.github.io/dryingfaecal/
One-line summary
The FS Methods book was published with a PDF on Gates Open Research that contains metadata and Dropbox links to approximately 100 MS Excel sheets, which could be a great resource if they were published in a machine-readable and structured format.
Background information
The present document consists of an addendum of data to the Handbook of Methods for Faecal Sludge Analysis. It is part of a project funded by the Bill & Melinda Gates Foundation (BMGF) through grant OPP1164143, entitled “Characterization of faecal material during drying”. Data was shared by partners of the PRG over 5 years. It's commendable that this document exists, but it's no use to anyone in this format.
https://gatesopenresearch.org/documents/4-188
Concrete proposal
Create a data package that contains all the information from the different MS Excel files, together with metadata. Make it searchable, machine-readable, and structured. Steps:
Pros and cons
Pros
Cons
Alternatives