grwells / TickBase

Data storage for web crawler results from TickBase project, summer 2021.

Exported Author Strings Contain invalid Chars... #2

Closed grwells closed 2 years ago

grwells commented 2 years ago

Results from Data Dryad, and potentially other sources, contain '[' and other formatting artifacts left over from their original JSON form. The output formatting for every source needs to be checked and amended so that the content uploaded to DSpace is uniform.

grwells commented 2 years ago

Mendeley output looks relatively good. Needs commas removed from names that contain initials.

ex. N. Boulanger, P., Boyer, E., Talagrand-Reboul, Y., Hansmann should be N. Boulanger, P. Boyer, E. Talagrand-Reboul, Y. Hansmann

FIXED
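
For reference, the comma cleanup described above can be sketched with a single regular expression. This is only a minimal illustration, not the code used in the project: the helper name `join_initials` is hypothetical, and it assumes initials are a single ASCII capital letter followed by a period.

```python
import re

# Hypothetical helper, not taken from the TickBase code base.
# Matches a single capital-letter initial such as "P." that is
# immediately followed by a comma, and captures the initial itself.
_INITIAL_COMMA = re.compile(r"\b([A-Z]\.),")


def join_initials(authors: str) -> str:
    """Remove the comma that directly follows an initial.

    Commas after surnames are left alone, so they still separate authors.
    """
    return _INITIAL_COMMA.sub(r"\1", authors)


print(join_initials("N. Boulanger, P., Boyer, E., Talagrand-Reboul, Y., Hansmann"))
```

Non-ASCII initials (for example "É.") would need a wider character class than `[A-Z]`.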

grwells commented 2 years ago

Mendeley Data output is good, some entries have multiple alternate spellings from the metadata.

ex. Gerardo Fracasso Gerardo Fracass, Erik Matthysen Erik Matthyse, André A. Dhondt André A. Dhond, Dieter Heylen Dieter Heyle should be Gerardo Fracasso, Erik Matthysen, André Dhondt, Dieter Heylen

FIXED
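
One possible way to drop the alternate spellings is sketched below. It is an assumption about the approach, not the actual fix: the function names are hypothetical, and it relies on the duplicated spelling starting with the same first token as the original name (as in the example above). Unlike the expected output above, this sketch keeps middle initials such as "A.".

```python
def drop_alternate_spelling(entry: str) -> str:
    """Keep only the first spelling in an entry like 'Erik Matthysen Erik Matthyse'.

    Assumption: the alternate spelling begins with the same first token,
    so the second occurrence of that token marks where the duplicate starts.
    """
    tokens = entry.split()
    if not tokens:
        return ""
    first = tokens[0]
    for i in range(1, len(tokens)):
        if tokens[i] == first:
            return " ".join(tokens[:i])
    # No repeated first token found: leave the name unchanged.
    return " ".join(tokens)


def clean_authors(authors: str) -> str:
    """Apply drop_alternate_spelling to each comma-separated author entry."""
    return ", ".join(
        drop_alternate_spelling(entry)
        for entry in authors.split(",")
        if entry.strip()
    )
```

Entries whose alternate spelling changes the first name would not be caught by this heuristic and would need a fuzzier comparison.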

grwells commented 2 years ago

Figshare output is good, although it has no authors because the retrieved metadata has no author field. Also, it is currently only retrieving datasets...

grwells commented 2 years ago

Data Dryad has some weird output characters but no apparent formatting issues in the CSV.

grwells commented 2 years ago

KNB looks good.

grwells commented 2 years ago

Springer Nature had some stray square brackets that were an artifact of previous debugging. Authors are now reformatted and output correctly.

FIXED

grwells commented 2 years ago

NEON should be good; it doesn't have authors.

grwells commented 2 years ago

PubMed has square brackets around the author list, single quotes around each element, and some illegible characters.

ex. ['Magalhães-Matos PC', 'Araújo IM', 'Valim JRA', 'Ogrzewalska M', 'Guterres A', 'Cordeiro MD', 'Cepeda MB', 'Fonseca AHD']

FIXED
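
The bracketed, quoted form above looks like a Python list that was written out with `str()`. Assuming that is the cause, a minimal sketch of turning it back into a plain comma-separated string could use `ast.literal_eval` from the standard library. The helper name `flatten_author_list` is hypothetical and not from the project code.

```python
import ast


def flatten_author_list(raw: str) -> str:
    """Convert "['A B', 'C D']" into "A B, C D".

    Strings that are not a list/tuple literal are returned stripped
    but otherwise unchanged.
    """
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal at all, e.g. an already-clean author string.
        return raw.strip()
    if isinstance(parsed, (list, tuple)):
        return ", ".join(str(author).strip() for author in parsed)
    return raw.strip()
```

Joining the list before it is written to the CSV avoids the problem at the source; this kind of parsing is only needed for output that has already been exported.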

grwells commented 2 years ago

LTER is good; it outputs authors in "last, first" order. It also has some unprintable characters. Not sure if this is an Excel/Windows problem, because things look good in Linux.
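
If the unprintable characters only show up in Excel, one possible cause (an assumption, not confirmed here) is that Excel on Windows does not detect UTF-8 in a CSV that lacks a byte order mark. A minimal sketch of writing the file with a BOM, using the standard `utf-8-sig` codec, is shown below. The function name `write_csv_for_excel` is hypothetical.

```python
import csv


def write_csv_for_excel(path, rows):
    """Write rows to a CSV file encoded as UTF-8 with a leading BOM.

    The "utf-8-sig" codec prepends the BOM bytes EF BB BF, which Excel
    uses as a hint that the file is UTF-8. newline="" lets the csv
    module control line endings itself.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)
```

Readers on Linux that open the file with `encoding="utf-8-sig"` will strip the BOM again, so the same file should remain usable on both platforms.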

grwells commented 2 years ago

Close for now, reopen after talking to Luke.