Open DougBurke opened 13 years ago
These all appear to be due to issues in semflow/rdf2solr5.py rather than in appsem. I have made changes that should fix both issues - e.g. this change, which builds on the preceding commits.
Will close when the changes get merged into main.
My testing on linux looks good.
Still need to be vigilant. With the new "display the title of a saved paper" functionality we see the same problem:
search for 2001A&A...380..251G - e.g. http://localhost:3000/semantic2/alpha/explorer/publications#fq=bibcode%3A2010A%26A...514A..64P&q=_%3A_
title="Physical and morphological properties of z ~ 3 Lyman break galaxies: dependence on Lyα line emission"
If this is saved and 'saved search' is then selected, you can see that the \alpha is not displayed correctly.
The 'saved paper title' issue should be fixed by
https://github.com/DougBurke/appsem/commit/ad98e0f609dac8070a57020f32f0fcd8765e466e
I note there are several places where the charset is not explicitly set to UTF-8 in server.js so leaving this bug open as a reminder.
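As a rough illustration of why an unspecified charset matters, here is a minimal Python sketch (not the server.js code). It assumes the client falls back to Latin-1 when no charset is declared, which is a common default but not confirmed for this bug. The helper names `mis_decode` and `content_type` are made up for this example.

```python
def mis_decode(text, wrong="latin-1"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset.

    This mimics a client that receives UTF-8 bytes but is never told so.
    The Latin-1 fallback is an assumption for illustration.
    """
    return text.encode("utf-8").decode(wrong)


def content_type(mime, charset="utf-8"):
    """Build a Content-Type header value that names the charset explicitly."""
    return f"{mime}; charset={charset}"


if __name__ == "__main__":
    # The alpha in the Lyman-alpha title turns into two unrelated characters.
    print(repr(mis_decode("Ly\u03b1")))
    print(content_type("text/html"))
```

Declaring the charset on every response (the second helper shows the header value) removes the guesswork on the client side.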
Edit: tested this fix on OS-X and Linux and looks good.
Edit: sample bibcodes with UTF-8 characters include
2006Natur.441..724R (beta)
2005A&A...433.1031D (beta)
2004ApJ...606.1174B (Angstrom)
2005ApJ...622..680R (Angstrom)
2005ApJ...633L..37I (eta)
2006ApJ...642.1098H (eta)
2004ApJ...606...85C (alpha)
2009Ap&SS.320..145B (degree symbol)
Here's another problematic paper: bibcode 2005PASP..117...13S, for which we have a title of
A Revised Geometry for the Magnetic Wind of &thetas;1 Orionis C
rather than
A Revised Geometry for the Magnetic Wind of θ1 Orionis C
from http://labs.adsabs.harvard.edu/ui/abs/2005PASP..117...13S
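One possible way to handle a title like the one above is sketched below. It assumes `&thetas;` is an ISO entity name that Python's `html.unescape` does not resolve by itself, so a small supplementary table is consulted first. The table `EXTRA_ENTITIES` and the function `decode_entities` are hypothetical names, not part of rdf2solr5.py.

```python
import html
import re

# Hypothetical supplement for entity names outside the HTML5 set.
# Only the one entity seen in this issue is listed.
EXTRA_ENTITIES = {"thetas": "\u03b8"}  # Greek small letter theta

_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace named entities, checking the supplement before html.unescape."""
    def repl(match):
        name = match.group(1)
        if name in EXTRA_ENTITIES:
            return EXTRA_ENTITIES[name]
        # Unknown names are left untouched by html.unescape.
        return html.unescape(match.group(0))
    return _ENTITY.sub(repl, text)


if __name__ == "__main__":
    raw = "A Revised Geometry for the Magnetic Wind of &thetas;1 Orionis C"
    print(decode_entities(raw))
```

Whether this belongs at ingest time or at display time is a design choice; doing it at ingest keeps the stored titles clean for every consumer.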
2007ApJS..168...58 has a title of
Abundances and Behavior of 12CO, 13CO, and C2 in Translucent Sight Lines
where the 12 and 13 are superscripts and the 2 is a subscript. When saved, the title is rendered (on the search page) as
Abundances and Behavior of <SUP>12</SUP>CO, <SUP>13</SUP>CO, and C<SUB>2</SUB> in Translucent Sight Lines
2006A&A...458..541B is another example since its title is
Establishing HZ43 A, Sirius B, and RX J185635-3754 as soft X-ray standards: a cross-calibration between the Chandra LETG+HRC-S, the EUVE spectrometer, and the ROSAT PSPC
vs
Establishing <ASTROBJ>HZ43 A</ASTROBJ>, <ASTROBJ>Sirius B</ASTROBJ>, and <ASTROBJ>RX J185635-3754</ASTROBJ> as soft X-ray standards: a cross-calibration between the Chandra LETG+HRC-S, the EUVE spectrometer, and the ROSAT PSPC
Example bibcodes showing this include:
2005PASP..117...13S (sup)
2007ApJS..168...58S (sup)
2006A&A...458..541B (astroobj)
2005PASP..117...13S (θ)
This has been fixed in https://github.com/rahuldave/appsem/commit/4336a4a37de4b878e566fef826094fda8fba0297 (the display, that is, not the data ingest issue that leads to the ASTROBJ tags)
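For the remaining ingest side, a minimal sketch of one option is to strip the markup and keep the plain text. This is not what the commit above does; the tag list covers only the three tags seen in this issue, and `strip_title_markup` is a hypothetical helper name.

```python
import re

# Only the tags reported in this issue; extend if others turn up.
_TAG = re.compile(r"</?(?:SUP|SUB|ASTROBJ)>", re.IGNORECASE)


def strip_title_markup(title):
    """Remove SUP, SUB and ASTROBJ tags while keeping their contents."""
    return _TAG.sub("", title)


if __name__ == "__main__":
    raw = ("Abundances and Behavior of <SUP>12</SUP>CO, <SUP>13</SUP>CO, "
           "and C<SUB>2</SUB> in Translucent Sight Lines")
    print(strip_title_markup(raw))
```

Stripping loses the superscript and subscript information, so it only suits contexts such as plain-text titles where that markup cannot be shown anyway.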
EDIT converting to a general-purpose issue for oddities in the output
These are likely two different issues but mentioned here as they look the same to users. It's also a known issue but I wanted it recorded.
a) some facet values contain percent-encoded values - e.g.
nuclear reactions%3Bnucleosynthesis%3Babundances
MAST/EUVE/DS%2FS
MAST/HUT/ASTRO-2%20HUT
%20 is space, %3B is ; and %2F is /.
b) unicode
Things like acute a being displayed as a and the appearance of the 'undisplayable' unicode symbol for the fancy double quote character or Angstrom or related.
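For item a), the percent-encoded facet values can be decoded with the standard library, as in this small sketch. Where the decoding should happen (ingest or display) is left open here, and `decode_facet` is a hypothetical wrapper name.

```python
from urllib.parse import unquote


def decode_facet(value):
    """Decode percent-escapes such as %20, %3B and %2F in a facet value."""
    return unquote(value)


if __name__ == "__main__":
    for raw in (
        "nuclear reactions%3Bnucleosynthesis%3Babundances",
        "MAST/EUVE/DS%2FS",
        "MAST/HUT/ASTRO-2%20HUT",
    ):
        print(raw, "->", decode_facet(raw))
```

Note that decoding %2F yields a literal slash, which matters if the facet value is later split on "/".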