pkp / citationStyleLanguage

An OJS 3 plugin to generate an article citation in any CSL citation style using citeproc-php.
GNU General Public License v2.0

Extra characters in the abstract when exported in bibtex and RIS formats #95

Closed jmnobrega closed 3 months ago

jmnobrega commented 2 years ago

When exporting to bibtex or RIS formats the abstract contents include <p> and </p> (in the form of &lt;p&gt; and &lt;/p&gt;), at the beginning and end. Those characters are not part of the abstract contents added to the publication.

bozana commented 3 months ago

I investigated a little bit and here is what I have found out:

It seems the library we use assumes that citations are displayed on an HTML page. It returns the citation wrapped within two HTML elements, and it escapes HTML characters, for example in the title and abstract.

Also, our abstract contains HTML elements.

However, the BibTeX and RIS formats are not displayed on the page like the other formats; we provide them for download. We do remove the wrapper elements returned by the library, but our abstract contains HTML elements too.

That means that, for example, a title in the downloaded file would look like this: Title &amp; Test, and an abstract would look like this: &amp;lt;p&amp;gt;The antimicrobial, heavy metal resistance patterns and ... (&amp;amp;gt;56.4 kb) encoding .... &amp;lt;/p&amp;gt;.

Thus, I can see the following possibilities for dealing with it:

  1. Do not provide BibTeX and RIS for download, but display them on the page like the other styles, so that users can copy & paste them if needed.
  2. Do nothing, i.e. leave it as it is. In that case users will get those HTML tags and the escaping when they import the BibTeX or RIS file into their citation software.
  3. I do not actually think this is a solution, but I mention it for completeness: somehow remove the HTML elements from our abstract (and title). The HTML escaping would remain, however; for example, an '&' used in the text would still be escaped (or we would need to revert that somehow too).
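
To illustrate why option 3 alone falls short, here is a minimal sketch. It assumes the library's escaping behaves roughly like PHP's `htmlspecialchars()`, which is only a stand-in here; the sample abstract and variable names are made up for illustration.

```php
<?php
// Hypothetical stored abstract: HTML markup plus a plain ampersand.
$storedAbstract = '<p>Salt & pepper <strong>test</strong></p>';

// Option 3: strip the abstract's own markup before handing it to the library.
$withoutTags = strip_tags($storedAbstract);

// Assumption: htmlspecialchars() stands in for the library's escaping step.
$escaped = htmlspecialchars($withoutTags, ENT_QUOTES, 'UTF-8');

echo $withoutTags, PHP_EOL; // Salt & pepper test
echo $escaped, PHP_EOL;     // Salt &amp; pepper test
```

The tags are gone, but the '&' still ends up as `&amp;` in the exported text, so the escaping would have to be reverted separately.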

I think I prefer solution #1.

bozana commented 3 months ago

Hmm... Now I see that we save some parts of the title and abstract html encoded, for example: <p>Abstract: One more test' &lt; <strong>EN &amp; sign </strong> bla bla... </p> So the only solution is to somehow disable the encoding by the library we use... :-\

bozana commented 3 months ago

Hmmm... I think we can use htmlspecialchars_decode for the downloadable citation formats, BibTeX and RIS.
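
A rough sketch of what that could look like, not the plugin's actual code: the helper name `cleanDownloadCitation` and the `<div>` wrapper in the sample input are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical helper: prepare a rendered citation for BibTeX/RIS download.
 */
function cleanDownloadCitation(string $rendered): string
{
    // Drop the wrapper markup that the library puts around the citation.
    $text = strip_tags($rendered);

    // Revert one layer of HTML escaping (&amp;, &lt;, &gt;, and quotes).
    $text = htmlspecialchars_decode($text, ENT_QUOTES);

    return trim($text);
}

// Assumed shape of the library output: wrapper elements plus escaped content.
$rendered = '<div><div>abstract = {&lt;p&gt;Title &amp; Test&lt;/p&gt;}</div></div>';

echo cleanDownloadCitation($rendered), PHP_EOL;
// abstract = {<p>Title & Test</p>}
```

Note that a single decode pass only reverts the escaping; the abstract's own `<p>` tags come back as literal markup, so they would still need to be stripped afterwards if they are not wanted in the export. Text that was already stored HTML-encoded would also keep one layer of entities.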

jonasraoni commented 3 months ago