jshoyer / pubmode

PubMode: A fork of Stefan Washietl's PubMed interface for Emacs
GNU General Public License v3.0

Fix problem caused by new (italic) tags in paper titles #3

Closed by jshoyer 5 years ago

jshoyer commented 6 years ago

In the PubMed Results buffer, titles have been getting cut off at the start of species names. A search also just failed, again at a species name.

Has something in the EUtils API changed? No clue in the chapter 4 release notes, but chapter 6 notes for April sound suggestive: https://www.ncbi.nlm.nih.gov/books/NBK179288/#chapter6.Release_Notes

Tried not loading 'dash', just in case. No effect.

jshoyer commented 6 years ago

Call chain: pub-med --> pub-xml-parse-PubMedArticle(Set) --> xml-get-children. The result yields too many arguments for decode-coding-string: the italics tag gets turned into a child node, so the title is no longer a single string.

(defun pub-xml-parse-PubMedArticle (article)
  "Parses article XML"
  (let* ((citationEntry (car (xml-get-children currArticle 'MedlineCitation)))
         ...
         ;; `car' keeps only the first child of ArticleTitle
         (title (car (xml-node-children (car (xml-get-children articleEntry 'ArticleTitle)))))
         ...
         (hash))
    ...
    (if title (setq title (decode-coding-string title 'mule-utf-8)))
    ...))
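
To make the failure mode concrete, here is a minimal sketch of what xml.el returns for a title containing an italics tag. The sample string and the `my/` names are made up for illustration; real efetch output has many more fields.

```elisp
(require 'xml)

;; Hypothetical sample title, not actual efetch output.
(defvar my/sample-title-xml
  "<ArticleTitle>Genome of <i>Arabidopsis thaliana</i> revisited.</ArticleTitle>")

(defun my/parse-xml-string (string)
  "Parse STRING as XML and return its root node."
  (with-temp-buffer
    (insert string)
    (car (xml-parse-region (point-min) (point-max)))))

(defun my/sample-title-children ()
  "Return the children of the sample ArticleTitle node."
  (xml-node-children (my/parse-xml-string my/sample-title-xml)))

;; The children are a mix of strings and nodes, roughly:
;;   ("Genome of " (i nil "Arabidopsis thaliana") " revisited.")
;; so `car' returns only the text before the first tag.
(car (my/sample-title-children))
```

If the title starts with the tag instead, `car` returns the `i` node (a list) rather than a string, which would explain the error from decode-coding-string.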

jshoyer commented 6 years ago

Option 1: Strip out italics tags before pub-xml-parse-PubMedArticle or -PubMedArticleSet

Option 2: Collapse undesired trees back into a single string.
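
Option 2 might look something like the sketch below. The helper name `my/xml-node-text` is hypothetical and not part of pubmode; it assumes nodes in the list form produced by xml.el.

```elisp
(require 'xml)

(defun my/xml-node-text (node)
  "Return the text of NODE with nested markup flattened.
NODE is either a string or a node in xml.el list form."
  (if (stringp node)
      node
    ;; Recurse into child nodes and join all text pieces in order.
    (mapconcat #'my/xml-node-text (xml-node-children node) "")))

;; Example with a hand-written node:
(my/xml-node-text
 '(ArticleTitle nil "Genome of " (i nil "Arabidopsis thaliana") " revisited."))
;; => "Genome of Arabidopsis thaliana revisited."
```

In the parser this would replace the `(car (xml-node-children ...))` call for the title. One difference to watch: for a missing ArticleTitle this helper returns an empty string rather than nil, so the later `(if title ...)` check would behave differently.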

jshoyer commented 5 years ago

Using a regular expression on the efetch results to remove all <i></i> and <b></b> tags is working well enough for now.
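
A minimal sketch of that kind of workaround, for reference. The function name `my/strip-inline-tags` is hypothetical, and this is not necessarily the exact expression used in pubmode; it assumes the efetch response is available as a string before XML parsing.

```elisp
(defun my/strip-inline-tags (xml-string)
  "Return XML-STRING with <i>, </i>, <b>, and </b> tags removed.
The text inside the tags is kept."
  ;; Bind `case-fold-search' to nil so only lowercase tags match.
  (let ((case-fold-search nil))
    (replace-regexp-in-string "</?[ib]>" "" xml-string)))

;; Example with a made-up title:
(my/strip-inline-tags
 "<ArticleTitle>Genome of <i>Arabidopsis thaliana</i> revisited.</ArticleTitle>")
;; => "<ArticleTitle>Genome of Arabidopsis thaliana revisited.</ArticleTitle>"
```

If other inline tags (for example sub or sup) turn up in titles, the pattern would need to be extended to cover them.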