titipata / pubmed_parser

:clipboard: A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License
584 stars, 168 forks

Is there a reason why PubMed/MEDLINE extracted list elements are joined with ";" instead of being kept as lists? #107

Closed: jtourille closed this issue 5 months ago

jtourille commented 2 years ago

First, thank you for your tool, it is very useful.

I was wondering why list elements are processed differently depending on the parser used. For instance:

Wouldn't it be more convenient to handle list elements consistently across parsers? In that case, I would suggest storing list elements as lists instead of joining them with ";".
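To make the suggestion concrete, here is a minimal sketch of what converting ";"-joined fields back into lists could look like on the caller's side. The helper `split_joined_fields`, the sample record, and its field names are hypothetical and only for illustration; they are not part of pubmed_parser and may not match the keys either parser actually returns.

```python
from typing import Any, Dict, Iterable


def split_joined_fields(
    record: Dict[str, Any], fields: Iterable[str], sep: str = ";"
) -> Dict[str, Any]:
    """Return a copy of `record` where the named string fields are split into lists.

    Fields that are missing or already not strings (e.g. already lists)
    are left untouched.
    """
    out = dict(record)  # shallow copy so the input record is not modified
    for name in fields:
        value = out.get(name)
        if isinstance(value, str):
            # Split on the separator and drop surrounding whitespace and empty parts.
            out[name] = [part.strip() for part in value.split(sep) if part.strip()]
    return out


# Hypothetical record: one field joined with ";", one already stored as a list.
record = {
    "title": "Example article",
    "keywords": "genomics; sequencing; annotation",
    "authors": ["A. Author", "B. Author"],
}

converted = split_joined_fields(record, ["keywords", "authors"])
print(converted["keywords"])
```

Note that splitting after the fact is lossy: any element that itself contains ";" would be cut into several pieces, which is one more argument for keeping the elements as lists in the parser output.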

Also, I could make the changes if needed.

titipata commented 2 years ago

Hi @jtourille, thanks for the feedback. Yes, I totally agree!

I actually have an author_list input parameter here and here to parse authors as a list instead of a concatenated string. This may not be the best way, and I'm happy to take suggestions from you.

Michael-E-Rose commented 5 months ago

I guess the parameter is sufficiently documented by now.