peetucket opened this issue 5 years ago.
Current `PublicationType` values in PubMed source records, tallied in the Rails console:
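```ruby
# Rails console (sul_pub): tally the first PublicationType of every PubMed source record
total = PubmedSourceRecord.count
pub_types = Hash.new(0)
n = 0
PubmedSourceRecord.find_each do |pmsr|
  n += 1
  pub_doc = Nokogiri::XML(pmsr.source_data)
  # Only the first PublicationType node is read. Records with no such node
  # (or an empty one) are counted as NODE_NOT_FOUND.
  node = pub_doc.at_xpath('//PubmedArticle/MedlineCitation/Article/PublicationTypeList/PublicationType')
  article_type = node&.text.presence || 'NODE_NOT_FOUND'
  pub_types[article_type] += 1
  puts "#{n} of #{total} : #{article_type}"
end; nil # trailing nil keeps the console from echoing the return value
```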
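The script reads only the first `PublicationType` of each record (the `[0]` index), so a record tagged both "Journal Article" and "Review" is counted only under "Journal Article". A minimal sketch of how first-only and all-types counts can differ, using plain Ruby arrays as hypothetical stand-ins for each record's parsed types (no database or XML involved) and `Enumerable#tally` (Ruby 2.7+):

```ruby
# Each inner array stands for one record's PublicationType values, in
# document order. This is hypothetical sample data, not production data.
records = [
  ['Journal Article', 'Review'],
  ['Journal Article'],
  ['Case Reports', 'Journal Article'],
  []
]

# What the console script counts: only the first value, or NODE_NOT_FOUND.
first_only = records.map { |types| types.first || 'NODE_NOT_FOUND' }.tally
# => {"Journal Article"=>2, "Case Reports"=>1, "NODE_NOT_FOUND"=>1}

# Counting every value instead: a record contributes once per type it carries.
all_types = records.flatten.tally
# => {"Journal Article"=>3, "Review"=>1, "Case Reports"=>1}
```

With all types counted, secondary tags such as "Review" become visible, but the totals no longer sum to the number of records.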
```
puts total
=> 423676
puts pub_types.sort_by { |_key, value| -value }.to_h
=> {"Journal Article"=>333665,
"Comparative Study"=>23977,
"Case Reports"=>19433,
"Clinical Trial"=>8464,
"JOURNAL ARTICLE"=>5703,
"Comment"=>5322,
"Letter"=>4609,
"Editorial"=>4526,
"English Abstract"=>3698,
"Evaluation Studies"=>3346,
"In Vitro"=>2361,
"Historical Article"=>982,
"Clinical Trial, Phase II"=>928,
"Clinical Trial, Phase III"=>707,
"Biography"=>690,
"Clinical Trial, Phase I"=>651,
"NODE_NOT_FOUND"=>612,
"Consensus Development Conference"=>473,
"News"=>468,
"Published Erratum"=>416,
"Congresses"=>375,
"Review"=>325,
"Controlled Clinical Trial"=>287,
"Guideline"=>277,
"Introductory Journal Article"=>237,
"Congress"=>176,
"Interview"=>143,
"REVIEW"=>135,
"LETTER"=>46,
"Clinical Study"=>46,
"Autobiography"=>45,
"Clinical Trial, Phase IV"=>41,
"Lectures"=>38,
"Addresses"=>38,
"Consensus Development Conference, NIH"=>38,
"Bibliography"=>35,
"Address"=>33,
"EDITORIAL"=>32,
"Retraction of Publication"=>30,
"Clinical Conference"=>24,
"Dataset"=>23,
"Newspaper Article"=>22,
"Corrected and Republished Article"=>21,
"Research Support, Non-U.S. Gov't"=>16,
"Classical Article"=>15,
"Lecture"=>15,
"Clinical Trial, Veterinary"=>13,
"Research Support, N.I.H., Extramural"=>11,
"Duplicate Publication"=>11,
"Interactive Tutorial"=>11,
"Patient Education Handout"=>11,
"Legal Case"=>11,
"Directory"=>9,
"Clinical Trial Protocol"=>8,
"Research Support, U.S. Gov't, P.H.S."=>7,
"Equivalence Trial"=>6,
"Personal Narrative"=>5,
"Systematic Review"=>4,
"Practice Guideline"=>3,
"Research Support, U.S. Gov't, Non-P.H.S."=>3,
"Legislation"=>3,
"Festschrift"=>2,
"Meta-Analysis"=>2,
"Dictionary"=>2,
"Overall"=>2,
"PUBLISHED ERRATUM"=>2,
"Technical Report"=>2,
"Legal Cases"=>1,
"CASE REPORTS"=>1,
"Video-Audio Media"=>1,
"Adaptive Clinical Trial"=>1}
```
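Several entries in this tally differ only by letter case ("Journal Article" vs "JOURNAL ARTICLE", "Review" vs "REVIEW", "Letter" vs "LETTER", "Editorial" vs "EDITORIAL", "Published Erratum" vs "PUBLISHED ERRATUM", "Case Reports" vs "CASE REPORTS"), so the raw counts split those types across buckets. A minimal sketch, assuming the tally is available as a plain Ruby hash like `pub_types`, that merges case variants and labels each merged bucket with its most frequent spelling. Singular/plural pairs such as "Congress"/"Congresses" or "Lecture"/"Lectures" are deliberately left separate; whether they should be merged is a separate decision. The sample uses a subset of the counts above.

```ruby
# Merge publication-type counts whose labels differ only by letter case.
# The most frequent spelling becomes the label of the merged bucket, and
# the result is sorted by descending count like the console output.
def merge_case_variants(counts)
  counts
    .group_by { |label, _count| label.downcase }
    .map { |_key, variants|
      label = variants.max_by { |_l, c| c }.first
      [label, variants.sum { |_l, c| c }]
    }
    .sort_by { |_label, total| -total }
    .to_h
end

# Subset of the tally above.
counts = {
  'Journal Article' => 333_665, 'JOURNAL ARTICLE' => 5703,
  'Letter' => 4609, 'LETTER' => 46,
  'Review' => 325, 'REVIEW' => 135
}

p merge_case_variants(counts)
# => {"Journal Article"=>339368, "Letter"=>4655, "Review"=>460}
```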
PubMed's documented publication types:
From this controlled vocabulary, though, it's unclear what we would map conference proceedings and books to.
Currently, for PubMed source records we set every publication type to `article` (see https://github.com/sul-dlss/sul_pub/blob/master/app/models/pubmed_source_record.rb#L140). The PubMed source records may contain information about the type that could be used to set it more accurately to one of the following supported types:

For example, the PubMed source XML has a node called that looks like this:

which suggests it may hold the publication type.
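One way the type could be derived while keeping today's behavior as the fallback: check all of a record's `PublicationType` values in order and use the first one that has a mapping, otherwise return `article`. This is a minimal sketch only. `HYPOTHETICAL_TYPE_MAP`, its `inproceedings` target, and `sul_pub_type_for` are illustrative names, not sul_pub's confirmed type list or API; the supported types and the mapping for conference proceedings and books are exactly what remains to be decided.

```ruby
# HYPOTHETICAL mapping from lower-cased PubMed PublicationType values to
# sul_pub publication types. Target names are placeholders, not the real list.
HYPOTHETICAL_TYPE_MAP = {
  'congress' => 'inproceedings',
  'congresses' => 'inproceedings'
}.freeze

# Current behavior: every PubMed record is typed as 'article'.
DEFAULT_TYPE = 'article'

# Check each of a record's PublicationType values in document order and
# return the first mapped type; fall back to the current default.
def sul_pub_type_for(pubmed_types)
  pubmed_types.each do |t|
    mapped = HYPOTHETICAL_TYPE_MAP[t.to_s.strip.downcase]
    return mapped if mapped
  end
  DEFAULT_TYPE
end

p sul_pub_type_for(['Journal Article'])      # => "article"
p sul_pub_type_for(['Congresses', 'Review']) # => "inproceedings"
p sul_pub_type_for([])                       # => "article"
```

Lower-casing the lookup key also absorbs the "JOURNAL ARTICLE"-style case variants seen in the tally.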
For an example in production, run:

```ruby
puts PubmedSourceRecord.find_by(pmid: 27397405).source_data
```
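To look at every `PublicationType` in a record's `source_data` rather than only the first, a sketch using the same XPath as the console script. It uses Ruby's REXML only so it runs outside the app; in the sul_pub console the Nokogiri equivalent would be `Nokogiri::XML(pmsr.source_data).xpath(xpath).map(&:text)`. `publication_types` is a hypothetical helper name, and the sample XML is a trimmed, made-up record, not the contents of PMID 27397405.

```ruby
require 'rexml/document'

# XPath used by the console script in this issue.
PUBLICATION_TYPE_XPATH =
  '//PubmedArticle/MedlineCitation/Article/PublicationTypeList/PublicationType'

# Return every PublicationType value in a PubMed XML record, in document
# order, with surrounding whitespace stripped. Returns [] if none exist.
def publication_types(source_xml)
  doc = REXML::Document.new(source_xml)
  REXML::XPath.match(doc, PUBLICATION_TYPE_XPATH).map { |node| node.text.to_s.strip }
end

# Hypothetical, trimmed record for illustration.
sample = <<~XML
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <PublicationTypeList>
          <PublicationType>Journal Article</PublicationType>
          <PublicationType>Review</PublicationType>
        </PublicationTypeList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
XML

p publication_types(sample) # => ["Journal Article", "Review"]
```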