The PDF standard defines two ways to store metadata in a PDF document. Your PDF uses both, and they are not in sync.
This is the /Info dictionary of your PDF (I reformatted it a bit for clarity):
611 0 obj
<< /Creator (Elsevier)
/CrossMarkDomains#5B1#5D (elsevier.com)
/CrossmarkMajorVersionDate (2010-04-23)
/CreationDate--Text (4th December 2022)
/ElsevierWebPDFSpecifications (7.0)
/robots (noindex)
/ModDate (D:20221204102501Z)
/Author (Xiaofan Du)
/doi (10.1016/j.rinp.2022.106094)
/Title (þÿ I n - s i t u s y n t h e s i s a n d c h e m i c a l b o n d i n g o f t h e A l - d o p e d ² - S i C p a r t i c l e s i n A l - S i - C l i g h t a l l o y s)
/Keywords (SiC crystal structure,Aluminum doped,Chemical bonding,First-principles calculations,Mechanical properties)
/CreationDate (D:20221204102326Z)
/Producer (Acrobat Distiller 8.1.0 \(Windows\))
/Subject (Results in Physics, 43 \(2022\) 106094. doi:10.1016/j.rinp.2022.106094)
/CrossMarkDomains#5B2#5D (sciencedirect.com)
/CrossmarkDomainExclusive (true)
>>
This /Info dictionary lists 1 author:
/Author (Xiaofan Du)
The second metadata carrier is the so-called XMP (Extensible Metadata Platform) stream:
614 0 obj
<< /Length 5676 /Subtype /XML /Type /Metadata
>>
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:ali="http://www.niso.org/schemas/ali/1.0/">
<ali:license_ref>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<ali:uri>http://creativecommons.org/licenses/by-nc-nd/4.0/</ali:uri>
</rdf:li>
</rdf:Bag>
</ali:license_ref>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:crossmark="http://crossref.org/crossmark/1.0/">
<crossmark:CrossMarkDomains>
<rdf:Seq>
<rdf:li>elsevier.com</rdf:li>
<rdf:li>sciencedirect.com</rdf:li>
</rdf:Seq>
</crossmark:CrossMarkDomains>
<crossmark:CrossmarkDomainExclusive>true</crossmark:CrossmarkDomainExclusive>
<crossmark:DOI>10.1016/j.rinp.2022.106094</crossmark:DOI>
<crossmark:MajorVersionDate>2010-04-23</crossmark:MajorVersionDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:identifier>10.1016/j.rinp.2022.106094</dc:identifier>
<dc:publisher>
<rdf:Bag>
<rdf:li>Elsevier B.V.</rdf:li>
</rdf:Bag>
</dc:publisher>
<dc:description>
<rdf:Alt>
<rdf:li xml:lang="x-default">Results in Physics, 43 (2022) 106094. doi:10.1016/j.rinp.2022.106094</rdf:li>
</rdf:Alt>
</dc:description>
<dc:subject>
<rdf:Bag>
<rdf:li>SiC crystal structure</rdf:li>
<rdf:li>Aluminum doped</rdf:li>
<rdf:li>Chemical bonding</rdf:li>
<rdf:li>First-principles calculations</rdf:li>
<rdf:li>Mechanical properties</rdf:li>
</rdf:Bag>
</dc:subject>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">In-situ synthesis and chemical bonding of the Al-doped β-SiC particles in Al-Si-C light alloys</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>Xiaofan Du</rdf:li>
<rdf:li>Zhao Qian</rdf:li>
<rdf:li>Xiangfa Liu</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:jav="http://www.niso.org/schemas/jav/1.0/">
<jav:journal_article_version>VoR</jav:journal_article_version>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:CreationDate--Text>4th December 2022</pdf:CreationDate--Text>
<pdf:Producer>Acrobat Distiller 8.1.0 (Windows)</pdf:Producer>
<pdf:Keywords>SiC crystal structure,Aluminum doped,Chemical bonding,First-principles calculations,Mechanical properties</pdf:Keywords>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:CreationDate--Text>4th December 2022</pdfx:CreationDate--Text>
<pdfx:CrossMarkDomains>
<rdf:Seq>
<rdf:li>sciencedirect.com</rdf:li>
<rdf:li>elsevier.com</rdf:li>
</rdf:Seq>
</pdfx:CrossMarkDomains>
<pdfx:CrossmarkDomainExclusive>true</pdfx:CrossmarkDomainExclusive>
<pdfx:CrossmarkMajorVersionDate>2010-04-23</pdfx:CrossmarkMajorVersionDate>
<ZlkjsyMiJnMmKoweGz9z8ysNNywmPlt6OowmGzdaLyPmKowz-ndn.o9ePot6Pnd6SmtuTma/>
<pdfx:doi>10.1016/j.rinp.2022.106094</pdfx:doi>
<pdfx:robots>noindex</pdfx:robots>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:prism="http://prismstandard.org/namespaces/basic/3.0/">
<prism:aggregationType>journal</prism:aggregationType>
<prism:copyright>© 2022 The Author(s). Published by Elsevier B.V.</prism:copyright>
<prism:coverDate>2022-12-01</prism:coverDate>
<prism:coverDisplayDate>1 December 2022</prism:coverDisplayDate>
<prism:doi>10.1016/j.rinp.2022.106094</prism:doi>
<prism:issn>2211-3797</prism:issn>
<prism:pageRange>106094</prism:pageRange>
<prism:publicationName>Results in Physics</prism:publicationName>
<prism:startingPage>106094</prism:startingPage>
<prism:url>https://doi.org/10.1016/j.rinp.2022.106094</prism:url>
<prism:volume>43</prism:volume>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:CreateDate>2022-12-04T10:23:26</xmp:CreateDate>
<xmp:CreatorTool>Elsevier</xmp:CreatorTool>
<xmp:MetadataDate>2022-12-04T10:25:01</xmp:MetadataDate>
<xmp:ModifyDate>2022-12-04T10:25:01</xmp:ModifyDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>uuid:1b499bed-4ce8-4c0c-b682-0d58baae1cbe</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:b6ed5266-03cb-4855-90c0-636283357f42</xmpMM:InstanceID>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/">
<xmpRights:Marked>True</xmpRights:Marked>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj
Here we have 3 authors listed:
<dc:creator>
<rdf:Seq>
<rdf:li>Xiaofan Du</rdf:li>
<rdf:li>Zhao Qian</rdf:li>
<rdf:li>Xiangfa Liu</rdf:li>
</rdf:Seq>
</dc:creator>
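If you want to read that list yourself, here is a minimal sketch using only the Python standard library. It assumes you already have the XMP packet as a string; the function name `xmp_creators` and the `SAMPLE` packet are made up for illustration and are not part of borb.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as they appear in the XMP packet above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def xmp_creators(xmp_xml: str) -> list[str]:
    """Return every dc:creator entry of an XMP packet, in document order."""
    root = ET.fromstring(xmp_xml)
    items = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    return [li.text.strip() for li in items if li.text]


# Trimmed-down stand-in for the real packet (hypothetical sample data).
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:creator>
<rdf:Seq>
<rdf:li>Xiaofan Du</rdf:li>
<rdf:li>Zhao Qian</rdf:li>
<rdf:li>Xiangfa Liu</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>"""

print(xmp_creators(SAMPLE))  # ['Xiaofan Du', 'Zhao Qian', 'Xiangfa Liu']
```

Because `dc:creator` is an ordered `rdf:Seq`, the sketch keeps the authors in the order the publisher wrote them.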
By the way, you can extract XMP metadata using borb. Check the examples.
In short, your PDF is kind of "broken". It contains conflicting information about the author, which means a library will give you either one value or the other, depending on which of the two sources it reads.
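If you need to handle such files in your own code, one option is to read both sources and reconcile them explicitly. The sketch below is only an illustration: `reconcile_authors` is a hypothetical helper, it treats the /Info author as a single name, and preferring the XMP list is a policy assumption, not something the PDF standard mandates.

```python
from typing import Optional, Sequence


def reconcile_authors(
    info_author: Optional[str], xmp_creators: Sequence[str]
) -> tuple[list[str], bool]:
    """Pick an author list and report whether the two sources conflict.

    Assumption: the XMP dc:creator list wins when it is present, because it
    can hold several names, while /Info /Author is one string.
    """
    xmp = [name.strip() for name in xmp_creators if name and name.strip()]
    info = [info_author.strip()] if info_author and info_author.strip() else []
    # Only call it a conflict when both sources exist and disagree.
    conflict = bool(xmp) and bool(info) and xmp != info
    chosen = xmp if xmp else info
    return chosen, conflict


chosen, conflict = reconcile_authors(
    "Xiaofan Du", ["Xiaofan Du", "Zhao Qian", "Xiangfa Liu"]
)
print(chosen, conflict)  # ['Xiaofan Du', 'Zhao Qian', 'Xiangfa Liu'] True
```

For the values in this PDF the check reports a conflict and returns the three names from the XMP stream.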
Hi, thanks for the explanation.
I have tried to extract XMP meta information using borb as you suggested.
The result remains the same. @jorisschellekens
def test():
    import typing
    from borb.pdf import Document
    from borb.pdf import PDF

    doc: typing.Optional[Document] = None
    filename = '1-s2.0-S2211379722007082-main.pdf'
    # Parse the PDF into a borb Document
    with open(filename, 'rb') as f:
        doc = PDF.loads(f)
    # Print the fields reported by the XMP metadata accessor
    print(" id %s " % doc.get_xmp_document_info().get_document_id())
    print(" authors %s " % doc.get_xmp_document_info().get_author())
    print(" creator %s " % doc.get_xmp_document_info().get_creator())

test()
output:
id uuid:1b499bed-4ce8-4c0c-b682-0d58baae1cbe
authors Xiaofan Du
creator None
python version: 3.10 borb version: 2.1.18
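As a library-independent cross-check, the raw XMP packet can be pulled straight out of the file bytes. This is a rough sketch, not a general PDF parser: it assumes the metadata stream is stored uncompressed (the dictionary in the dump above has no /Filter entry), that the file is not encrypted, and that the packet uses the usual `x:` prefix. `find_xmp_packet` and `fake_pdf` are hypothetical names used only for this example.

```python
import re
from typing import Optional

# Match from the opening <x:xmpmeta tag to its closing tag, across lines.
_XMP_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def find_xmp_packet(pdf_bytes: bytes) -> Optional[bytes]:
    """Return the first raw XMP packet in the file, or None if there is none."""
    match = _XMP_RE.search(pdf_bytes)
    return match.group(0) if match else None


# Tiny stand-in for a real file (hypothetical data, not a valid PDF).
fake_pdf = (
    b"%PDF-1.7\n614 0 obj\n<< /Subtype /XML /Type /Metadata >>\nstream\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><!-- metadata here --></x:xmpmeta>\n'
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

packet = find_xmp_packet(fake_pdf)
print(packet.decode("utf-8") if packet else "no XMP packet found")

# With a real file, the bytes would come from disk instead:
# with open('1-s2.0-S2211379722007082-main.pdf', 'rb') as f:
#     packet = find_xmp_packet(f.read())
```

The returned bytes are plain XML, so they can be handed to any XML parser to look at the `dc:creator` entries directly.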
Hi, thanks for the excellent lib.
I am currently extracting metadata from a PDF. I can extract only one author, but the PDF file lists three.
I have also tried other libs (no offense); the extracted results are the same: only one author name.
I know this may be related to the PDF file rather than the lib. However, I've checked the PDF's properties, which do list the three authors.
This problem really confuses me. I uploaded the pdf file and I hope I can get some advice from you. Thank you very much.
1-s2.0-S2211379722007082-main.pdf