papis / papis-zotero

Zotero compatibility layer for papis
GNU General Public License v3.0

Problems with date #30

Closed mcepl closed 1 year ago

mcepl commented 1 year ago

I have plenty of records where the date is just a year, for example this one:

    <bib:Book rdf:about="urn:isbn:978-80-7367-860-9">
        <dcterms:isReferencedBy rdf:resource="#item_368"/>
        <dcterms:isReferencedBy rdf:resource="#item_367"/>
        <dc:subject>20.-21. stol. \textbar2 czenas</dc:subject>
        <dc:subject>20.-21. stol. \textbar7 ch462155 \textbar2 czenas</dc:subject>
               <rdf:value>20.-21. stol. |2 czenas</rdf:value>
               <rdf:value>20.-21. stol. |7 ch462155 |2 czenas</rdf:value>
        <dc:subject>20th-21st centuries \textbar2 eczenas</dc:subject>
               <rdf:value>20th-21st centuries |2 eczenas</rdf:value>
        <dc:subject>ateismus \textbar7 ph118651</dc:subject>
               <rdf:value>ateismus |7 ph118651</rdf:value>
        <dc:subject>Bůh a člověk \textbar7 ph116953 \textbar2 czenas</dc:subject>
               <rdf:value>Bůh a člověk |7 ph116953 |2 czenas</rdf:value>
        <dc:subject>Catholic priests</dc:subject>
           <z:AutomaticTag><rdf:value>Halík, Tomáš</rdf:value></z:AutomaticTag>
        <dc:subject>intelektuálové \textbar7 ph121139</dc:subject>
               <rdf:value>intelektuálové |7 ph121139</rdf:value>
        <dc:subject>interviews \textbar2 eczenas</dc:subject>
               <rdf:value>interviews |2 eczenas</rdf:value>
        <dc:subject>katoličtí kněží \textbar7 ph114899</dc:subject>
               <rdf:value>katoličtí kněží |7 ph114899</rdf:value>
        <dc:subject>náboženská víra \textbar7 ph115476</dc:subject>
               <rdf:value>náboženská víra |7 ph115476</rdf:value>
        <dc:subject>názory a postoje \textbar7 ph137634 \textbar2 czenas</dc:subject>
               <rdf:value>názory a postoje |7 ph137634 |2 czenas</rdf:value>
        <dc:subject>pluralism (religion)</dc:subject>
        <dc:subject>pluralism (society)</dc:subject>
        <dc:subject>pluralismus (náboženství) \textbar7 ph281886</dc:subject>
               <rdf:value>pluralismus (náboženství) |7 ph281886</rdf:value>
        <dc:subject>pluralismus (společnost) \textbar7 ph135655</dc:subject>
               <rdf:value>pluralismus (společnost) |7 ph135655</rdf:value>
        <dc:subject>postmodern society \textbar2 eczenas</dc:subject>
               <rdf:value>postmodern society |2 eczenas</rdf:value>
        <dc:subject>postmoderní společnost \textbar7 ph124410 \textbar2 czenas</dc:subject>
               <rdf:value>postmoderní společnost |7 ph124410 |2 czenas</rdf:value>
        <dc:subject>relations between God and man \textbar2 eczenas</dc:subject>
               <rdf:value>relations between God and man |2 eczenas</rdf:value>
        <dc:subject>religious faith</dc:subject>
        <dc:subject>rozhovory \textbar7 fd133303 \textbar2 czenas</dc:subject>
               <rdf:value>rozhovory |7 fd133303 |2 czenas</rdf:value>
        <dc:subject>sekularizace \textbar7 ph125471</dc:subject>
               <rdf:value>sekularizace |7 ph125471</rdf:value>
        <dc:subject>views and attitudes \textbar2 eczenas</dc:subject>
               <rdf:value>views and attitudes |2 eczenas</rdf:value>
        <dc:title>Tomáš Halík: smířená různost: rozhovor</dc:title>
        <z:shortTitle>Tomáš Halík</z:shortTitle>
        <z:libraryCatalog> Library Catalog</z:libraryCatalog>
           <dcterms:LCC><rdf:value>272-726.3 |2 MRF</rdf:value></dcterms:LCC>
        <dc:identifier>ISBN 978-80-7367-860-9</dc:identifier>
        <prism:edition>Vyd. 1</prism:edition>
    <bib:Memo rdf:about="#item_368">
    <bib:Memo rdf:about="#item_367">
       <rdf:value>&lt;p&gt;Obsahuje rejstřík&lt;/p&gt;</rdf:value>

When trying to import, I get this on stderr:

[INFO] papis_zotero.sql: [   0/199 ] Exporting item 'A9AW7P8Z' with ref 'DeclarationDoRatzin2000' to folder '/home/matej/Dokumenty/clanky/bibliography/A9AW7P8Z'.
[ERROR] papis_zotero.sql: Failed to parse date.
  ┆ Traceback (most recent call last):
  ┆   File "/home/matej/.local/lib/python3.11/site-packages/papis_zotero/", line 63, in get_fields
  ┆     d = datetime.strptime(date.split(" ")[0][:-3], "%Y-%m")
  ┆         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ┆   File "/usr/lib64/python3.11/", line 568, in _strptime_datetime
  ┆     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  ┆                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ┆   File "/usr/lib64/python3.11/", line 349, in _strptime
  ┆     raise ValueError("time data %r does not match format %r" %
  ┆ ValueError: time data '2011-00' does not match format '%Y-%m'
[DEBUG] bibtex: Generated ref 'Tomáš Halík: sm Halík, '.
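
For reference, the failure can be reproduced outside papis-zotero. This is a minimal sketch, not the module's code: the raw value `'2011-00-00 2011'` is an assumption (the traceback only shows the sliced string `'2011-00'`), and the variable names are illustrative.

```python
from datetime import datetime

# Assumed raw value; only the sliced form '2011-00' appears in the traceback.
raw_date = "2011-00-00 2011"

# Same slicing as the line shown in the traceback: keep the first token, drop the last 3 chars.
sliced = raw_date.split(" ")[0][:-3]

try:
    datetime.strptime(sliced, "%Y-%m")
    error = None
except ValueError as exc:
    # A month of "00" is not accepted by the %m directive.
    error = str(exc)

print(sliced)
print(error)
```

The month placeholder `00` is what `%m` rejects, so any record that only has a year would hit this path.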

Couldn’t the module somehow recognize that a date consisting only of a year (matching the regular expression \d+) cannot be parsed with the strptime format %Y-%m?

Using papis-zotero 0.1.2 from PyPI.

alexfikl commented 1 year ago

#31 takes a stab at this using a regex, as suggested. Can you give it a try to see if it works for your case?
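
A rough sketch of what a regex-based approach could look like, for illustration only. This is not the code from the linked change; `parse_loose_date` and its return shape are hypothetical, and it assumes dates look like `YYYY`, `YYYY-MM` or `YYYY-MM-DD`, with `00` standing in for unknown parts.

```python
import re

# Hypothetical pattern: a 4-digit year, optionally followed by -MM and -DD.
_DATE_RE = re.compile(
    r"(?P<year>\d{4})(?:-(?P<month>\d{2}))?(?:-(?P<day>\d{2}))?"
)


def parse_loose_date(raw: str) -> dict:
    """Return the known date parts, stopping at the first missing or zero part."""
    match = _DATE_RE.match(raw.strip())
    if match is None:
        return {}

    result = {}
    for key in ("year", "month", "day"):
        value = match.group(key)
        # A missing group or a "00" placeholder means the part is unknown.
        if value is None or int(value) == 0:
            break
        result[key] = int(value)
    return result


print(parse_loose_date("2011-00-00 2011"))
print(parse_loose_date("2000-08-06"))
```

Returning only the known parts avoids calling `strptime` on placeholder values at all, so a year-only record yields just a `year` entry instead of an error.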