urschrei / pyzotero

Pyzotero: a Python client for the Zotero API
https://pyzotero.readthedocs.org

items retrieved with format=bibtex broken #77

Closed marconioz closed 6 years ago

marconioz commented 6 years ago

General

Platform: Windows 7
Python version: 2.7.8
Pyzotero version: 1.2.15
API version: 3

Problem description

bitems = zot.everything(zot.top(format='bibtex'))
zitems = zot.everything(zot.top())

I read a .bbl file and look for matching entries (by citekey) in the list of items retrieved in bibtex format. When the two lists match, I take the Zotero key of the corresponding item and copy that item to a new collection. This would work IF the lists matched (i.e. were in the same order).

But worse, the bibtex list is broken. One of the references has a \n in the middle of its abstract, and this completely derails the bibtex-formatted list. A single item now has many corresponding bibtex items (broken, with only some fields; one contains just the rest of the paragraph after the \n).

Thanks, M.

urschrei commented 6 years ago

Hi Marco, First, Pyzotero doesn't alter the order of retrieved items (I believe the API returns them in descending order, according to the date and time at which they were added to the library or collection) – if you want a specific order, you'll have to specify it using the sort and direction keywords, or implement that yourself by sorting the list items.
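For the "sort it yourself" option, a minimal sketch might look like the following. It assumes each retrieved item is a dict with an ISO 8601 `dateAdded` string under `data`, as in the Zotero API's JSON output; the `sort_by_date_added` helper, the item keys, and the sample timestamps are made up for illustration.

```python
def sort_by_date_added(items, descending=False):
    """Return a new list of items ordered by their 'dateAdded' timestamp."""
    # ISO 8601 UTC timestamps in the same format sort correctly as plain strings.
    return sorted(
        items,
        key=lambda item: item["data"]["dateAdded"],
        reverse=descending,
    )


# Hypothetical items standing in for the result of zot.everything(zot.top()).
sample_items = [
    {"key": "AAAA1111", "data": {"dateAdded": "2017-11-20T21:38:00Z"}},
    {"key": "BBBB2222", "data": {"dateAdded": "2012-05-17T09:00:00Z"}},
]

ordered_keys = [item["key"] for item in sort_by_date_added(sample_items)]
```

Sorting both lists on a shared field only helps if both lists contain that field, so this is best treated as a way to get a predictable order rather than a guarantee that two differently formatted lists line up.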

As to the second problem, I'm not sure what you mean without seeing an example item, but again, Pyzotero doesn't alter the data that's retrieved from the API – if there are line break characters in the retrieved data, they're almost certainly present in the data on zotero.org.

marconioz commented 6 years ago

Hello there Stephan

I guess I didn't explain this very well.

I don't need them in any specific order. I just want both lists (formatted with bibtex or not) to be in the same order. At the moment, one list is larger than the other, so my problem is even worse.

The line break is on zotero.org, in the abstract of an item. When I use pyzotero to collect all my refs, I get two lists of different sizes. The one in bibtex format is longer and broken (some items are not valid bibtex items).

If I understand how the whole thing works, I think it is a problem with the API: when I export the same item using the Zotero bibtex converter on my desktop, the output does contain the line break (a new line starts in the middle of the item), but the bibtex syntax stays valid.

Somehow, when the API makes its call to Zotero and tries to format the result as bibtex, it splits an item when it reaches the new line (\n) in the abstract field. Really bad.

I have no idea how the \n got into the Zotero database, but I only add stuff to my bib with the Chrome plugin.

Here is an item where this happened (not sure if gmail will format this well...):

@article{victor_spatial_1991, title = {Spatial organization of nonlinear interactions in form perception}, volume = {31}, issn = {0042-6989}, url = {http://www.sciencedirect.com/science/article/pii/004269899190125O}, doi = {10.1016/0042-6989(91)90125-O}, abstract = {We examined the perception of structure in a family of visual textures whose second-order correlation structure is flat. These textures were generated by two-dimensional recursion rules, in a manner which extends the construction of Julesz, Gilbert and Victor (1978; Biological Cybernetics, 31, 137–140). Textures generated by some recursion rules elicited a visually salient percept of structure, while textures generated by other recursion rules did not. Textures whose statistical structure was visually salient produced evoked responses which differed from the response evoked by completely random textures. The size of this VEP difference correlated well with psychophysical measures.

Since the textures were constructed to have identical global spatial frequency spectra, models for the extraction of visual structure must be essentially nonlinear. Models based on symmetry, information content, or simple spatial extent (but not pattern) of correlation fail to explain the observed results. Models based on the cooperative interaction of pairs of nonlinear subunits provide a reasonable qualitative account of the findings. The critical model features are (i) the presence of multiple nonlinear subunits, and (ii) a second nonlinearity, such as a threshold, at the stage of combination of subunit signals.}, number = {9}, urldate = {2012-05-17}, journal = {Vision Research}, author = {Victor, Jonathan D. and Conte, Mary M.}, year = {1991}, keywords = {Human, Visual textures, Modeling, Nonlinear interactions, Visual evoked potentials}, pages = {1457--1488}, }
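The symptom described here, where one entry turns into several broken ones, can be reproduced in isolation. The sketch below is only an illustration of that failure mode, not Pyzotero's or the Zotero API's actual code: splitting BibTeX text on blank lines cuts an entry whose abstract contains a paragraph break, while tracking brace depth keeps it whole. The `split_bibtex_entries` helper and the shortened sample entries are hypothetical, and escaped braces are not handled.

```python
def split_bibtex_entries(text):
    """Split a BibTeX string into entries by tracking brace depth."""
    entries, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "@" and depth == 0:
            start = i  # an entry begins at a top-level '@'
        elif ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0 and start is not None:
                # The closing brace of the entry itself has been reached.
                entries.append(text[start:i + 1])
                start = None
    return entries


# Two shortened, made-up entries; the first has a blank line inside its abstract.
sample = (
    "@article{a_1991, abstract = {First paragraph.\n\nSecond paragraph.}, "
    "year = {1991}, }\n\n"
    "@article{b_2000, title = {Other}, }\n"
)

naive = [chunk for chunk in sample.split("\n\n") if chunk.strip()]
robust = split_bibtex_entries(sample)
```

With this sample, the blank-line split yields more chunks than there are entries, and the brace-aware split yields one string per entry.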

Cheers, M.


urschrei commented 6 years ago

Marco, there are actually two problems here. One is my fault, and the other is due to a gap in the Zotero API documentation:

  1. I carelessly fixed an issue related to bibtex items earlier this year, which caused the bibtex responses to be stripped of their formatting information – \t and \n sequences, in particular. I've just pushed version 1.3.0 to PyPI. It includes a major change: if you specify the return format as bibtex, you receive a bibtexparser object (https://bibtexparser.readthedocs.io/en/v0.6.2/bibtexparser.html#bibdatabase.BibDatabase.entries). It has an entries property, which is simply a list of properly-formatted BibTeX entries. The fields should be identically-formatted to entries you get back from a zot.top() call, including line break and tab characters, so you should be able to compare fields without too much trouble.
  2. When you specify bibtex, the Zotero API won't return any top-level items if they're note or attachment types. This accounts for the different list lengths. You should be able to filter the larger list quite easily:

filtered_items = [item for item in zitems if item['data']['itemType'] not in ('note', 'attachment')]
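Putting the two points together, one way to link citekeys to Zotero keys without relying on list order is to match on a shared field such as the DOI. This is a sketch under assumptions, not part of Pyzotero: it assumes the parsed BibTeX entries are dicts with the citekey under `ID` and a lowercase `doi` field, and that Zotero items carry `DOI` under `data`. The helper names and the `ABCD1234` / `EFGH5678` keys are hypothetical.

```python
EXCLUDED_TYPES = {"note", "attachment"}


def filter_citable(items):
    """Drop notes and attachments, which the bibtex format does not return."""
    return [item for item in items if item["data"]["itemType"] not in EXCLUDED_TYPES]


def citekey_to_zotero_key(bib_entries, items):
    """Map each BibTeX citekey to a Zotero item key via a shared DOI."""
    doi_to_key = {
        item["data"]["DOI"].lower(): item["key"]
        for item in items
        if item["data"].get("DOI")  # skip items without a DOI
    }
    mapping = {}
    for entry in bib_entries:
        doi = entry.get("doi", "").lower()
        if doi in doi_to_key:
            mapping[entry["ID"]] = doi_to_key[doi]
    return mapping


# Made-up stand-ins for zitems and for the parsed bibtex entries.
zitems = [
    {"key": "ABCD1234", "data": {"itemType": "journalArticle",
                                 "DOI": "10.1016/0042-6989(91)90125-O"}},
    {"key": "EFGH5678", "data": {"itemType": "note"}},
]
bib_entries = [{"ID": "victor_spatial_1991", "doi": "10.1016/0042-6989(91)90125-O"}]

filtered_items = filter_citable(zitems)
mapping = citekey_to_zotero_key(bib_entries, filtered_items)
```

Items without a DOI would need another matching field (title plus year, for example), so entries missing from `mapping` are worth checking by hand.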

marconioz commented 6 years ago

Hi Stephan,

Yep, all good now! Thanks for working on this. It makes zotero so much more useful!

Cheers, M.
