Closed qak closed 4 years ago
I'm not entirely sure what to make of the docs you point out, but before I put Place
anywhere I need to know what it means inside Zotero. What does Place
mean for a Zotero Conference Paper
if not Event Place
?
I'm not a great fan of fields being put together with hard-wired separators. But if I do, I need to know what the semantics of Place
are.
Zotero's location fields are even more elaborate than I had originally thought. The documentation describes the relevant fields as follows:
Place
is used for the place where the conference proceedings was published. If separate locations are needed for the publication place and the location of the conference, this field should be left blank and the Event Place
and Publisher Place
fields should be used instead.
Event Place
records the geographic location of the conference.
Publisher Place
records the geographic location of the publisher.
There's also the Original Publisher Place
field for 'The geographic location of the publisher of the original version of an item (e.g., the untranslated version).'. I think we can get away with not considering this field in this issue.
Back to the quote from BibTeX's documentation:
The `PROCEEDINGS` and `INPROCEEDINGS` entry types now use the `address` field to tell where a conference was held, rather than to give the address of the publisher or organization. If you want to include the publisher’s or organization’s address, put it in the `publisher` or `organization` field.
It seems that for conference papers (inproceedings
entries) BibTeX originally only supported recording the publisher's location in the address
field. It seems that the developers later decided it would be better to support recording both the publisher's and the conference's location. The above quote from the documentation implies that conference papers (inproceedings
entries) should instead record the conference's location in the address
field and the publisher's location should be put in the publisher
field. What exactly is meant by putting the publisher's location into the publisher
field isn't discussed further, but it probably depends in part on the bibliography style used. I think I went too far in my initial comment by saying that the publisher's location should be appended to the publisher
field separated with a comma. Assigning any of 'Publisher, Publisher Place'
, 'Publisher; Publisher Place'
, 'Publisher (Publisher Place)'
, 'Publisher Place: Publisher'
to the BibTeX publisher
field seems like a valid option, depending on the bibliography style used. The fact that BibTeX's documentation states that the publisher's location can be placed into the publisher
field is also discussed on Stack Exchange.
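To make the separator question concrete, here is a minimal sketch of how the four candidate layouts could be produced. `formatPublisher`, `PUBLISHER_FORMATS` and the format names are hypothetical helpers invented for illustration; they are not part of BBT or BibTeX.

```javascript
// Hypothetical helper: none of these names come from BBT or BibTeX.
// Each entry renders one of the layouts discussed above.
const PUBLISHER_FORMATS = {
  comma: (publisher, place) => `${publisher}, ${place}`,
  semicolon: (publisher, place) => `${publisher}; ${place}`,
  parens: (publisher, place) => `${publisher} (${place})`,
  placeFirst: (publisher, place) => `${place}: ${publisher}`,
};

function formatPublisher(publisher, place, format = 'comma') {
  // With only one of the two values there is nothing to combine.
  if (!publisher) return place || '';
  if (!place) return publisher;
  const render = PUBLISHER_FORMATS[format];
  if (!render) throw new Error(`unknown publisher format: ${format}`);
  return render(publisher, place);
}
```

In a postscript, the returned string could be used as the value of the publisher field, with the format name chosen to suit the bibliography style in use.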
Maybe the best first step would be to make this functionality available from BBT's scripting interface. I tried to implement the described functionality using a postscript, but it seems that the values of Event Place
and Publisher Place
aren't made available the way I need. It looks like BBT currently assigns the values of fields Place
, Event Place
and Publisher Place
all to the same variable item.place
(so the interface only exposes the last value that got assigned). I had in mind something like
if (Translator.BetterBibTeX && item.itemType === 'conferencePaper' && item.eventPlace && item.publisher && item.publisherPlace) {
  // address holds the conference location; the publisher's location is folded into publisher
  reference.add({name: 'address', value: item.eventPlace})
  reference.add({name: 'publisher', value: item.publisher + ', ' + item.publisherPlace})
}
whilst adjusting the call to reference.add
depending on what looks the best in the current bibliography style. Here's the debug-report ID with an updated conference paper item: 887N52WF-euc
.
Sorry I was gone so long -- Zotero 5.0.85 introduced some changes which required a bit of work on my end to address, but that's now done.
Where in the UI do you find Event Place
and Publisher Place
?
Can you send me a new debug log? I forgot to pick up 887N52WF-euc
and the log submissions get deleted after a week.
Sorry I was gone so long -- Zotero 5.0.85 introduced some changes which required a bit of work on my end to address, but that's now done.
No problem, thanks for working on BBT =)
Where in the UI do you find `Event Place` and `Publisher Place`?
They don't have their own fields but they're entered into Zotero's Extra field.
Can you send me a new debug log? I forgot to pick up
887N52WF-euc
and the log submissions get deleted after a week.
Here it is: CTCKDJ2V-euc
.
:robot: this is your friendly neighborhood build bot announcing test build 5.2.20.6294 ("adjust tests for #1471")
Install in Zotero by downloading test build 5.2.20.6294, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".
Sponsor:
is actually a stanza that has special meaning in the extra
field -- it's recognized as "cheater syntax" and removed (by me) from the extra field, I just stuck it in collaborator
because I didn't know about what sponsors
meant for bibtex, but according to BibTeXing:
organization The organization that sponsors a conference or that publishes a manual.
So 6294 should get that right without requiring a postscript.
So 6294 should get that right without requiring a postscript.
Thanks, I've tried it and Sponsor
is correctly assigned to organization
without needing a postscript.
~Should the issue be otherwise closed though? What about the handling of Place
, Event Place
and Publisher Place
which I mentioned previously?~
Edit: Oh, the issue gets automatically reopened.
Yep, if you comment on a closed issue, the bot reopens it as a reminder for me.
We can still talk about those other fields if you want.
I originally opened this issue because of how publishing addresses are handled in Zotero's Conference Paper
items. In the previous comments I've described how Zotero supports fields Place
, Event Place
and Publisher Place
. Using these fields from BBT isn't fully supported yet. I tried to implement the missing functionality using a postscript, but it seems that BBT treats the various place fields as identical. I've further described this in my previous comments here and here.
The issue you've fixed (assigning the value of Sponsor
to organization
) is in fact something that I, until you brought it up, hadn't mentioned in any of my previous comments! That was a separate issue I worked around by using a postscript; I was planning to report the issue later on. I think that because one of my earlier comments references a postscript and because the debug-report contained another unrelated postscript (which was used to deal with Sponsor
), you continued with the Sponsor
problem and forgot that this issue addresses publishing locations instead :).
Sorry about that. Your extra
field has:
extra | available as |
---|---|
Event Date: 2015-06-08/2015-06-11 | item.extra |
Event Place: Newport Beach, California, USA | address |
Publisher Place: New York, NY, United States | item.extra |
Sponsor: ACM SIGARCH | organization |
Yes, and as I mentioned previously, I couldn't seem to access the values of Event Place
and Publisher Place
separately because BBT seems to write both of these fields to the same variable accessible through the scripting API. This means that the value of the variable depended on the order of Event Place
and Publisher Place
in Zotero's Extra
field.
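For illustration, this is roughly what keeping the two values apart could look like. It is a minimal sketch, not BBT's parser: `parseExtra` and its key normalization are hypothetical, and real Extra parsing follows more rules than this.

```javascript
// Hypothetical parser; BBT's real Extra handling is more involved.
// Each "Label: value" line is stored under its own normalized key, so
// "Event Place" and "Publisher Place" never share a variable.
function parseExtra(extra) {
  const fields = {};
  const rest = [];
  for (const line of extra.split('\n')) {
    const match = line.match(/^([A-Za-z][A-Za-z _-]*):\s*(.*)$/);
    if (match) {
      const key = match[1].trim().toLowerCase().replace(/[ _]+/g, '-');
      fields[key] = match[2].trim(); // a repeated label overwrites the earlier value
    } else {
      rest.push(line); // anything else stays as free-form Extra text
    }
  }
  return { fields, rest: rest.join('\n') };
}
```

With the Extra field shown above, `fields['event-place']` and `fields['publisher-place']` would hold the two locations independently of the order of the lines.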
I'm currently following the Zotero rules for parsing the extra field (to the best of my abilities), and it turns out the place discussion is a long-standing problem. I'll see how I can deviate from that without running into problems when that gets fixed.
I have a concept for a solution, but it takes a lot longer than I had hoped to implement this.
@retorquere Did you figure out which types should have Place
mapped to event-place and which to publisher-place?
I think I'm going to split the behavior between mapping for CSL and for other Zotero concerns. For "Place", which doesn't have an unambiguous mapping to CSL, I'm considering leaving it in the extra fields where it can be picked up by a postscript. If you want event-place
you could use that and it would map unambiguously to Place
for CSL->Zotero mappings.
Figuring out a consistent mapping has been a challenge however. I feel like I'm closing in, but I thought that a week ago too, and I spent most of my waking hours last week to try to get it done.
The current Zotero behavior is to map Place to both event-place and publisher-place. Doing that would make pandoc citations mirror Zotero Word processor citations. “Event place” and “Publisher place” obviously should just map to the one.
“Event place” and “Publisher place” obviously should just map to the one.
That's not what @qak is looking for though. The problem is exactly that these two map to one field, where he wants to distinguish between them. 1-to-many is less problematic than many-to-one.
No, the problem is that Place (the Zotero field) is currently only getting mapped to one, but should be mapped to both to match Zotero's (problematic) behavior. The CSL variables Event Place (event-place
) and Publisher Place (publisher-place
) should only be mapped to the actual CSL variable.
Right, I'm testing that setup right now.
All "Place" fields in Zotero are mapped to both event-place
and publisher-place
because that field predates Zotero adopting CSL, and there was only one "Place" field for all item types. It's currently not possible for Zotero to map "Place" to one or the other across different items, hence the annoying one-to-many mapping.
I have a deterministic mapping derived from the schemas.
Still -- that deterministic mapping leaves me with a bit of a problem. If I'm exporting to CSL, and there's place
in extra
, if I copy that to both event-place
and publisher-place
, a postscript can't distinguish between there having been place:
in the extra
field, or two distinct event place:
and publisher place:
lines. Postscripters would have to check whether the values happen to be the same. Ugh. That's what I'm going to do though. I've spent more than enough time on this.
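A minimal sketch of that trade-off, under stated assumptions: `placeToCsl` and `hasDistinctPlaces` are hypothetical names, and letting the specific lines take precedence over a generic `place:` line is an assumption of this sketch, not something settled in the thread.

```javascript
// Hypothetical CSL export step following the one-to-many mapping described above.
function placeToCsl({ place, eventPlace, publisherPlace }) {
  const csl = {};
  if (place) {
    // A generic place is copied to both CSL variables.
    csl['event-place'] = place;
    csl['publisher-place'] = place;
  }
  // Assumption: the specific lines only write to their own variable and win over place.
  if (eventPlace) csl['event-place'] = eventPlace;
  if (publisherPlace) csl['publisher-place'] = publisherPlace;
  return csl;
}

// A postscript cannot see which lines were present; it can only compare values.
function hasDistinctPlaces(csl) {
  return Boolean(
    csl['event-place'] &&
    csl['publisher-place'] &&
    csl['event-place'] !== csl['publisher-place']
  );
}
```

The ambiguity shows up when both specific lines carry the same value: the result is identical to what a single `place:` line would produce.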
Postscripters would have to check whether the values happen to be the same.
I think that this limitation shouldn't be much of a problem. The ability to programmatically extract both Event Place
and Publisher Place
would resolve the issue in any case. Thanks for taking the time to look into this!
I'm finally in the final stages of running my tests, and most of it is looking good, but I'm now hitting an (old) issue that I don't quite know what to do with.
A visualization of the current mapping can be found here. I'm using yEd to open it. Quick rundown:
`extra` field (they're lower case in the graph, but casing doesn't matter when you enter them in the `extra` field).
`place`: both the `event place` and `publisher place` keywords would "write" to the `place` Zotero field, meaning potential data loss. `event-place` and `publisher-place` would remain available to postscripts, but in an export to Zotero fields (which fuels the BibTeX and BibLaTeX output), neither would show up unless explicitly acted upon (which BBT does in places).
`place` label which can safely write to the CSL fields `event-place` and `publisher-place`. The numbers beside the arrows show what route led to the inference.
I know this looks a little complicated, but this was the easiest way to visualize the rat's nest of field mappings.
I'm still not really happy with mappings like place
for CSL. I have a sample with place: <something>
in the extra
field, and it looks weird to me to have that show up in both the event-place
and the publisher-place
field.
Another option for point 5 is to allow overwrites (with potentially arbitrary outcomes) but leave the extra fields available for postscripts to correct the situation. I'm no great fan of "arbitrary" though.
I had totally forgotten how much fun it was to work with graphs.
The current Zotero behavior is to map Place to both event-place and publisher-place. Doing that would make pandoc citations mirror Zotero Word processor citations. “Event place” and “Publisher place” obviously should just map to the one.
That's what it does now, but if at all possible, now's the chance to not just implement the existing workaround. It is, after all, supposedly better CSL JSON.
I'm hitting an issue now about container
. container
is not a currently mapped field, so I have to make a choice myself. The CSL var spec says its type is date
, with a less than helpful description of ?
.
No one knows what container
is supposed to be used for, and it is marked for deprecation. I honestly would just ignore it.
If you want, I can put together a suggestion for which place field Place should go to for each Zotero type. Mostly it would be publisher-place
except for Presentation (CSL speech
). Conference Paper (CSL paper-conference
) should still be publisher-place
because that field is intended for the place of the publisher of the published proceedings, not the location of the conference (cf. the Proceedings Title vs the Conference Name fields).
Oof... per-item-type mapping should be technically possible but I'd have to rethink the architecture for the mapping I have. Hmm, I can maybe add tags to the graph... let's give it a go and see how complicated things would get. Can't commit to it though.
Organized by Zotero type or CSL type?
Errrr.... conceptually it'd be best to do this by csl type I think?
And I think I have an idea on how I could do this... hmm...
Regarding (5) above, this is a very niche issue. In most cases, users will be entering place information into the proper Zotero Place field. If something is in Extra, it is usually to force Zotero to only map to either event-place
or publisher-place
or to provide separate values for the two (e.g., to give a publisher location and an event location for paper-conference
).
For this latter case, I'm not sure how a flow that first pushes into Zotero fields, then back out into BibTeX fields would work. Could the place fields be directly mapped to their correct Bib(La)TeX fields, rather than first into the Zotero schema?
There is also the issue that BibTeX's usage of fields is bizarre here. In my experience, styles that include the place for a published proceedings item want the publisher location, not the conference location, so BibTeX's inconsistent use of address
across types is a problem (cf. here). BibLaTeX is obviously better here with separate location
and venue
fields for the publisher and event locations, respectively.
All that said, here is what a generic Place field should map to for each CSL/CSLm type. Basically, everything should map to publisher-place
except speech
. Some types, such as interview
, hearing
, personal_communication
, and paper-conference
might be expected to have both types of places, but the primary one used in citations would be publisher-place
.
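Expressed as code, the suggestion amounts to a small lookup keyed on the CSL type. This is only a sketch of the rule as stated here; `placeVariableFor` and `EVENT_PLACE_TYPES` are hypothetical names.

```javascript
// Hypothetical lookup: per the suggestion above, only speech uses event-place
// for a generic Place value; every other type gets publisher-place.
const EVENT_PLACE_TYPES = new Set(['speech']);

function placeVariableFor(cslType) {
  return EVENT_PLACE_TYPES.has(cslType) ? 'event-place' : 'publisher-place';
}
```

Keeping the exceptions in a set means further types could be moved to `event-place` later without touching the function.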
For this latter case, I'm not sure how a flow that first pushes into Zotero fields, then back out into BibTeX fields would work. Could the place fields be directly mapped to their correct Bib(La)TeX fields, rather than first into the Zotero schema?
Yep. That is what I now do; when extracting for bibtex I extract variables "zotero-oriented", but do not write event-place
to place
because that could cause data loss. event-place
and things like original-date
, which doesn't have a zotero equivalent at all, stay in csl format, available to the BBT translators, and I decide on them in code.
All that said, here is what a generic Place field should map to for each CSL/CSLm type. ... Basically, everything should go to `publisher-place` except `speech`. Some types, such as `interview`, `hearing`, `personal_communication`, and `paper-conference` might be expected to have both types of places, but the primary one used in citations would be `publisher-place`.
That's just for place
though. There are more multiple mappings: dimensions
, container title
and references
for example. Try looking at the graph, it's pretty.
(I'm not really following the numbers next to the dashed lines. How do I know what "13" means?)
Are the collisions you are worried about if the user supplies a value in Extra but there is already a value for the variable in a proper Zotero field? If that's the case, these are the resolution rules used by citeproc-js:
* name variables can have multiple entries
* For other variables, the last encountered value in Extra wins
* Values in Extra don't override proper Zotero fields except:
  * date variables
  * item type
In terms of the specific gray collision lines in your graph, most of them aren't a problem I don't think. These are just type-specific "localizations" or labels for the generic term. Multiple of these variables in a class don't occur within a single item type. That these don't all collapse to the same internal Zotero database variable reflects its history of the database structure coming before the CSL adoption (or else, runningTime and artworkSize would be internally mapped to a common dimensions variable in the same way bookTitle is mapped to publicationTitle).
1. `dimensions`. You shouldn't ever encounter a collision because no Zotero type has both runningTime and artworkSize. Those are just the type-specific "localizations" of the general `dimensions` variable.
2. `references`. history and references are just the type-specific "localizations" of the `references` variable for patent (references) and other legal types (history).
3. `container-title`. code and reporter are just the type-specific "localizations" for publicationTitle for cases and legislation.
4. `authority`. These are just the type-specific wordings of the general `authority` variable.
5. `volume`. codeNumber is just a specific wording of `volume` for legislation.
6. `call-number`. applicationNumber is just the wording for patent. You shouldn't ever have a collision for those.
7. `medium`. system is a type-specific label for medium for computerProgram items (though this is a pretty stupid mapping in my opinion that I've recommended be dropped).

In contrast,

1. series and seriesTitle is a bit of a mess. Even the Zotero variables series, seriesTitle, and seriesText are a jumble with unclear distinction. Here, seriesTitle is the rarely used variable, so if there is a collision, I would suggest prioritizing series in the map to `collection-title`.

The "last encountered" behavior is also what Zotero CSL JSON and Better CSL JSON currently do if both series and seriesTitle are supplied in proper fields.
:robot: this is your friendly neighborhood build bot announcing test build 5.2.22.6531 ("test cases for new mapping")
Install in Zotero by downloading test build 5.2.22.6531, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".
(I'm not really following the numbers next to the dashed lines. How do I know what "13" means?)
The numbers don't mean anything, it's just that edges that have the same number do something together. In the case of a grey-dashed edge, there will be black edges with the same number that explain why it's removed.
Are the collisions you are worried about if the user supplies a value in Extra but there is already a value for the variable in a proper Zotero field? If that's the case, these are the resolution rules used by citeproc-js:
* name variables can have multiple entries
I have that
* For other variables, the last encountered value in Extra wins
I have that too
* Values in Extra don't override proper Zotero fields except:
  * date variables
  * item type
I'll have to change this. This surprises me. Wouldn't something entered in the extra
field be considered to be more deliberate by the user?
1. `dimensions` you shouldn't ever encounter a collision because no Zotero type has both runningTime and artworkSize. Those are just the type-specific "localizations" of the general `dimensions` variable
But a user could enter both in the extra
field. This goes for all these.
In contrast,
1. series and seriesTitle is a bit of a mess. Even the Zotero variables series, seriesTitle, and seriesText are a jumble with unclear distinction. Here, seriesTitle is the rarely used variable, so if there is a collision, I would suggest prioritizing series in the map to `collection-title`.
Alright, that's doable I think.
dimensions you shouldn't ever encounter a collision because no Zotero type has both runningTime and artworkSize. Those are just the type-specific "localizations" of the general dimensions variable
In such a case I would have expected a baseField
mapping to exist for those. If that existed, all of these double mappings would probably disappear. But since there isn't, if you change item type from film to artwork, runningTime
is lost. Which isn't too strange -- a runningTime
of 2 hours doesn't have a sensible translation to an artworkSize
for a painting.
I can add a mapping-specific baseField-like mapping for these fields. That may well resolve the lot.
I don't really care for users entering Zotero labels rather than CSL variable names in Extra, but I get that's a possibility. Still though, I think the type-specificity of most of these labels makes collisions a rare possibility. If they do occur, I think the last-encountered rule is a reasonable behavior.
Regarding Zotero fields vs Extra getting priority, I think there were two arguments. First, Frank was leery about being too aggressive with the Extra "cheater" syntax. The date and type overrides are there really to overcome major limitations of the Zotero object model (missing CSL types and a relatively inflexible date parser). Second, Extra will get preserved if an item is duplicated, has the type changed, etc., so there is reasonable possibility of bad data in Extra that might escape users' notice more than proper fields.
I think if Zotero's object model were built today, it would have baseField mappings for many of these, but changing item types and fields hasn't been possible until recently because of the syncing architecture.
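As a rough illustration of the conservative read from Extra, the three resolution rules can be sketched as below. This is not citeproc-js code: `resolve` is a hypothetical function, the two variable sets are small samples of CSL name and date variables, and ignoring Extra names when a proper name field already exists is an assumption of this sketch.

```javascript
// Sample CSL variable names; the full lists would come from the CSL schema.
const NAME_VARIABLES = new Set(['author', 'editor', 'translator']);
const DATE_VARIABLES = new Set(['issued', 'accessed', 'event-date', 'original-date']);

// fields: values from proper Zotero fields, keyed by CSL variable.
// extraEntries: [name, value] pairs in the order they appear in Extra.
function resolve(fields, extraEntries) {
  const fromExtra = {};
  for (const [name, value] of extraEntries) {
    if (NAME_VARIABLES.has(name)) {
      // Rule 1: name variables can have multiple entries.
      (fromExtra[name] = fromExtra[name] || []).push(value);
    } else {
      // Rule 2: for other variables, the last encountered value wins.
      fromExtra[name] = value;
    }
  }
  const result = { ...fields };
  for (const [name, value] of Object.entries(fromExtra)) {
    // Rule 3: Extra only overrides a proper field for dates and the item type.
    const overridable = DATE_VARIABLES.has(name) || name === 'type';
    if (!(name in result) || overridable) result[name] = value;
  }
  return result;
}
```

The design keeps proper fields authoritative, which matches the concern that stale Extra data may go unnoticed after an item is duplicated or changes type.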
I'll just do the faux basefields and the conservative read from extra and see where that gets me. If that works, it would keep things simpler.
I think the only things that would remain to be addressed in that case are the handling of Place and series/seriesTitle.
I don't really care for users entering Zotero labels rather than CSL variable names in Extra, but I get that's a possibility.
The graph hides those for readability, but what happens is if there is a [a-zA-Z_-]+: .*
line in the extra, I pick up the part before the :
and transform it using
label.replace(/[-_]/g, ' ').replace(/([a-z])([A-Z])/g, '$1 $2').toLowerCase()
and then test whether it's any of those white labels. This means event-place
and event place
(and EvEnt-PlaCe
, but we obviously would not encourage this) all end up as event place
in the matching process. Inside the translators, these would always show up as event-place
in the parsed extra fields. They wouldn't automatically show up in non-CSL-based translators, but there I pick them out individually as I write out fields from Zotero or "extra-CSL" to bib(la)tex.
Zotero labels get the same treatment so you can enter either. AAMOF the old-style cheater syntax {:publicationTitle:stuff}
will also work, although I wouldn't expect people to use that.
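The normalization expression quoted above can be tried in isolation. The wrapper name `normalizeLabel` is hypothetical; the body is the expression exactly as given in this comment, and how it is wired into the matching code is not shown in the thread.

```javascript
// Wraps the expression quoted above: hyphens and underscores become spaces,
// camelCase boundaries get a space, and the result is lower-cased.
function normalizeLabel(label) {
  return label
    .replace(/[-_]/g, ' ')
    .replace(/([a-z])([A-Z])/g, '$1 $2')
    .toLowerCase();
}

console.log(normalizeLabel('event-place'));      // "event place"
console.log(normalizeLabel('Event Place'));      // "event place"
console.log(normalizeLabel('publicationTitle')); // "publication title"
```

Applied literally, the camelCase step also splits mixed-case input: `EvEnt-PlaCe` comes out as `ev ent pla ce`, so the actual matching presumably treats casing slightly differently from this one-liner.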
Wait, the faux-basefields don't solve anything. If I find `dimensions` in the `extra` field I will just treat it as both `artworkSize` and `runningTime`, and they're not even written out now, so that's not at all interesting right now. More interesting is `call number`. I do write out both `callNumber` and `applicationNumber` and they mean different things. And right now, you can't say "no, just fill `callNumber`, not `applicationNumber`" (whereas you can say "fill `applicationNumber` but not `callNumber`").
Wait, that's an error anyhow. If there's a direct edge between a label and a domain (zotero/csl) var, it should not also infer a longer route to a var in the same domain. That's just an error.
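A minimal sketch of that pruning rule, assuming a flat edge list: the `{label, domain, variable, kind}` shape and the name `pruneInferred` are hypothetical and do not reflect how BBT actually stores its mapping graph.

```javascript
// Hypothetical edge shape: { label, domain: 'zotero' | 'csl', variable, kind: 'direct' | 'inferred' }.
// An inferred edge is dropped when the same label already has a direct edge
// into the same domain.
function pruneInferred(edges) {
  const direct = new Set(
    edges
      .filter((edge) => edge.kind === 'direct')
      .map((edge) => `${edge.label}|${edge.domain}`)
  );
  return edges.filter(
    (edge) => edge.kind === 'direct' || !direct.has(`${edge.label}|${edge.domain}`)
  );
}
```

With a direct edge from `call number` to the Zotero field `callNumber`, an inferred route from `call number` to `applicationNumber` would be removed, while inferred edges for labels without a direct edge in that domain are kept.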
BBT's BibTeX exporter doesn't seem to handle the `Place` field of Zotero items of type `Conference Paper` according to BibTeX's documentation. The documentation says:

The BibTeX exporter correctly maps Zotero's `Event Place` field to BibTeX's `address` field, but it discards Zotero's `Place` field. The documentation suggests that BBT should append the contents of Zotero's `Place` field to BibTeX's `publisher` field. BBT can't do the same with the `organization` field because Zotero doesn't support storing conference organisers.

The documentation doesn't make it clear what separator should be used between the publisher and address, though. An obvious choice would be a comma, but this may not look the best in some BibTeX styles (for example when a semicolon is used to separate the entries surrounding the publisher). I still think that it'd be better for BBT to include the publisher's address with a possibly suboptimal separator rather than not including the information at all. Manually editing the exported BibTeX file to replace commas with another separator should be quicker than manually adding the addresses of publishers.

This is the debug-report ID of an example conference paper item: `QRN56KSA-euc`. I haven't tested this with proceedings items, and it isn't obvious to me how to add such an entry to Zotero. Please note that this issue only pertains to the BibTeX exporter.