Preprint item type - Githubissues

dstillman commented 2 years ago

@adam3smith, @bwiernik, is there anything I should be consulting for this? Anything this needs to be mapped to on the CSL side? I'm not seeing any existing issues for it.

adam3smith commented 2 years ago

Huh, surprised we don't have a ticket. Preprint should map to CSL article. Preprint server (not wedded to that label) should be publisher. I think we'll want series and series number to accommodate working papers in series. Beyond that, only standard fields.

Edit: just looking at arXiv and wondering if we should try to get the ID into a number field? It needs to be citeable

bwiernik commented 2 years ago

How about "repository" for publisher?

bwiernik commented 2 years ago

Type mapped to genre.

APA style wants the archive ID. We settled on CSL archive_location for that back when we discussed it @adam3smith when I was writing APA 7.

adam3smith commented 2 years ago

> APA style wants the archive ID. We settled on CSL archive_location for that back when we discussed it @adam3smith when I was writing APA 7.

Do you remember why? number, as used e.g. for patent, seems a better fit. I'm just a bit worried that we have a fair number of styles citing archive and location across all item types.

bwiernik commented 2 years ago

Let me look into it

dstillman commented 2 years ago

The ids actually get a little tricky. We currently put arXiv IDs (from arXiv.org or Mendeley import) into Extra as arXiv (which maybe should've been arXiv ID), and I assumed we'd want to migrate that to a dedicated field, which later might be part of a more flexible many-to-one id system like we've talked about in the past. But then we'd probably need special logic everywhere to get that to the processor as number or whatever it needs to be — a regular CSL mapping wouldn't work because an import back from CSL-JSON would be ambiguous, with multiple possible fields (number, arXivID, or any other repo-specific ones).

Can we just assume that all preprint archives will use an unambiguous id format, with an identifiable prefix like arXiv:, and we can just store them in a single archiveID field, mapped bidirectionally to an appropriate CSL field? And any automated handling will just use the prefix to identify it?
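
A minimal sketch of the prefix idea above, assuming IDs are stored as `prefix:identifier` in a single field. The function name `split_archive_id` and the `KNOWN_PREFIXES` table are hypothetical; only the `arXiv:` prefix comes from this thread.

```python
import re

# Hypothetical table of recognized repository prefixes (lowercased key ->
# canonical spelling). Only "arXiv" is mentioned in the discussion.
KNOWN_PREFIXES = {"arxiv": "arXiv"}

# A prefix is a run of letters/digits followed by a colon, then the ID itself.
_PREFIXED_ID = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*):\s*(\S+)\s*$")


def split_archive_id(value):
    """Split a stored archive ID into (repository, identifier).

    Returns (None, stripped_value) when no known prefix is present, so
    unprefixed or unrecognized IDs pass through untouched.
    """
    match = _PREFIXED_ID.match(value)
    if match:
        canonical = KNOWN_PREFIXES.get(match.group(1).lower())
        if canonical is not None:
            return canonical, match.group(2)
    return None, value.strip()
```

Because unrecognized values are returned unchanged, a repository without a prefix convention can still use the field; automated handling would simply not apply to it.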

adam3smith commented 2 years ago

I like the archiveID. Not sure if all servers have that - e.g. OSF preprints technically have an ID but they never use it, but leaving the field empty is fine, of course. Where IDs are essential, I think assuming a prefix and unique ID is plausible.

bwiernik commented 2 years ago

Would we maybe want to add archive ID to all types alongside archive, location in archive, and the new archive place and archival collection? That would unambiguously separate physical and digital locations. CSL could add an archive_id variable

adam3smith commented 2 years ago

I like it, I think, particularly the electronic vs. physical distinction, but we should maybe run it by some more people?


> On Fri, Nov 12, 2021, 08:08 Brenton M. Wiernik wrote:
>
> Would we maybe want to add archive ID to all types alongside archive, location in archive, and the new archive place and archival collection? That would unambiguously separate physical and digital locations. CSL could add an archive_id variable


dstillman commented 2 years ago

Does document stay mapped to article too? Should CSL-JSON article import to document or preprint?

Re: archiveID on everything, a concrete example that I've been unsure about: if you have a preprint with an arXiv ID, and then you update metadata and it now is published and has a DOI, we presumably convert that item to journalArticle. Do we keep the archiveID on the item? Throwing it out seems bad, but it also seems a little conceptually fuzzy, since the item no longer really represents that version. arXiv.org obviously keeps the page and lists the DOI, but the canonical source of metadata would be the publisher, and that metadata wouldn't have the arXiv ID.

More practically, do styles know not to use the archiveID for published articles?

adam3smith commented 2 years ago

> Does document stay mapped to article too? Should CSL-JSON article import to document or preprint?

CSL 1.0.2, which we are hoping to release on Dec 1, has document, so preprint should map to article and document to document.

I'm not sure about the answer to the arXiv questions, but as a data point, arXiv's own BibTeX no longer includes the arXiv ID once an item is published in a journal.

bwiernik commented 2 years ago

Perhaps converting archiveID to an attached link would be a good way to keep the information but also avoid including the ID in citations to published items?

dstillman commented 2 years ago

That's a good idea.

dstillman commented 2 years ago

But then do we still need archiveID on all item types?

bwiernik commented 2 years ago

A lot of items might have an electronic archive that should be cited instead of/in addition to a URL. APA for example, wants archive and archive IDs to be included when the item is not widely available (e.g., articles, reports, manuscripts, books, documents). Examples given in the manual are ProQuest ID numbers and ERIC ID numbers.

> 9.30 Database and Archive Sources
>
> Database and archive information is seldom needed in reference list entries. The purpose of a reference list entry is to provide readers with the details they will need to perform a search themselves if necessary, not to replicate the path the author of the work personally used. Most periodical and book content is available through a variety of databases or platforms, and different readers will have different methods or points of access. Additionally, URLs from databases or library-provided services usually require a login and/or are session specific, meaning they will not be accessible to most readers and are not suitable to include in a reference list.
>
> Provide database or other online archive information in a reference only when it is necessary for readers to retrieve the cited work from that exact database or archive.
>
> Provide the name of the database or archive when it publishes original, proprietary works available only in that database or archive (e.g., Cochrane Database of Systematic Reviews or UpToDate; see Chapter 10, Examples 13–14). References for these works are similar to journal article references; the name of the database or archive is written in italic title case in the source element, the same as a periodical title.
>
> Provide the name of the database or archive for works of limited circulation, such as
>
> - dissertations and theses published in ProQuest Dissertations and Theses Global,
> - works in a university archive,
> - manuscripts posted in a preprint archive like PsyArXiv (see Chapter 10, Example 73),
> - works posted in an institutional or government repository, and
> - monographs published in ERIC or primary sources published in JSTOR (see Chapter 10, Example 74).
>
> These references are similar to report references; the name of the database or archive is provided in the source element (in title case without italics), the same as a publisher name.
>
> Do not include database information for works obtained from most academic research databases or platforms because works in these resources are widely available. Examples of academic research databases and platforms include APA PsycNET, PsycINFO, Academic Search Complete, CINAHL, Ebook Central, EBSCOhost, Google Scholar, JSTOR (excluding its primary sources collection because these are works of limited distribution), MEDLINE, Nexis Uni, Ovid, ProQuest (excluding its dissertations and theses databases, because dissertations and theses are works of limited circulation), PubMed Central (excluding authors’ final peer-reviewed manuscripts because these are works of limited circulation), ScienceDirect, Scopus, and Web of Science. When citing a work from one of these databases or platforms, do not include the database or platform name in the reference list entry unless the work falls under one of the exceptions.
>
> If you are in doubt as to whether to include database information in a reference, refer to the template for the reference type in question (see Chapter 10).
>
> Finish the database or archive component of the source element with a period, followed by a DOI or URL as applicable (see Sections 9.34–9.36).

dstillman commented 2 years ago

OK, so use the same archiveID field for preprint and journalArticle/others, but move known preprint-server ids to attached links on metadata updating, and translators/people can populate the non-preprint archiveID fields as needed.

The only problem would be if you manually changed the item type from Preprint to Journal Article. If it's the same field, the archiveID value would be preserved and potentially affect citations, which would be different behavior from metadata updating. Or we could override the default behavior and convert to an attached link at that point, to make it the same as during metadata updating, but we wouldn't do that going in the other direction, so it's a little weird.
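
The conversion behavior described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Zotero's implementation: items are plain dicts, the `links` key standing in for attached links is hypothetical, and only `arXiv:` is treated as a known preprint-server prefix. The `https://arxiv.org/abs/` URL form is arXiv's abstract-page pattern.

```python
def convert_preprint_to_journal_article(item):
    """Return a copy of `item` as a journalArticle.

    A known preprint-server ID (here only arXiv) is moved from archiveID
    to an attached link so it no longer feeds into citations; any other
    archiveID value is preserved on the converted item.
    """
    converted = {key: value for key, value in item.items() if key != "archiveID"}
    converted["itemType"] = "journalArticle"
    converted["links"] = list(item.get("links", []))  # hypothetical link store

    archive_id = item.get("archiveID", "")
    if archive_id.lower().startswith("arxiv:"):
        identifier = archive_id.split(":", 1)[1].strip()
        converted["links"].append(
            {"title": archive_id, "url": "https://arxiv.org/abs/" + identifier}
        )
    elif archive_id:
        # Not a recognized preprint-server ID: keep it in the field.
        converted["archiveID"] = archive_id
    return converted
```

Calling the same function from both the metadata-update path and the manual type change would make the two behave identically, which is the override option raised above; the reverse direction (Journal Article to Preprint) is not handled here.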

dstillman commented 2 years ago

This is what I have so far:

{
  "itemType": "preprint",
  "fields": [
    {
      "field": "title"
    },
    {
      "field": "abstractNote"
    },
    {
      "field": "date"
    },
    {
      "field": "repository",
      "baseField": "publisher"
    },
    {
      "field": "place"
    },
    {
      "field": "archiveID"
    },
    {
      "field": "DOI"
    },
    {
      "field": "citationKey"
    },
    {
      "field": "url"
    },
    {
      "field": "accessDate"
    },
    {
      "field": "archive"
    },
    {
      "field": "archiveLocation"
    },
    {
      "field": "shortTitle"
    },
    {
      "field": "language"
    },
    {
      "field": "libraryCatalog"
    },
    {
      "field": "callNumber"
    },
    {
      "field": "rights"
    },
    {
      "field": "extra"
    }
  ],
  "creatorTypes": [
    {
      "creatorType": "author",
      "primary": true
    },
    {
      "creatorType": "contributor"
    },
    {
      "creatorType": "editor"
    },
    {
      "creatorType": "translator"
    },
    {
      "creatorType": "reviewedAuthor"
    }
  ]
}
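
To make the role of `baseField` in the draft above concrete, here is a small sketch that resolves each type-specific field to the base field it stands in for. The helper name `base_field_map` is hypothetical, and the fragment is a trimmed copy of the draft, not the full schema.

```python
import json

# Trimmed copy of the draft schema above; the remaining fields follow the
# same pattern as "title" and "archiveID".
SCHEMA_FRAGMENT = json.loads("""
{
  "itemType": "preprint",
  "fields": [
    {"field": "title"},
    {"field": "repository", "baseField": "publisher"},
    {"field": "archiveID"}
  ]
}
""")


def base_field_map(item_type_schema):
    """Map each field name to its base field (or to itself if it has none)."""
    return {
        entry["field"]: entry.get("baseField", entry["field"])
        for entry in item_type_schema["fields"]
    }
```

With this fragment, `repository` resolves to `publisher`, while `title` and `archiveID` resolve to themselves.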

Some more questions:

1. Do preprints need "Place" (publisher-place)?
2. @bwiernik, what would type (mapped to genre) be for, if journal articles (which many/most of these will become) don't have that?
3. A little weird to have "Repository" (mapped to publisher) and "Archive ID" next to it, when there are existing "Archive" and "Loc. in Archive" fields down below. And I'm a bit confused about how "Archive ID" interacts with "Archive" on other types. Would "Archive" be used for digital archives as well, and you use either "Archive ID" or "Loc. in Archive" depending on electronic vs. physical? But we can't use "Archive" here because we need it to map to publisher?
4. Should I map archiveID to number for now?

bwiernik commented 2 years ago

1. The preprint type will encompass things like working papers (e.g., in economics), which are sometimes cited with a place, so I think yes.
2. genre would hold descriptions like "Working paper". It can be dropped if converted to a journal article.
3. That's correct. It's a little funky, I agree. In most cases archiveID would pair up with the other Archive variables. Preprints are an unusual case where the archive and the publisher are the same thing.
4. Hmm, I think so. One concern might be if many styles are written to render number indiscriminately.

@adam3smith Would number generally work as the electronic archive ID, or might items, e.g., in ERIC or ProQuest, have both, such as a working paper series number and an archive ID?

@denismaier @bdarcus What do you think of adding an archive_id variable to CSL to distinguish between physical locations (archive_location) and electronic ones (archive_id)?

adam3smith commented 2 years ago

Agree with Brenton on the above. I think we'll do fine with number - if we want series numbers, we'll use collection-number

Edit: which does mean we'll want series and series number added to the above
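
Pulling the mappings discussed so far into one place, a rough sketch of the preprint-to-CSL direction might look like this. Items are assumed to be plain dicts, `preprint_to_csl` is a hypothetical helper, and mapping `series` to `collection-title` is an assumption (the thread only names `collection-number` for the series number).

```python
# Zotero preprint field -> CSL variable, following the proposals in this thread.
PREPRINT_TO_CSL = {
    "title": "title",
    "repository": "publisher",            # preprint server / repository
    "place": "publisher-place",
    "genre": "genre",                     # e.g. "Working paper"
    "archiveID": "number",                # mapped to number "for now"
    "series": "collection-title",         # assumption, not stated in the thread
    "seriesNumber": "collection-number",
    "DOI": "DOI",
    "url": "URL",
}


def preprint_to_csl(item):
    """Build a CSL-JSON-style dict of type "article" from a preprint item.

    Empty or missing fields are skipped.
    """
    csl = {"type": "article"}
    for zotero_field, csl_variable in PREPRINT_TO_CSL.items():
        value = item.get(zotero_field)
        if value:
            csl[csl_variable] = value
    return csl
```

The reverse direction is where the ambiguity raised earlier shows up: a CSL `article` with a `number` does not by itself say which Zotero type or field it came from, so an import would need extra rules beyond inverting this table.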

dstillman commented 2 years ago

Anyone have an idea for an icon for preprints?

We'll need both a custom one in the new style for iOS/web and something based on famfamfam or Fugue for the desktop client:

http://www.famfamfam.com/lab/icons/silk/previews/index_abc.png https://p.yusukekamiyamane.com/icons/preview/fugue.png

(Could be a combination of icons if necessary.)

dstillman commented 2 years ago

"script" is sort of funny for this, in a Martin-Luther-nailing-theses-to-the-door sort of way. We're using that for Bill in the client, but our custom icon for Bill is the § symbol, so we could repurpose the script concept for this.

For now, I'm going with "receipt", which doesn't make a ton of sense but looks vaguely unfinished — like a piece of paper ripped off a dot matrix printer.

AbeJellinek commented 2 years ago

What about famfamfam's page_white_gear or page_white_go? Or "receipt" but converted to grayscale to match other print-ish types. Something about the blue just feels off to me.

dstillman commented 2 years ago

"receipt" is the top row above. "bill" is the second. I was just saying we could use the bill concept, but we'd definitely do it in white/gray to be closer to the journal article icon.

AbeJellinek commented 2 years ago

Oh, right.

bwiernik commented 2 years ago

Maybe it's just been too many years of seeing the scroll/script used for Bill, but it looks a little weird to me for preprint

For famfamfam, I think both page_white_lightning and page_white_go are interesting and emphasize the rapidity of preprints.

From Fugue, I really like report or report-share. The notebook fringes on the left side of the page feel like a draft or unfinished paper (like receipt but better). The version with the sharing hand emphasizes the sharing/feedback solicitation of preprints/working papers.

adam3smith commented 2 years ago

How about page_white_wrench, because they're (often) still being worked on?

adam3smith commented 2 years ago

You could also pick your four favorite options and make it a Twitter poll to create some preprint buzz.

bwiernik commented 2 years ago

Trying out the client on macOS with the new Preprint type. I think the current receipt icon is visually too similar to the Journal Article icon. On the macOS color scheme, I can barely see the fringes at the top and bottom, so the Journal Article and Preprint items look really similar.


dstillman commented 2 years ago

Yes, we'll be changing it. Priority was just getting this out.

bwiernik commented 2 years ago

Cool, just wanted to give some feedback in case that wasn't the plan

adam3smith commented 2 years ago

Starting to work on preprint citations -- I'm not getting Archive ID mapped to CSL number (testing in the style editor in 6.0.8-beta.4+1e3959020) -- could someone else check whether that's me or a general issue?

dstillman commented 2 years ago

Can you provide a sample minimal style to test that?

adam3smith commented 2 years ago

MWE: https://gist.github.com/adam3smith/786485597971865e2a99687f5401841d

Displays patentNumber for patent but [CSL STYLE ERROR: reference with no printed form.] for Preprint

ArchiveID also doesn't show up in CSL JSON from preprints, but I think that's expected? FWIW, I'm testing with https://www.nber.org/papers/w14560 as imported using the NBER translator.

dstillman commented 2 years ago

Sorry about that — didn't update a submodule. Try in the latest beta.

adam3smith commented 2 years ago

Yup, working, thank you!

zotero / zotero-bits

Preprint item type #88