yangkky / Machine-learning-for-proteins

Listing of papers about machine learning for proteins.
GNU General Public License v3.0

Create full formatting spec #16

Open yangkky opened 5 years ago

agitter commented 5 years ago

@yangkky would you want to try to automate some of the citation metadata extraction and formatting? I started thinking about how our Manubot tool might be able to take a list of conference paper URLs, arXiv ids, journal DOIs, etc. and generate the formatted reference list for collaborative curation projects like this (https://github.com/manubot/rootstock/issues/223). The goal would be to make it easier to maintain and contribute to.

yangkky commented 5 years ago

Yes, that would be awesome!

agitter commented 5 years ago

There are a few options for setting this up. All of them would involve listing references in this repository, using Manubot (with pandoc and CSL) to extract and format the metadata, and deploying the formatted list.

Option 1: Full manuscript. This is what Manubot was primarily designed for. A simple manuscript would have lists or tables of references and be deployed as HTML and PDF to GitHub Pages. https://greenelab.github.io/meta-review/ is a complex example; a manuscript that only tracks categorized references would be much sparser.

Option 2: Keep the references in the readme. Manubot doesn't directly support this, but I made a quick proof of concept at https://github.com/agitter/manubot-awesome-list. As this repository currently does, everything would be displayed in the master branch readme. However, the references would all be extracted from lists of DOIs, URLs, etc. instead of being manually maintained. This isn't ready to go right now but could be made more robust.

Option 3: Custom GitHub Pages. This isn't worth the effort, but it is possible to use the Manubot API to get reference metadata and display references in a customized way. The Manubot catalog does that.
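All three options share one step: turning extracted citation metadata into a formatted list entry. In Manubot that step is handled by pandoc and a CSL style, so the following is only a rough, hand-rolled sketch of the idea in plain Python. It assumes the metadata arrives as a CSL JSON item (the standard `title`, `author`, `container-title`, `issued`, `DOI`, and `URL` fields). The function names `format_authors` and `format_reference`, the output style, and the example item are all hypothetical placeholders, not part of Manubot or of this repository.

```python
from __future__ import annotations

from typing import Any


def format_authors(authors: list[dict[str, str]], max_authors: int = 3) -> str:
    """Join CSL author entries as 'Given Family', truncating long lists."""
    names = []
    for author in authors:
        # CSL JSON stores people as 'given'/'family' and institutions as 'literal'.
        name = author.get("literal") or " ".join(
            part for part in (author.get("given"), author.get("family")) if part
        )
        if name:
            names.append(name)
    if len(names) > max_authors:
        return ", ".join(names[:max_authors]) + ", et al."
    return ", ".join(names)


def format_reference(item: dict[str, Any]) -> str:
    """Render one CSL JSON item as a single list entry (hypothetical style)."""
    title = item.get("title", "Untitled")
    authors = format_authors(item.get("author", []))
    venue = item.get("container-title", "")
    date_parts = item.get("issued", {}).get("date-parts", [[]])
    year = str(date_parts[0][0]) if date_parts and date_parts[0] else ""
    # Prefer an explicit URL; otherwise fall back to a DOI resolver link.
    link = item.get("URL") or (
        f"https://doi.org/{item['DOI']}" if item.get("DOI") else ""
    )

    pieces = [f"**{title}.**"]
    if authors:
        pieces.append(f"{authors}.")
    venue_year = ", ".join(part for part in (venue, year) if part)
    if venue_year:
        pieces.append(f"{venue_year}.")
    if link:
        pieces.append(link)
    return "- " + " ".join(pieces)


# Placeholder metadata for illustration only; not a real paper.
example_item = {
    "title": "Example paper",
    "author": [{"given": "Ada", "family": "Lovelace"}],
    "container-title": "Example Journal",
    "issued": {"date-parts": [[2020]]},
    "DOI": "10.1234/example",
}
print(format_reference(example_item))
```

A real setup would leave the styling to a CSL file so the format can change without touching code; the sketch is only meant to show what "extract and format the metadata" amounts to.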

yangkky commented 5 years ago

In principle, I like option 1 a lot, especially as I'd eventually like to add some paper summaries. Under this option, the final product would have its own webpage, backed by the GitHub repo, right?

agitter commented 5 years ago

Yes, with option 1 there is a GitHub repository that contains Markdown files with references and discussions of some or all references. Those are built into HTML and PDF documents on the gh-pages branch of the repository.

You can edit the demo at https://github.com/manubot/try-manubot/ to see how option 1 works. The files in the content subdirectory are turned into https://manubot.github.io/try-manubot/

yangkky commented 5 years ago

Great! I'll give this a try when I have some time!

agitter commented 4 years ago

I've found this repo to be very useful, so I'm returning to this issue to see if there is a way to reduce the maintenance burden. I prototyped a GitHub Actions workflow that will automatically extract the citation information from the title of a new issue. The idea is that you and other contributors could quickly make new issues with relevant papers and periodically make a batch update to the readme categorizing those new papers. The manual step of copying and formatting the reference would be eliminated.

You can see what this looks like in https://github.com/agitter/manubot-awesome-list/issues/7 or test it by opening a new issue in that repo. The title field can be a URL to a paper, a DOI, arXiv id, PMID, or any other identifier that Manubot supports (https://github.com/manubot/rootstock/blob/master/USAGE.md#citations). The action only runs on issues with the reference label.
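To make the title handling concrete, here is a minimal sketch of how an issue title might be normalized into a prefixed citation key before it is handed to Manubot. This is not the code from the prototype workflow, and Manubot does its own identifier handling. The function `title_to_citekey`, the regular expressions, and the example identifiers are assumptions for illustration, and the prefix list should be checked against the Manubot usage docs.

```python
import re
from typing import Optional

# Loose patterns for bare identifiers (assumptions, not Manubot's own rules).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_PATTERN = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
PMID_PATTERN = re.compile(r"^\d{1,8}$")

# Prefixes assumed to be accepted by Manubot; verify against its documentation.
KNOWN_PREFIXES = ("doi:", "arxiv:", "pmid:", "pmcid:", "isbn:", "url:", "wikidata:")


def title_to_citekey(title: str) -> Optional[str]:
    """Map an issue title to a 'prefix:identifier' key, or None if unrecognized."""
    text = title.strip()
    if not text:
        return None
    # Check URLs first so that 'https:' is never mistaken for a prefix.
    if text.lower().startswith(("http://", "https://")):
        return "url:" + text
    # Already prefixed (for example 'arXiv:...'): lowercase only the prefix.
    if text.lower().startswith(KNOWN_PREFIXES):
        prefix, rest = text.split(":", 1)
        return prefix.lower() + ":" + rest
    if DOI_PATTERN.match(text):
        return "doi:" + text
    if ARXIV_PATTERN.match(text):
        return "arxiv:" + text
    if PMID_PATTERN.match(text):
        return "pmid:" + text
    return None


# Placeholder titles for illustration only.
for example in ["10.1234/example", "arXiv:2001.00001v2", "not a reference"]:
    print(repr(example), "->", title_to_citekey(example))
```

Returning `None` for unrecognized titles lets a workflow skip the issue or post a comment asking for a valid identifier instead of failing.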

I also made a Citation Style Language file (https://github.com/agitter/manubot-awesome-list/blob/master/style.csl) that modifies the Manubot default to approximately match the reference style in this repo.

Let me know if this seems useful. I could make a pull request to enable the issue processing here.

yangkky commented 4 years ago

Hi Anthony,

Sorry for the slow response. This sounds great! If you make a PR, I'll definitely review it.

