Closed agitter closed 5 years ago
Agreed. We need a way to search for GitHub repositories that meet a set of criteria. I'm struggling to get the online GitHub search to return repos, rather than files, that meet the criteria.
I found a help page that shows how to search for terms in a repository's readme. Searching manubot in:readme works well!
Nice! 1 false positive (greenelab/manubot); 11 true positives.
@agitter's search on 2018-03-26 returns 29 repositories. Some interesting manuscripts I hadn't seen before include
Additional instances that appear to currently be in progress are:
@agitter's search on 2018-10-08 returns 40 repositories. Some interesting manuscripts I hadn't seen before include
We now have a catalog of manuscripts written using Manubot at https://manubot.org/catalog/. The catalog is defined in the https://github.com/manubot/catalog repository, with CI set up to fetch bibliographic details and trigger deployment.
Going forward, we will add new manuscripts to the catalog rather than commenting on them here.
The GitHub search will still be useful to discover new Manubot manuscripts. Here it is, modified to be ordered by "Recently updated": https://github.com/search?o=desc&q=manubot+in%3Areadme&s=updated&type=Repositories
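For anyone who wants to rerun this search from a script, here is a minimal sketch that builds the equivalent query against GitHub's REST search endpoint for repositories. The function names `build_search_url` and `count_repositories` are made up for illustration, and the counting helper needs network access and is subject to GitHub's rate limits for unauthenticated requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# GitHub REST API endpoint for repository search.
GITHUB_SEARCH_API = "https://api.github.com/search/repositories"


def build_search_url(term="manubot", per_page=100, page=1):
    """Build a repository search URL for `term in:readme`, most recently updated first."""
    params = {
        "q": f"{term} in:readme",
        "sort": "updated",
        "order": "desc",
        "per_page": per_page,
        "page": page,
    }
    return f"{GITHUB_SEARCH_API}?{urlencode(params)}"


def count_repositories(term="manubot"):
    """Return the total number of matching repositories (requires network access)."""
    request = Request(
        build_search_url(term, per_page=1),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request) as response:
        return json.load(response)["total_count"]


if __name__ == "__main__":
    print(build_search_url())
```

The count still includes the false positives mentioned earlier in this thread, so it is an upper bound on readme matches rather than a count of manuscripts.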
> @agitter's search on 2018-10-08 returns 40 repositories
Did we adequately capture all of these in the new catalog @dhimmel ?
> Did we adequately capture all of these in the new catalog
No, there are some repositories that are really just stubs without much original content. There are also several repositories in early stages where I could see the authors wanting to wait to publicize them (OTOH they are public).
Another option would be to add a catalog field like hidden. Then we could add every repository to the catalog, while keeping some hidden (at least by default) on manubot.org/catalog. What do you think?
BTW as of 2019-07-10, I get 74 repository results.
Yeah by "adequately" I meant all the ones that are relatively finished and/or worth putting in the catalog.
I'd be down to put in all of them with a special tag and then another checkbox like "show in progress" or "show stubs".
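To make the hidden-field idea concrete, here is a rough sketch of how a default view could filter entries. It assumes each catalog entry is a dictionary with an optional boolean `hidden` key. That schema, the `visible_entries` helper, and the example repository names are all hypothetical and not taken from the manubot/catalog repository.

```python
def visible_entries(catalog, show_hidden=False):
    """Return the catalog entries to display.

    Entries without a `hidden` key are treated as visible, so existing
    records would not need to change (an assumption of this sketch).
    """
    if show_hidden:
        return list(catalog)
    return [entry for entry in catalog if not entry.get("hidden", False)]


# Hypothetical catalog records, for illustration only.
catalog = [
    {"repo": "example-org/finished-manuscript"},
    {"repo": "example-org/early-draft", "hidden": True},
]

default_view = visible_entries(catalog)
full_view = visible_entries(catalog, show_hidden=True)
```

A "show in progress" or "show stubs" checkbox would then just toggle `show_hidden`, or a more specific tag if stubs and drafts need to be distinguished.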
Previously we asked authors before we added their manuscript to the example manuscript list in the Rootstock repository. Do we want to continue contacting authors before adding manuscripts to the catalog?
The manuscripts are public and easy to find via GitHub search, so we aren't leaking any information by adding them to the catalog. Nonetheless, I expect some authors would prefer to not have their early stage manuscripts advertised.
> Do we want to continue contacting authors before adding manuscripts to the catalog?
If an author has advertised a manuscript publicly (or has posted a preprint or published the work), then I think we should add it to the catalog.
Still not sure about in-progress works.
Your proposal regarding publicly advertised, preprinted, or published work makes sense to me.
I suggest that we don't include work in progress in the catalog without the authors' permission.
I was curious about current usage statistics and ran the Manubot search today, sorting by recently updated. There are now over 20 repos updated in the last week and about 60 updated in the last month. Many of those are test manuscripts, but it's still a lot of legitimate open writing.
The simple GitHub search has false positives (readmes that discuss Manubot but are not manuscripts) and false negatives (manuscripts that use Manubot but have customized readmes or configurations). Is there a way to search the HTML metadata of the generated manuscript for a unique property like manubot_html_url_versioned to improve the quality of our automated searches? I'm not sure whether search engines index this metadata.
@vincerubinetti do you know?
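Whether search engines index this metadata is still an open question, but checking a known candidate page for the property is easy. Here is a minimal sketch using only the Python standard library. It assumes the property appears as the `name` or `property` attribute of a `<meta>` tag with the URL in `content`, which should be verified against an actual generated manuscript. `ManubotMetaFinder` and `find_versioned_url` are names made up for this example.

```python
from html.parser import HTMLParser

# Property discussed above; assumed to be exposed through a <meta> tag.
MARKER = "manubot_html_url_versioned"


class ManubotMetaFinder(HTMLParser):
    """Record the content of the first <meta> tag carrying the marker."""

    def __init__(self):
        super().__init__()
        self.versioned_url = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.versioned_url is not None:
            return
        attributes = dict(attrs)
        if MARKER in (attributes.get("name"), attributes.get("property")):
            self.versioned_url = attributes.get("content")


def find_versioned_url(html_text):
    """Return the versioned URL if the page has the marker, else None."""
    finder = ManubotMetaFinder()
    finder.feed(html_text)
    return finder.versioned_url
```

Running this over the GitHub Pages sites of repositories returned by the readme search could help weed out false positives. It would not find the false negatives on its own.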
It may be nice in the future to produce statistics about how many documents have been authored with Manubot and this rootstock, or to refer to more examples. @dhimmel has https://github.com/dhimmel/rephetio-manuscript/ and there were examples listed in #62.
I haven't been able to think of a non-invasive way to track this. Does anyone else have ideas? Is this worthwhile?