brechtm opened 4 years ago
Maybe @chrisjsewell would be interested in moving https://github.com/chrisjsewell/docutils under the roof of this org.
Also, there's https://docutils.readthedocs.io/en/sphinx-docs/, which seems to come from https://github.com/ericholscher/docutils by @ericholscher. I'm curious whether he'd let us claim the project for Chris' repo, for example.
There was talk about this with the docutils maintainers over the past few months. Have they 👍'd this work? Otherwise, I worry that this will just end up as a weird fork that isn't actually maintained or merged by the core team, and that will possibly be an even worse outcome for everyone. Is the goal here to do the work and then hope it will be adopted?
I haven't dug too deeply into Docutils, but I could spend a little time helping to "modernize" the reStructuredText docs. I guess, to Eric's point: how much of this is intended to be a separate "unofficial" resource vs. upstreamed? It sounds like potentially multiple repos, only one of which is the Docutils mirror?
Well I am definitely down for improvements and very much agree with your comments https://github.com/sphinx-doc/sphinx/issues/8039#issuecomment-713015792
Personally, I don't see the current docutils maintainers ever moving away from sourceforge, despite my best efforts:
To an extent I do appreciate the effort of these maintainers, but I do feel like they are "gate-keeping" by not moving away from sourceforge 😬
I've played around with the docutils code a lot. Obviously it works, but I can tell you it's no gold standard for programming, and there is plenty of room for improvement (particularly async compliance, and removing the need for Sphinx to monkey-patch it).
TBH, with my myst-parser hat on, I would like to see the directive/role code and AST completely decoupled from the syntax parser. There is no reason that it needs to be dependent on the actual source syntax (Markdown > rST 😉)
In terms of the documentation, I have seen the maintainers argue that the docs should represent "vanilla" docutils, i.e. no Sphinx. So again, I can't see them actually deprecating their existing documentation in favour of anything new. I agree with you, though, that this is very detrimental to the user experience (and it's such a pain to always have to add docutils classes to nitpick_ignore for intersphinx). I think at least maintaining separate documentation would be very worthwhile, even if it is not "officially" supported.
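For readers unfamiliar with that workaround: a Sphinx project running in nitpicky mode typically whitelists unresolvable docutils targets in its `conf.py`. A minimal sketch; `nitpicky` and `nitpick_ignore` are standard Sphinx configuration values, but the particular class names listed are illustrative assumptions.

```python
# conf.py (sketch): warn about every unresolved cross-reference...
nitpicky = True

# ...except docutils targets, which intersphinx cannot resolve because the
# official docutils docs provide no Sphinx inventory. Each entry is a
# (role type, target) pair; the classes below are only examples.
nitpick_ignore = [
    ("py:class", "docutils.nodes.Node"),
    ("py:class", "docutils.nodes.Element"),
    ("py:class", "docutils.parsers.rst.Directive"),
]
```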
I'm happy to move https://github.com/chrisjsewell/docutils wherever. I've already got maintainer privileges for https://github.com/docutils/docutils from @ericholscher, with the intention of moving it there. The sticking point was that it's easy to create a "static" mirror of docutils from SourceForge, but ideally it should be dynamically updated (probably via a GH Actions cron job), which was a trickier prospect that I haven't yet had the time to figure out.
comment by @ericholscher:
There was talk about this with the docutils maintainers over the past few months. Have they 👍'd this work? Otherwise, I worry that this will just end up as a weird fork that isn't actually maintained or merged by the core team, and that will possibly be an even worse outcome for everyone. Is the goal here to do the work and then hope it will be adopted?
Moving docutils over to GitHub (or GitLab) has been discussed several times over the past years. The docutils team is strongly opposed to it, and I think we should respect that decision. After all, there is no certainty that there will be more contributions to the project after such a move.
We can, however, set up a mirror here on GitHub to collaborate on patches using PRs (helped by CI to run the test suite). Once a PR is deemed ready (by us), we can create a new patch ticket on SourceForge and link to the PR and patch. Feedback on the patch will be handled on SourceForge, but at least Subversion is out of the equation. I feel this is a good middle ground between having the convenience of GitHub PRs and respecting the docutils team's preferences. The docutils maintainers are of course free to make use of GitHub's PR review feature, if they wish.
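The PR-to-ticket hand-off could be partly scripted. The helper below is hypothetical (`patch_ticket` is not an existing tool): it only builds the text of a SourceForge patch ticket, relying on GitHub serving any pull request as a plain patch when `.patch` is appended to its URL. Actually filing the ticket on SourceForge is not shown.

```python
def patch_ticket(org, repo, pr_number, title, summary):
    """Build the summary/description of a SourceForge patch ticket for a PR.

    Hypothetical helper; posting the result to SourceForge is left out.
    """
    pr_url = f"https://github.com/{org}/{repo}/pull/{pr_number}"
    description = "\n".join([
        summary,
        "",
        f"Reviewed in pull request: {pr_url}",
        # GitHub serves a PR as a plain patch when ".patch" is appended.
        f"Patch: {pr_url}.patch",
    ])
    return {"summary": f"{title} [GH PR #{pr_number}]", "description": description}


if __name__ == "__main__":
    print(patch_ticket("docutils", "docutils", 12, "Fix typo", "Fixes a typo.")["description"])
```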
I'm not sure what to do with bug reports. We could allow people to create bug reports using GitHub issues, but I would make it clear that this is an independent project and provide a link to the docutils issue tracker on SourceForge. Should we create corresponding tickets on SF? Perhaps we can simply run SourceForge's GitHub Importer weekly? (incremental import doesn't seem possible)
This would be independent of the docutils project and would thus not suffer from any SF/GitHub interop issues. As a start, this could simply be a collection of helper functions, e.g. to construct a table, or convert_rst_to_nodes()
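The thread doesn't define what such helpers would look like. As a hypothetical sketch of the "construct a table" case, this function emits reStructuredText source for a simple table (rather than docutils nodes); the resulting text could then be parsed with `docutils.core.publish_doctree`. The name `simple_table` and its restrictions are assumptions.

```python
def simple_table(header, rows):
    """Render a header row and body rows (sequences of str) as a reST simple table.

    Hypothetical helper: cells must be single-line, and first-column body
    cells must be non-empty (a blank first cell marks a continuation line).
    """
    ncols = len(header)
    if any(len(row) != ncols for row in rows):
        raise ValueError("every row needs exactly one cell per column")
    if any(not str(row[0]).strip() for row in rows):
        raise ValueError("first-column cells must not be empty")
    all_rows = [header, *rows]
    # Each column is as wide as its longest cell (at least one character).
    widths = [max(1, *(len(str(row[i])) for row in all_rows)) for i in range(ncols)]
    border = "  ".join("=" * w for w in widths)

    def fmt(row):
        # Left-align cells and separate columns by two spaces.
        return "  ".join(str(cell).ljust(w) for cell, w in zip(row, widths)).rstrip()

    return "\n".join([border, fmt(header), border, *(fmt(r) for r in rows), border])


if __name__ == "__main__":
    print(simple_table(["Name", "Value"], [["a", "1"], ["bb", "22"]]))
```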
from sphinx-doc/sphinx#8039.
I'm not sure about the details, but basically I want a go-to page for reStructuredText. Perhaps something akin to the MultiMarkdown website, providing an introduction to reStructuredText along with links to resources. Similar to the reStructuredText documentation on the docutils website, but in a modern, welcoming package. Inspiration can be found here:
Perhaps it's not even necessary to have any tutorial or documentation on there, and simply link to the available resources:
Additionally, I think a discussion forum could be useful. I have the feeling most people don't go to the trouble of subscribing to mailing lists anymore.
After all, there is no certainty that there will be more contributions to the project after such a move.
From the previous discussions, I'm not 100% convinced they want more contributions.
@brechtm mind if I request enabling the Discussions feature on this repo from GitHub? It'd probably fit the purpose of it better.
re: homepage -- the fact that so many external explanations of "how to use RST" exist means that the official docs are hard to consume, so I'd say that just linking to them would be just as confusing. I'd very much like to see the official docs on RTD.
re: PR patches -- sounds good too. In fact, it would be easy to automate sending them to the mailing list on approve/merge, for example. Another idea would be to have 2 repos (one a fork of the other). The main repo in the org would accept PRs, and the fork would get SCM auto-updates from SF that would occasionally be merged back into the main repo. This way, updating from upstream and merging in the good stuff could be decoupled. (It's distributed after all: https://web.archive.org/web/20181029051129/https://dpc.pw/blog/2017/08/youre-using-git-wrong/)
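Sending an approved PR to the mailing list could be as simple as wrapping its patch in an email. A standard-library sketch; `patch_email`, the subject format and the addresses are assumptions, and actually sending the message (e.g. with `smtplib`) is left out.

```python
from email.message import EmailMessage


def patch_email(pr_number, title, patch_text, sender, to_addr):
    """Build a mailing-list message carrying a PR's patch as an attachment."""
    msg = EmailMessage()
    msg["Subject"] = f"[PATCH] {title} (GH PR #{pr_number})"
    msg["From"] = sender
    msg["To"] = to_addr
    msg.set_content(f"Patch from GitHub pull request #{pr_number} is attached.\n")
    # Attaching turns the message into multipart/mixed automatically.
    msg.add_attachment(patch_text, subtype="x-diff", filename=f"pr-{pr_number}.patch")
    return msg


if __name__ == "__main__":
    m = patch_email(7, "Fix typo", "diff --git a/x b/x\n", "bot@example.org", "list@example.org")
    print(m["Subject"])
```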
re: wrapper lib -- :+1:
@brechtm Thanks for the explanation; it seems like a reasonable approach. A mirror with backported patches is a good way to get started with a docutils & GH combo, and a wrapper/helper lib also makes sense. A lot of things are harder than they should be when it comes to extending both Sphinx and docutils.
I'm 💯 on a site for RST, I think that would do a lot of good. I'd say the best inspiration I have for this is Asciidoctor:
I think having a good reference/user guide for this would be useful for a lot of people. Though the docutils site already has a lot of this reference content (eg. directives docs: https://docutils.sourceforge.io/docs/ref/rst/directives.html) -- so I do worry about how we're differentiating and backporting this content.
Sphinx also has a lot of additional features defined which they document. I think one of the largest issues I run into is the "split brain" situation where Sphinx only documents its features, then links out to docutils docs for the rest. It seems that each project wants to keep it the way it is, and having us try to maintain a merged set of these also feels ripe for staleness if the maintainers of the core projects aren't involved. I also worry about creating another "split brain" situation where we have external RST/docutils docs, and the official project has its own, and they aren't coordinated.
I guess I'm all for the effort to build better resources, but do worry about the long-term maintainability when removed from the maintainers of the core projects. It seems like a project worth pursuing though, and I think it can become a canonical resource if done well.
FYI, this is what I used for the issue transfer (using the SourceForge REST API and the GitHub REST and GraphQL APIs):
```python
# https://anypoint.mulesoft.com/apiplatform/sourceforge/#/portals/organizations/98f11a03-7ec0-4a34-b001-c1ca0e0c45b1/apis/32951/versions/34322
# https://gist.github.com/hmenke/05e5754d188f1367bbcaa62ebc57397e
import html
import os  # needed for the GITHUB_AUTH_TOKEN fallback
import re
import time

import requests
from jinja2 import Template


def get_sf_ticket_numbers(category):
    """Return the sorted ticket numbers of a SourceForge tracker (first 500 only)."""
    response = requests.get(f"https://sourceforge.net/rest/p/docutils/{category}?limit=500")
    data = response.json()
    return sorted(t["ticket_num"] for t in data["tickets"])


def get_sf_ticket(category, number, discussion=True):
    """Fetch a SourceForge ticket and, optionally, its discussion posts."""
    response = requests.get(f"https://sourceforge.net/rest/p/docutils/{category}/{number}")
    data = response.json()["ticket"]
    main = {
        "title": html.unescape(data["summary"]),
        "description": html.unescape("\n".join(data["description"].splitlines())),
        "author": data["reported_by"],
        "date": data["created_date"],  # e.g. 2002-09-04 15:51:11
        "assigned": data["assigned_to"],
        "status": data["status"],
        "priority": data["custom_fields"]["_priority"],  # number 1-5
        "labels": data["labels"],  # list of str
        "attachments": [a["url"] for a in data["attachments"]],
        # "discussion_id": data["discussion_thread"]["discussion_id"],
        # "discussion_url": data["discussion_thread_url"],
        # related_artifacts
    }
    if discussion:
        response = requests.get(data["discussion_thread_url"] + "?limit=500")
        main["discussion"] = [
            {
                "author": p["author"],
                "date": p["timestamp"],  # e.g. 2002-09-04 15:51:11
                "title": html.unescape(p["subject"]),
                "body": html.unescape("\n".join(p["text"].splitlines())),
                "attachments": [a["url"] for a in p["attachments"]],
            }
            for p in response.json()["thread"]["posts"]
        ]
    return main


def sf_to_gh_issue(category, ticket_num, data):
    """Convert SourceForge ticket data into a GitHub issue payload."""
    title = f"{data['title']} [SF:{category}:{ticket_num}]"
    # The template is deliberately not indented: leading spaces would turn
    # the issue body into a Markdown code block.
    template = Template("""
author: {{ author }}
created: {{ date }}
assigned: {{ assigned }}
SF_url: https://sourceforge.net/p/docutils/{{ category }}/{{ ticket_num }}
{% if attachments %}
attachments:
{% for attach in attachments %}
- {{ attach }}
{% endfor %}
{% endif %}
{{ description }}
{% for post in discussion %}
---
commenter: {{ post.author }}
posted: {{ post.date }}
title: {{ post.title }}
{% if post.attachments %}
attachments:
{% for attach in post.attachments %}
- {{ attach }}
{% endfor %}
{% endif %}
{{ post.body }}
{% endfor %}
""")
    body = template.render(category=category, ticket_num=ticket_num, **data)
    labels = [category, data["status"], f"priority-{data['priority']}"] + data["labels"]
    return {
        "title": title,
        "body": body,
        "labels": labels,
        # "milestone"
        # "assignees"
    }


def get_gh_graphql(query, token=None):
    """Run a GitHub GraphQL query and return the decoded JSON response."""
    headers = {
        "Authorization": f"Bearer {token or os.environ['GITHUB_AUTH_TOKEN']}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    request = requests.post(
        "https://api.github.com/graphql", json={"query": query}, headers=headers
    )
    if request.status_code == 200:
        return request.json()
    raise Exception(
        f"Query failed to run by returning code of {request.status_code}. {query}"
    )


def _extract_sf_ticket_number(title, sf_category):
    """Parse the SF ticket number out of a '[SF:<category>:<number>]' title tag."""
    results = re.findall(r"\[SF:" + re.escape(sf_category) + r":(\d+)\]", title)
    if len(results) != 1:
        raise ValueError(f"a single SF issue was not found: {title}")
    return int(results[0])


ISSUE_TEMPLATE = """
query {{
  search(query: "repo:{org}/{repo} is:issue \\"[SF:{sf_category}:\\" in:title", type: ISSUE, first: 100 {after}) {{
    issueCount
    pageInfo{{
      hasNextPage
      endCursor
    }}
    edges {{
      node {{
        ... on Issue{{
          id
          number
          title
        }}
      }}
    }}
  }}
}}
"""


def get_gh_sf_issues(org, repo, sf_category, token=None):
    """Find all GitHub issues that relate to a SourceForge ticket."""
    more_pages = True
    requests_num = 0  # avoid infinite loops
    cursor = None
    issues = {}
    while more_pages and requests_num < 99:
        after = f', after: "{cursor}"' if cursor else ""
        query = ISSUE_TEMPLATE.format(org=org, repo=repo, sf_category=sf_category, after=after)
        data = get_gh_graphql(query, token)["data"]["search"]
        cursor = data["pageInfo"]["endCursor"]
        more_pages = data["pageInfo"]["hasNextPage"]
        requests_num += 1
        issues.update({
            _extract_sf_ticket_number(e["node"]["title"], sf_category): e["node"]
            for e in data["edges"]
        })
    return issues


def post_gh_issue(org, repo, data, token=None):
    """Create a GitHub issue and return the API response."""
    headers = {
        "Authorization": f"Bearer {token or os.environ['GITHUB_AUTH_TOKEN']}",
        "Content-Type": "application/json",
        "Accept": "application/vnd.github.v3+json",
    }
    request = requests.post(
        f"https://api.github.com/repos/{org}/{repo}/issues", json=data, headers=headers
    )
    if request.status_code == 201:
        return request.json()
    # The original referenced an undefined `query` here; report the response instead.
    raise Exception(f"Creating issue failed with code {request.status_code}. {request.text}")


def close_gh_issue(org, repo, issue_number, token=None):
    """Close an existing GitHub issue (the REST API updates issues via PATCH)."""
    headers = {
        "Authorization": f"Bearer {token or os.environ['GITHUB_AUTH_TOKEN']}",
        "Content-Type": "application/json",
        "Accept": "application/vnd.github.v3+json",
    }
    request = requests.patch(
        f"https://api.github.com/repos/{org}/{repo}/issues/{issue_number}",
        json={"state": "closed"},
        headers=headers,
    )
    if request.status_code == 200:
        return request.json()
    raise Exception(f"Closing issue failed with code {request.status_code}. {request.text}")


def perform_migration(org, repo, sf_category, gh_token=None, limit=10, interval=1):
    """Copy up to `limit` not-yet-migrated SF tickets of one category to GitHub."""
    # get all previously migrated issues (from the target repo, not a hard-coded one)
    print(f"retrieving previously migrated {sf_category} issues: ", end="")
    migrated = get_gh_sf_issues(org, repo, sf_category, gh_token)
    print(len(migrated))
    # get list of SF tickets
    print(f"retrieving SF {sf_category} ticket numbers: ", end="")
    ticket_nums = get_sf_ticket_numbers(sf_category)
    print(len(ticket_nums))
    # iterate through issues
    count = 0
    for ticket_num in ticket_nums:
        if limit is not None and count >= limit:
            break
        if ticket_num in migrated:
            continue
        count += 1
        print(f"migrating {sf_category} ticket {ticket_num}")
        # get SF ticket data
        sf_data = get_sf_ticket(sf_category, ticket_num)
        # convert SF ticket data to GH issue data
        gh_data = sf_to_gh_issue(sf_category, ticket_num, sf_data)
        # post issue
        issue_data = post_gh_issue(org, repo, gh_data, gh_token)
        # if SF ticket was closed, close GH issue
        if sf_data["status"].startswith("close"):
            print(f"closing issue {ticket_num}")
            close_gh_issue(org, repo, issue_data["number"], gh_token)
        if interval:
            time.sleep(interval)  # be gentle with both APIs


if __name__ == "__main__":
    _token = None  # None: fall back to the GITHUB_AUTH_TOKEN environment variable
    _org = "chrisjsewell"
    _repo = "docutils"
    for _category in ["bugs", "feature-requests", "patches", "support-requests"]:
        perform_migration(_org, _repo, _category, _token, limit=100)
```
@ewjoachim
From the previous discussions, I'm not 100% convinced they want more contributions.
We should definitely reach out to the docutils team again as soon as we've established what our plans are and ask them for feedback.
@webknjaz
mind if I request enabling the Discussions feature on this repo from GitHub? It'd probably fit the purpose of it better.
I wouldn't mind. I've made you an owner of this organization, just in case that's needed. However, these issues might do just fine for now...
Another idea would be to have 2 repos (one would be a fork of another). The main repo in the org would accept PRs and the fork would get SCM autoupdates from SF that would be merged back to the main repo occasionally.
We can indeed mirror the Subversion repo in an automated way. If we mirror all SVN branches under an upstream/ namespace (for example), we should be able to do everything in one repository.
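A sketch of such an automated mirror. It assumes `git svn` is available, that the docutils Subversion repository uses the standard trunk/branches/tags layout at the URL below, and that the job runs periodically (for instance from a scheduled CI workflow). None of this is an existing docutils or GitHub tool; pushing the git-svn refs to `upstream/` branches is just one way to implement the namespace idea.

```python
import subprocess

# Assumed location and layout of the docutils Subversion repository.
SVN_URL = "https://svn.code.sf.net/p/docutils/code"


def clone_command(svn_url=SVN_URL, dest="docutils"):
    """One-time setup: import SVN history, keeping SVN refs under refs/remotes/svn/."""
    return ["git", "svn", "clone", "--stdlayout", "--prefix=svn/", svn_url, dest]


def sync_commands(remote="origin"):
    """Commands for each scheduled run: fetch new revisions, then publish them."""
    return [
        # Pull any new SVN revisions into refs/remotes/svn/*.
        ["git", "svn", "fetch"],
        # Publish every SVN branch and tag (git-svn represents tags as branches)
        # as upstream/<name>, leaving the other branches free for PR work.
        ["git", "push", remote, "refs/remotes/svn/*:refs/heads/upstream/*"],
    ]


def run(commands, cwd=None, dry_run=False):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run(sync_commands(), dry_run=True)
```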
@ericholscher
I think having a good reference/user guide for this would be useful for a lot of people. Though the docutils site already has a lot of this reference content (eg. directives docs: https://docutils.sourceforge.io/docs/ref/rst/directives.html) -- so I do worry about how we're differentiating and backporting this content.
That's a good point. We could include the docutils repository as a submodule in our website repository and include individual rst files from it in a Sphinx project. That will limit the changes we can make to those sections, of course. Plus any changes we do make will need to be approved by the docutils maintainers, so I'm not sure how workable this would be.
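The submodule approach could be wired up with small stub pages that pull each docutils source file in unchanged via the `include` directive. A hypothetical sketch: the helper names, the submodule path, the output directory and the docutils source path used in the example (`docs/ref/rst/directives.txt`) are all assumptions.

```python
import os
from pathlib import Path


def include_stub(submodule_dir, out_dir, src):
    """Return (filename, contents) of a stub page that includes `src` verbatim.

    `src` is a path inside the docutils submodule. The include path is made
    relative to `out_dir`, because docutils resolves include paths relative
    to the including document.
    """
    target = os.path.join(submodule_dir, src)
    rel = Path(os.path.relpath(target, out_dir)).as_posix()
    return Path(src).stem + ".rst", f".. include:: {rel}\n"


def write_include_stubs(submodule_dir, out_dir, sources):
    """Write one stub per source file into the Sphinx project's `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    for src in sources:
        name, text = include_stub(submodule_dir, out_dir, src)
        Path(out_dir, name).write_text(text, encoding="utf-8")
```

Because the included text stays untouched, any content changes would still have to go through the docutils maintainers.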
Sphinx also has a lot of additional features defined which they document. I think one of the largest issues I run into is the "split brain" situation where Sphinx only documents its features, then links out to docutils docs for the rest. It seems that each project wants to keep it the way it is, and having us try to maintain a merged set of these also feels ripe for staleness if the maintainers of the core projects aren't involved. I also worry about creating another "split brain" situation where we have external RST/docutils docs, and the official project has its own, and they aren't coordinated.
I'm not sure it is really an issue that Sphinx links out to the docutils documentation. It would already be a big improvement if Sphinx could link out to Sphinx-ized docutils documentation (intersphinx).
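If a Sphinx-ized build of the docutils docs published an intersphinx inventory, downstream projects could link to it with an ordinary `intersphinx_mapping` entry. The sketch assumes that the build mentioned earlier in this thread (https://docutils.readthedocs.io/en/sphinx-docs/) serves an `objects.inv`; that is not confirmed here.

```python
# conf.py (sketch)
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    # Assumption: this Sphinx build of the docutils docs publishes objects.inv.
    "docutils": ("https://docutils.readthedocs.io/en/sphinx-docs/", None),
}
```

With such an inventory, references like :py:class:`docutils.nodes.Element` would resolve instead of needing nitpick_ignore entries.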
I personally think the website probably shouldn't be the main documentation resource for Sphinx. Instead, it should represent the full reStructuredText ecosystem. Granted, Sphinx is a major part of the latter, but I would still just reference it as one of the tools building on docutils and reStructuredText. We may be on different wavelengths here though. I would be interested to hear the input from the others on this.
@chrisjsewell
FYI, this is what I used for the issue transfer (using the sourceforge and GitHub REST APIs)
Thanks for sharing! We might even be able to set up two-way sync between SF tickets and GitHub issues. But that may not be the best use of our time...
Please sign up for email notifications on the creation of new tickets on this repo. It would be good to split the topics out into separate issues. ("Watch" at the top right of the page.)
We can indeed mirror the Subversion repo in an automated way. If we mirror all SVN branches under an upstream/ namespace (for example), we should be able to do everything in one repository.
Good point: this also addresses the problem that the fork cannot be in the same org.
I wouldn't mind. I've made you an owner of this organization, just in case that's needed. However, these issues might do just fine for now...
Thanks. This shouldn't be necessary, though. FWIW, Discussions are in private beta, so they are enabled per repo via feature flags. I think they'd capture decisions better than issues.
Please sign up for email notifications on the creation of new tickets on this repo. It would be good to split the topics out into separate issues. ("Watch" at the top right of the page.)
@brechtm we should try https://github.com/reStructuredText/startup/discussions for that
For what it's worth, for an unrelated (private) project I've started to write my own RST parser. It's incomplete for now, and as long as I'm the only user I don't think I'm going to support everything (for example, I think that tables should be a directive).
It's one of my first parsers, so it's ugly, and it has a few requirements that you may not want in a stricter RST parser, so I'm not ready to share it yet. I'm mostly trying to parse numpy docstrings, which use a slight variant of RST in some sections, so I had to make a couple of weird design decisions:
1) I want a CST more than an AST, as I'd like to be able to reformat existing RST.
2) I'd like to be able to parse and get a CST/AST without having to pre-register existing directives.
3) As numpydoc has weird syntax in some sections (Parameters, Raises, See Also...), I went for a multiple-pass parser, so that specific sections (based on header names) can be treated independently, and/or so that you can embed a subset of RST in another object.
4) I do not care about parsing performance/speed or memory usage.
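Carreau's parser isn't shared yet, so the following is not his code; it is a hypothetical sketch of the multi-pass idea from point 3. A first pass splits a numpydoc docstring on its dash-underlined section headers, and a second pass hands each section to a parser chosen by header name. Only a naive "Parameters" parser is included; other sections are kept as raw text that could later go through a regular RST parser. All function names are assumptions.

```python
import inspect
import re

# A section header is followed by an underline made only of dashes.
_UNDERLINE = re.compile(r"^\s*-{3,}\s*$")


def split_sections(docstring):
    """Pass 1: split a numpydoc docstring into (header, text) pairs.

    Text before the first header is returned under the header "".
    """
    lines = inspect.cleandoc(docstring).splitlines()
    sections = [("", [])]
    i = 0
    while i < len(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if lines[i].strip() and _UNDERLINE.match(nxt):
            sections.append((lines[i].strip(), []))
            i += 2  # skip the header and its underline
            continue
        sections[-1][1].append(lines[i])
        i += 1
    return [(header, "\n".join(body).strip("\n")) for header, body in sections]


def parse_parameters(text):
    """Pass 2 for 'Parameters'-like sections: 'name : type' plus indented description."""
    params = []
    for line in text.splitlines():
        if line and not line[0].isspace():
            name, _, typ = line.partition(":")
            params.append({"name": name.strip(), "type": typ.strip(), "desc": []})
        elif line.strip() and params:
            params[-1]["desc"].append(line.strip())
    return params


SECTION_PARSERS = {"Parameters": parse_parameters, "Other Parameters": parse_parameters}


def parse_numpydoc(docstring):
    """Run pass 1, then dispatch each section to its pass-2 parser (if any)."""
    result = {}
    for header, text in split_sections(docstring):
        parser = SECTION_PARSERS.get(header)
        result[header] = parser(text) if parser else text  # others: raw reST
    return result
```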
I will likely end up pulling the RST parser out into its own package, but not yet. If you are interested, or are thinking of writing your own, let me know: I can try to make extracting the parser a higher priority, and would love to collaborate on this.
I want a CST more than an AST as I'd like to be able to reformat existing RST.
Maybe not what you are looking for, but with tree-sitter you can generate a CST, I wrote one for rst (still wip) https://github.com/stsewd/tree-sitter-rst, I'm using it mainly for syntax highlighting in neovim.
(for example I think that tables should be a directive)
I'm thinking on not supporting tables on my parser as well :D haha
@Carreau with my MyST bias lol, I'd say if you are going to go to all the trouble, perhaps you would consider https://github.com/executablebooks/MyST-Parser/issues/228 😬 Working with markdown-it is a heck of a lot easier than working with docutils 😉
If not, you might want to check out https://github.com/executablebooks/rst-to-myst, where I have re-worked aspects of the RST parser to achieve a "lossless" AST
Maybe not what you are looking for, but with tree-sitter you can generate a CST, I wrote one for rst (still wip) stsewd/tree-sitter-rst, I'm using it mainly for syntax highlighting in neovim.
I might have a look at this; I would have to see how hard it is to "unparse" sections that do not have rst syntax.
If not, you might want to check out executablebooks/rst-to-myst, where I have re-worked aspects of the RST parser to achieve a "lossless" AST
As far as I can tell, this gives you docutils nodes, right? I'm worried about still depending on docutils at that point.
This issue is rather broad in scope, so I suggest we start a new discussion for each individual topic.
In an attempt to lead by example, I've replied to @Carreau's comment here: #5.
Having been directed here by @chrisjsewell, I'm +1 on this:
I think at least maintaining separate documentation would be very worthwhile, even if it is not "officially" supported.
I'm not sure it is really an issue that Sphinx links out to the docutils documentation. It would already be a big improvement if Sphinx could link out to Sphinx-ized docutils documentation (intersphinx).
At least it would reduce eye strain on the folks that look at it the most... which is probably the people in this thread 😄
I did notice that the official docutils docs were recently restyled. While still not super modern, I think they're much more presentable now and there's also a handy table of contents on the left. 👍 to the docutils team!
Let's use this issue to discuss how to go ahead with this. Here are some ideas to kick things off:
My main concern at this point is whether we can gather enough people to make this a success. I can only spend a couple of hours a week on this myself, for example.