oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

man pages are not cross-referenced #33

Open cnst opened 11 years ago

cnst commented 11 years ago

OpenGrok does not let you cross-reference manual pages. Instead of cross-referencing, it apparently attempts some form of conversion from troff into HTML.

No cross-referencing is done, no source code can be viewed in the browser (the Download link wants you to save the file first), and the Annotate function doesn't work at all.

I'm not sure if this design decision was intentional or simply a proof-of-concept, but it has been proving especially troublesome for the popular open-source BSD flavours and the mdoc / mandoc troff format, where the HTML rendering is simply a complete mess:

http://nxr.netbsd.org/xref/src/share/man/man7/mdoc.7

For my http://BXR.SU project (IPv6-only for now), I have rewritten OpenGrok's troff grammars, with the cross-reference and BSD mdoc format in mind:

http://bxr.su/n/share/man/man7/mdoc.7 (might require IPv6)

I was originally thinking that it would make sense to contribute my grammars as a separate mdoc format to the main OpenGrok upstream, but I have since wondered whether it would make more sense to replace the existing non-xref "xref" grammar so that it actually is xref, as advertised. I'm not sure that many people come to an xref service to find groff rendering instead; I'd think most users expect xref when requesting xref.

Any thoughts of how I should go here?

(I'm in the process of converting my modifications from my OpenGrok 0.11-rc2 fork, based on an old hg trunk, to the recent GitHub master; code will be available soon, pull requests to follow.)

kahatlen commented 11 years ago

I think there was a discussion about this in the old OpenSolaris bug tracker. The troff xref shows a mix between raw source/markup and formatted output. It would have been cleaner if it separated the two. The xref could show the markup, and then there could be a button you could click to see the rendered output. I think that was the conclusion in the old bug report, but no one stepped up to implement it.

cnst commented 11 years ago

@kahatlen, that makes sense, but I think it's still too complicated for something that's supposed to be a Cross Reference.

I can't speak for OpenSolaris or Minix, or other systems / repositories, but as far as the BSDs are concerned, we really don't need OpenGrok to provide rendered output for our mdoc pages. We have man.cgi for that, for every BSD flavour, very readily available.

So, would you accept a contribution changing existing Troff grammars to be xref, or should I separate mdoc from Troff, and split the MAGICS?

https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/analysis/document/TroffAnalyzerFactory.java

private static final String[] MAGICS = {
    "'\\\"", ".so", ".\\\"", ".TH"
};

mdoc would take .\" (written Java-escaped as ".\\\"" above; is it used outside of mdoc as often as it is within?) and introduce .Dd (mdoc-specific, but it usually comes way down, after the licence in the comments, so I'm not sure MAGICS would support that), and the rest would go to troff? Frankly, I don't understand why the existing troff rendering is considered more useful than the actual source code and the annotation feature, so, if I were to cast a vote, I think the current non-xref troff that takes the place of xref has to go.
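To illustrate the .Dd concern: a plain prefix check on the start of the file only ever sees the leading comment, so spotting .Dd needs a check that skips comment lines first. Below is a minimal sketch of such a check. It is not OpenGrok code; the class FirstMacroSketch and the methods firstMacro and looksLikeMdoc are hypothetical names made up for this example, and the sketch assumes the whole file content is available as a string.

```java
import java.util.Optional;

// Hypothetical helper, not part of OpenGrok: finds the first macro line
// after the leading comment block and checks whether it is .Dd.
public class FirstMacroSketch {

    /** Returns the first non-blank line that is not a troff comment, if any. */
    static Optional<String> firstMacro(String content) {
        for (String line : content.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.equals(".")) {
                continue; // blank line or empty request
            }
            // .\" and '\" both open a troff comment line
            if (t.startsWith(".\\\"") || t.startsWith("'\\\"")) {
                continue;
            }
            return Optional.of(t);
        }
        return Optional.empty();
    }

    /** Assumption: a page whose first macro is .Dd is treated as mdoc. */
    static boolean looksLikeMdoc(String content) {
        return firstMacro(content)
                .map(m -> m.equals(".Dd") || m.startsWith(".Dd "))
                .orElse(false);
    }
}
```

With this, a page that opens with a long licence comment followed by .Dd would be classified as mdoc, while a page that opens with '\" te followed by .TH would not. Whether the factory mechanism behind MAGICS can host a check like this is exactly the open question above.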

vladak commented 11 years ago

@cnst for Solaris repos we store the man page snapshots (or rather their troff version generated from SGML) together with the source code, which makes it really handy when searching. I am a strong advocate of indexing documentation (design docs, architecture, code review/inspection emails) together with source code, since it makes it much easier to get answers to the various 'why' questions. Keeping the man pages in troff format readable is important for our very large (albeit internal) user base.

cnst commented 11 years ago

@vladak, you seem to imply that the troff source code of your manual pages is unreadable. In BSD, man pages are also stored together with the source code, and it is indeed very handy to have them available during the search. What is not helpful is that they are completely and utterly unreadable in OpenGrok. It is also not helpful that the source code and syntax cannot be checked, and that it is not possible to search for the mdoc syntax itself (since the troff indexer gets rid of most troff stuff).

So, I'm trying to inquire again: if you're trying to reinvent the wheel, is there a way to distinguish between non-mdoc troff and mdoc? Do you start your pages with .\", too, or do they all start with .TH, when automatically generated from SGML?

http://code.metager.de/source/xref/freebsd/cddl/contrib/opensolaris/cmd/dtrace/dtrace.1
http://bxr.su/f/cddl/contrib/opensolaris/cmd/dtrace/dtrace.1

I guess your auto-generated troff is indeed pretty unreadable. But then, can I ask why you don't have the actual, readable SGML from which your troff pages are generated?

cnst commented 11 years ago

Ok, I think I got the gist:

All files under https://hg.openindiana.org/upstream/illumos/illumos-gate/file/tip/usr/src/man and at http://src.illumos.org/source/xref/illumos-gate/usr/src/man/, as well as the aforementioned dtrace.1 in FreeBSD, start with '\" te.

So, if needed, I could introduce my mdoc analyser with magics of .\" and .Dd, whereas troff would keep '\" and .TH. This would take .\" away from troff, but, from looking at a number of pages under man/ in OpenIndiana, it should not affect Solaris (especially if, as you say, all your pages are automatically generated, i.e. must have a uniform header).

P.S. BTW, what is .so in troff used for, that it's one of the magics in the troff analyser?

@kahatlen, @vladak, does it sound ok to you? Frankly, I'm more inclined to simply add support for the "\fB" and stuff to my existing mdoc xref grammar (which does formatting, but keeps all original macros intact), and get rid of the old troff non-xref grammar for good. Still having two separate grammars won't hurt, though.

tarzanek commented 11 years ago

Well, I vote for 2 different analyzers for now, to satisfy both stakeholders. I agree we shouldn't show parsed output and source (after all, this is a source browser), but then sometimes it's useful to see parsed output (after all, we highlight sources and sometimes strip tabs, which is already parsing in a sense, so we are somewhere in the middle). Some analyzers show a binary download link (which is on the other side). I have prototypes for OpenOffice and PDF parsers where this will be the same issue again: by default I will show output parsed to HTML, which will obviously suck, simply because of how PDF and OpenOffice are designed. This asks for a major UI feature, which is showing both source and parsed/human-readable output... so until that is done, 2 analyzers should heal the most pain (imho).

cnst commented 11 years ago

OK, I'll do two separate analysers, then. Further looking around, however, has revealed that the current non-cross-referenced output is quite buggy for anything other than Solaris pages auto-generated from SGML, so I'm tending to think that even .TH by itself should also be handled by the new parser, leaving the old parser with only '\" te as its magic.
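As a rough sketch of what that split could look like (not actual OpenGrok code; AnalyzerChoiceSketch, Choice and choose are hypothetical names for this example): files starting with '\" te stay with the old rendering analyser, while .\", .Dd and .TH go to the new xref one. What happens to .so and to a bare '\" without te is not settled in this thread, so the sketch deliberately leaves them unclassified.

```java
// Hypothetical dispatch sketch, not part of OpenGrok.
public class AnalyzerChoiceSketch {

    enum Choice { LEGACY_TROFF, XREF_MAN, UNKNOWN }

    // Magic kept by the old, rendering troff analyser: '\" te
    private static final String LEGACY_MAGIC = "'\\\" te";

    // Magics proposed for the new cross-referencing analyser: .\"  .Dd  .TH
    private static final String[] XREF_MAGICS = { ".\\\"", ".Dd", ".TH" };

    /** Picks an analyser from the very first bytes of the file content. */
    static Choice choose(String content) {
        if (content.startsWith(LEGACY_MAGIC)) {
            return Choice.LEGACY_TROFF;
        }
        for (String magic : XREF_MAGICS) {
            if (content.startsWith(magic)) {
                return Choice.XREF_MAN;
            }
        }
        // .so and a bare '\" are left undecided here on purpose
        return Choice.UNKNOWN;
    }
}
```

The legacy magic is checked first so that the Solaris-style header always wins, and everything else that looks like a man page falls through to the xref analyser.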

http://nxr.netbsd.org/xref/src-freebsd/crypto/heimdal/doc/doxyout/krb5/man/man3/krb5.3
http://bxr.su/FreeBSD/crypto/heimdal/doc/doxyout/krb5/man/man3/krb5.3

(I'll be adding support for \fI et al shortly.)

P.S. BTW, the UI change wouldn't really be all that big: simply a new button in the menu; the underlying code changes to support a new type of view would probably be bigger and more involved, as far as I'm concerned (but then I'm not a Java person :).

tarzanek commented 11 years ago

FWIW, I was just contacted by a person handling OCAs, who said he got your #. They are just confused, not knowing what they should do with the applications, so I told them to process them. Hopefully you will be up on the site soon.

cnst commented 11 years ago

LOL. :) Did they get all 4 copies of it? :) Nice job of them to not do anything, not even contact the person sending all these copies to ask for some reference within the org or something! BTW, if they weren't processing it, how come some new names appeared on the web page rather recently, on two occasions after I sent my OCA? P.S. BTW, I see the list has been updated once again today, but my name is still not there. :/

tarzanek commented 11 years ago

It's a bit more complicated, but they will post the new ids eventually (they got the OCA from you and Ajay). The confusion was because of the move from opensolaris.org; OCAs are now handled by different teams.